Data Search for Data Mining

This extension provides data search and integration methods for enriching (extending) a data table using a heterogeneous tabular corpus. These include Correspondence Search (a Search-Join for a single attribute) with human-in-the-loop refinements, Unconstrained Search, and Correlation-Based Search. Some operators of this extension require a Data Search server developed by the University of Mannheim, which maintains the public endpoints.

This extension provides automated and semi-automated methods for data augmentation, which includes data search, attribute discovery, and integration of new attributes into a data set.

The extension provides two kinds of augmentation: i) Single-attribute data augmentation, also called constrained augmentation or governed data discovery, which discovers a specific attribute named by the user in a given corpus; and ii) Multi-attribute data augmentation, also called unconstrained augmentation, which discovers relevant attributes in the corpus and adds them to the given data set.
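The difference between the two modes can be illustrated with a minimal Python sketch. This is not the extension's implementation: the function names, the `min_coverage` parameter, the first-value-wins conflict handling, and the toy data are all assumptions made for illustration, and tables are modelled as plain lists of dictionaries.

```python
def constrained_augment(rows, key, target, corpus):
    """Add one user-named attribute (`target`) to each row, matched on `key`."""
    lookup = {}
    for table in corpus:
        for rec in table:
            if key in rec and target in rec:
                # Keep the first value seen; a real system would fuse conflicts.
                lookup.setdefault(rec[key], rec[target])
    return [{**row, target: lookup.get(row[key])} for row in rows]


def unconstrained_augment(rows, key, corpus, min_coverage=0.5):
    """Add every corpus attribute that covers at least `min_coverage` of the rows.

    `min_coverage` is a hypothetical threshold, not a parameter of the extension.
    """
    # Collect candidate attributes: attribute name -> {key value -> attribute value}.
    candidates = {}
    for table in corpus:
        for rec in table:
            if key not in rec:
                continue
            for attr, value in rec.items():
                if attr != key:
                    candidates.setdefault(attr, {}).setdefault(rec[key], value)

    existing = set().union(*(row.keys() for row in rows))
    out = [dict(row) for row in rows]
    for attr, values in candidates.items():
        if attr in existing:
            continue  # never overwrite attributes the query table already has
        covered = sum(1 for row in rows if row[key] in values)
        if rows and covered / len(rows) >= min_coverage:
            for row in out:
                row[attr] = values.get(row[key])
    return out


# Toy query table and corpus (illustrative data only).
query = [{"country": "Germany"}, {"country": "France"}, {"country": "Italy"}]
corpus = [
    [{"country": "Germany", "capital": "Berlin"},
     {"country": "France", "capital": "Paris"}],
    [{"country": "Germany", "population_millions": 83},
     {"country": "Italy", "capital": "Rome"}],
]

constrained = constrained_augment(query, "country", "capital", corpus)
unconstrained = unconstrained_augment(query, "country", corpus, min_coverage=0.5)
```

In this sketch the constrained call adds only the requested `capital` attribute. The unconstrained call considers every attribute in the corpus and keeps those that cover enough rows, so `population_millions`, which is known for only one of three rows, is dropped at a threshold of 0.5.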

Currently, the extension provides the following operators:

  • Legacy: Some operators are now considered legacy and have been replaced by other operators to make the extension more independent of any backend search server. These include:
    • Data Search
    • Fuse
    • Correlation-Based Search
    • Unconstrained Search
  • Single Attribute Augmentation: This group includes operators that work together in a single operator chain.
    • Create Correspondences
    • Translate
    • Advanced Fuse
  • Multi Attribute Augmentation
    • Enrich Table by Data Fusion
  • Repository Management: This group contains operators for creating repositories and uploading data to them. Currently, you may set up an instance of the data search server (developed by the University of Mannheim) on your own premises.
    • Create Repository
    • Data Table Upload
    • Data Tables Upload
  • Data Table Search: This group provides access to search engines.
    • Google Table Search

Version 2.1.0 (26-04-2019)

  • Two bug fixes in the Enrich Table by Data Fusion operator.
  • A parameter added to Enrich Table by Data Fusion to balance coverage with precision.
  • A new dataset, a tutorial process, and an application template added for fully automated augmentation.

Version 2.0.0 (16-11-2018)

  • New operator Create Correspondences. This operator implements the Constrained Data Augmentation algorithm without depending on the Mannheim data search server.
  • Rearranged operators into new operator groups (Legacy, Single Attribute Augmentation and Multi Attribute Augmentation).

Version 1.0.1 (30-07-2018)

  • New operator Enrich Table by Data Fusion. This operator implements the Unconstrained Data Augmentation algorithm without depending on the Mannheim data search server.
  • New operator Unconstrained Search. This operator depends on the Mannheim data search server.
  • New operator Correlation-Based Search. This operator depends on the Mannheim data search server.

Version 0.2.0 (30-01-2018)

  • New component Connection Manager added to easily maintain connections with multiple instances or endpoints of the data search server.
  • Repository Management operator group added with the following new operators to create repositories and upload data:
    • Create Repository
    • Data Table Upload 
    • Data Tables Upload

Version 0.1.3 (19-10-2017)

  • New operator Google Table Search

Product Details

Version 2.1.0
File size 15 MB
Downloads 12699 (8 Today)
Vendor RapidMiner Labs
Category Operators
Released 4/26/19
Last Update 4/26/19 4:18 PM
License AGPL
Product web site http://ds4dm.de
Rating 0.0 stars (0)