Data Search for Data Mining

This extension provides data search and integration methods for enriching (extending) a data table using a heterogeneous tabular corpus. These include Correspondence Search (a search-join for a single attribute) with human-in-the-loop refinements, Unconstrained Search, and Correlation Search. Some operators in this extension require a Data Search server developed by the University of Mannheim, which maintains the public endpoints.

[Unsupported Extension Notice] - This extension is not officially supported by Altair RapidMiner. While we're always improving and updating our offerings, we can't guarantee any help or fixes for this extension.

This extension provides automated and semi-automated methods for data augmentation, including data search, attribute discovery, and integration of new attributes into a data set.

The extension provides two modes: i) Single-attribute data augmentation, also called constrained augmentation or governed data discovery, which discovers a specific attribute named by the user in a given corpus; and ii) Multi-attribute data augmentation, also called unconstrained augmentation, which discovers relevant attributes in the corpus and adds them to the given data set.
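
The difference between the two modes can be illustrated with a minimal Python sketch. This is not the extension's implementation: the function names, the exact-match join on a key column, the "first value wins" rule, and the `min_coverage` threshold are all assumptions made for illustration only.

```python
from typing import Dict, List

# A table is modelled as a list of rows, each row a column -> value dict.
Table = List[Dict[str, object]]


def constrained_augment(query: Table, corpus: List[Table], key: str, target: str) -> Table:
    """Single-attribute mode (hypothetical): add one user-named attribute."""
    lookup: Dict[object, object] = {}
    for table in corpus:
        for row in table:
            if key in row and target in row:
                # Assumption: the first value seen for a key wins.
                lookup.setdefault(row[key], row[target])
    return [{**row, target: lookup.get(row[key])} for row in query]


def unconstrained_augment(query: Table, corpus: List[Table], key: str,
                          min_coverage: float = 0.5) -> Table:
    """Multi-attribute mode (hypothetical): add every attribute that
    has a value for at least `min_coverage` of the query keys."""
    if not query:
        return []
    query_keys = [row[key] for row in query]
    candidates: Dict[str, Dict[object, object]] = {}
    for table in corpus:
        for row in table:
            if key not in row:
                continue
            for attr, value in row.items():
                if attr != key:
                    candidates.setdefault(attr, {}).setdefault(row[key], value)
    existing = set(query[0].keys())
    kept = [
        attr for attr, values in candidates.items()
        if attr not in existing
        and sum(k in values for k in query_keys) / len(query_keys) >= min_coverage
    ]
    return [{**row, **{a: candidates[a].get(row[key]) for a in kept}} for row in query]


# Toy data, invented for this example.
query = [{"country": "Germany"}, {"country": "France"}, {"country": "Spain"}]
corpus = [
    [{"country": "Germany", "capital": "Berlin"}, {"country": "France", "capital": "Paris"}],
    [{"country": "Germany", "currency": "EUR"}],
]

single = constrained_augment(query, corpus, key="country", target="capital")
multi = unconstrained_augment(query, corpus, key="country", min_coverage=0.5)
```

In this toy run the constrained call adds only the requested `capital` column, while the unconstrained call keeps `capital` (two of three keys covered) and drops `currency` (one of three keys covered).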

Currently, the extension provides the following operators:

  • Legacy: Some operators are now considered legacy and have been replaced by other operators to make the extension less dependent on any backend search server. These include:
    • Data Search
    • Fuse
    • Correlation-Based Search
    • Unconstrained Search
  • Single Attribute Augmentation: This group includes operators that work together in a single operator chain.
    • Create Correspondences
    • Translate
    • Advanced Fuse
  • Multi Attribute Augmentation
    • Enrich Table by Data Fusion
  • Repository Management: This group contains operators for creating repositories and uploading data to them. Currently, you may set up an instance of the data search server (developed by the University of Mannheim) on your premises.
    • Create Repository
    • Data Table Upload
    • Data Tables Upload
  • Data Table Search: This group provides access to search engines.
    • Google Table Search
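
The Single Attribute Augmentation operators listed above are meant to run as one chain. This page does not describe what each operator does internally, so the Python sketch below shows only one plausible reading of such a chain: match column names to a target schema, rename columns accordingly, then fuse conflicting values. The function names mirror the operator names for readability only; the string-similarity matching, the `threshold` parameter, and majority-vote fusion are assumptions, not the extension's actual behavior.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List

Table = List[Dict[str, object]]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def create_correspondences(target_columns: List[str], table: Table,
                           threshold: float = 0.8) -> Dict[str, str]:
    """Map each source column to its most similar target column (assumed rule)."""
    mapping: Dict[str, str] = {}
    if not table or not target_columns:
        return mapping
    for src in table[0].keys():
        best = max(target_columns, key=lambda t: similarity(src, t))
        if similarity(src, best) >= threshold:
            mapping[src] = best
    return mapping


def translate(table: Table, mapping: Dict[str, str]) -> Table:
    """Rename mapped columns to the target schema and drop unmapped ones."""
    return [{mapping[c]: v for c, v in row.items() if c in mapping} for row in table]


def fuse(tables: List[Table], key: str, attribute: str) -> Dict[object, object]:
    """Resolve conflicting values per key by majority vote (assumed rule)."""
    votes: Dict[object, Counter] = {}
    for table in tables:
        for row in table:
            if key in row and row.get(attribute) is not None:
                votes.setdefault(row[key], Counter())[row[attribute]] += 1
    return {k: counts.most_common(1)[0][0] for k, counts in votes.items()}


# Toy corpus with inconsistent column names and one conflicting value.
corpus = [
    [{"Country": "Germany", "Capital": "Berlin"}],
    [{"country": "Germany", "capital": "Berlin"}, {"country": "France", "capital": "Paris"}],
    [{"country": "Germany", "capitol": "Bonn"}],
]
target_columns = ["country", "capital"]

translated = [translate(t, create_correspondences(target_columns, t)) for t in corpus]
fused = fuse(translated, key="country", attribute="capital")
```

Here `fused` maps each country to the capital value that most corpus tables agree on, so the single dissenting "Bonn" entry is outvoted.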

Version 2.1.0 (26-04-2019)

  • Two bugfixes in Enrich Table by Data Fusion operator.
  • A parameter added in Enrich Table by Data Fusion to balance coverage with precision.
  • A new dataset, a tutorial process and application template added for fully automated augmentation.

Version 2.0.0 (16-11-2018)

  • New operator Create Correspondences. This operator implements the Constrained Data Augmentation algorithm without depending on the Mannheim data search server.
  • Rearranged operators into new operator groups (Legacy, Single Attribute Augmentation and Multi Attribute Augmentation).

Version 1.0.1 (30-07-2018)

  • New operator Enrich Table by Data Fusion. This operator implements the Unconstrained Data Augmentation algorithm without depending on the Mannheim data search server.
  • New operator Unconstrained Search. This operator depends on the Mannheim data search server.
  • New operator Correlation-Based Search. This operator depends on the Mannheim data search server.

Version 0.2.0 (30-01-2018)

  • New component Connection Manager added to easily maintain connections with multiple instances or endpoints of the data search server.
  • Repository Management operator group added with the following new operators to create repositories and upload data:
    • Create Repository
    • Data Table Upload 
    • Data Tables Upload

Version 0.1.3 (19-10-2017)

  • New operator Google Table Search

Product Details

Version 2.1.0
File size 15 MB
Downloads 14013 (24 Today)
Vendor RapidMiner Labs
Category Operators
Released 4/26/19
Last Update 4/26/19 4:18 PM
License AGPL
Product web site http://ds4dm.de
Rating 0.0 stars (0)