Data Search for Data Mining
This extension provides data search and integration methods for enriching (extending) a data table from a heterogeneous tabular corpus. These include Correspondence Search (a Search-Join for a single attribute) with human-in-the-loop refinement, Unconstrained Search, and Correlation Search. Some operators of this extension require a Data Search server developed by the University of Mannheim, which maintains the public endpoints.
This extension provides automated and semi-automated methods for data augmentation, which covers data search, attribute discovery, and the integration of new attributes into a data set.
The extension provides (i) single-attribute data augmentation, also called constrained augmentation or governed data discovery, which discovers a specific attribute named by the user in a given corpus, and (ii) multi-attribute data augmentation, also called unconstrained augmentation, which discovers relevant attributes in the corpus and adds them to the given data set.
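The difference between the two modes can be illustrated with a small Python sketch. This is not the extension's implementation: the function names, the `min_coverage` parameter, and the sample tables are hypothetical, and matching is simplified to exact attribute names and case-insensitive key values, whereas a real corpus would need schema and entity matching.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]
Table = List[Row]


def normalize(value: object) -> str:
    """Normalize a key value so that 'Germany' and ' germany ' match."""
    return str(value).strip().lower()


def constrained_augment(query: Table, key: str, target: str,
                        corpus: List[Table]) -> Table:
    """Single-attribute augmentation: add the user-specified `target`."""
    lookup: Dict[str, str] = {}
    for table in corpus:
        for row in table:
            if key in row and row.get(target) is not None:
                # Keep the first value found for each key (no fusion here).
                lookup.setdefault(normalize(row[key]), row[target])
    return [{**row, target: lookup.get(normalize(row[key]))} for row in query]


def unconstrained_augment(query: Table, key: str, corpus: List[Table],
                          min_coverage: float = 0.5) -> Table:
    """Multi-attribute augmentation: add every sufficiently covered attribute."""
    keys = {normalize(row[key]) for row in query}
    if not keys:
        return [dict(row) for row in query]
    candidates: Dict[str, Dict[str, str]] = {}  # attribute -> key -> value
    for table in corpus:
        for row in table:
            if key not in row or normalize(row[key]) not in keys:
                continue
            for attr, value in row.items():
                if attr != key and value is not None:
                    candidates.setdefault(attr, {}).setdefault(
                        normalize(row[key]), value)
    existing = {attr for row in query for attr in row}
    # Keep only new attributes that fill enough of the query rows.
    kept = {attr: values for attr, values in candidates.items()
            if attr not in existing and len(values) / len(keys) >= min_coverage}
    return [{**row, **{attr: values.get(normalize(row[key]))
                       for attr, values in kept.items()}}
            for row in query]


# Hypothetical sample data: a query table and a two-table corpus.
query = [{"country": "Germany"}, {"country": "France"}, {"country": "Italy"}]
corpus = [
    [{"country": "germany", "capital": "Berlin"},
     {"country": "France", "capital": "Paris"}],
    [{"country": "Italy", "capital": "Rome", "currency": "EUR"}],
]

single = constrained_augment(query, "country", "capital", corpus)
multi = unconstrained_augment(query, "country", corpus, min_coverage=0.5)
print(single)
print(multi)
```

In this sketch the constrained call returns only the requested `capital` column, while the unconstrained call inspects every corpus attribute and keeps those that fill at least half of the query rows; with the sample data, `currency` fills one of three rows and is dropped.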
Currently, the extension provides the following operators:
- Legacy: Some operators are now considered legacy and have been
replaced by other operators to make the extension less dependent
on any backend search server. These include:
- Data Search
- Fuse
- Correlation-Based Search
- Unconstrained Search
- Single Attribute Augmentation: This group includes operators that
work together in a single operator chain.
- Create Correspondences
- Translate
- Advanced Fuse
- Multi Attribute Augmentation
- Enrich Table by Data Fusion
- Repository Management: This group contains operators for creating
repositories and uploading data to them. Currently, you may set up an
instance of the data search server (developed by the University of
Mannheim) on your own premises.
- Create Repository
- Data Table Upload
- Data Tables Upload
- Data Table Search: This group provides access to search engines.
- Google Table Search
Version 2.1.0 (26-04-2019)
- Two bug fixes in the Enrich Table by Data Fusion operator.
- A parameter added to Enrich Table by Data Fusion to balance coverage against precision.
- A new dataset, a tutorial process, and an application template added for fully automated augmentation.
Version 2.0.0 (16-11-2018)
- New operator Create Correspondences. This operator implements the Constrained Data Augmentation algorithm without depending on the Mannheim data search server.
- Rearranged operators into new operator groups (Legacy, Single Attribute Augmentation and Multi Attribute Augmentation).
Version 1.0.1 (30-07-2018)
- New operator Enrich Table by Data Fusion. This operator implements the Unconstrained Data Augmentation algorithm without depending on the Mannheim data search server.
- New operator Unconstrained Search. This operator depends on the Mannheim data search server.
- New operator Correlation-Based Search. This operator depends on the Mannheim data search server.
Version 0.2.0 (30-01-2018)
- New component Connection Manager added to make it easier to maintain connections to multiple instances or endpoints of the data search server.
- Repository Management operator group added with the following new
operators for creating repositories and uploading data:
- Create Repository
- Data Table Upload
- Data Tables Upload
Version 0.1.3 (19-10-2017)
- New operator Google Table Search
Product Details
Version | 2.1.0 |
File size | 15 MB |
Downloads | 13781 (4 Today) |
Vendor | RapidMiner Labs |
Category | Operators |
Released | 4/26/19 |
Last Update | 4/26/19 4:18 PM |
License | AGPL |
Product web site | http://ds4dm.de |
Rating | (0) |