Operator Toolbox

This extension bundles a set of useful additional operators.

This extension adds a number of new operators to RapidMiner. They range from utility operators that improve the flexibility and usability of process design, through additional outlier detection algorithms and performance criteria, to advanced analysis methods such as Local Interpretation and the SMOTE algorithm.

Since 2.6.0, the extension also adds new functions to the expression parser. These functions can be used in operators such as 'Generate Attributes', 'Filter Examples', and 'Create ExampleSet'.

The extension provides the following additional functions to the expression parser:

  • Fuzzy Matching:

    • fuzzy_match ( Nominal first, Nominal second, Constant method )

The extension provides the following Operators:

  • Blending

    • Table

      • Append (Superset)

      • Collect and Persist

      • Fuzzy Matching (new in 2.7.0)

      • Generate Aggregation (Advanced) (new in 2.7.0)

      • Group Into Collection

      • Merge Attributes

      • Sample Collection

      • Sort (Multiple) (new in 2.7.0)

    • Attribute Generation

      • Calculate Overlaps

      • Generate Levenshtein Distance

      • Generate Phonetic Encoding

      • Generate Session ID 

    • Build Simulation

    • Extract Statistics
    • Filter Attributes with Missing Values

    • Filter Examples with Missing Values

    • Generate Partial Dependency Plot Data

    • Get Holidays

    • Rename by Multiple Examples

    • Replace Rare Values

    • SMOTE Upsampling (bugfix in 2.7.0)

    • Weight of Evidence

  • Data Access:

    • Read Excel Sheet Names

    • Read Office File (renamed and enhanced in 2.7.0)

    • Read SFTP

    • Write SFTP

    • Un-gzip

  • Macros:

    • Extract Macro (Format)

    • Extract Last Modifying Operator

    • Set Macros from ExampleSet

    • Set Macro (Real)

  • Models:

    • Check Model Conformance

    • Get Decision Tree Path (bugfix in 2.7.0)

    • GLM Contribution

    • Local Interpretation (LIME)

    • Optimize Threshold

    • Optimize Threshold (Subprocess)

    • Random Forest Encoder

  • Outliers:

    • Tukey Test

  • Parameters:

    • Get Parameters

    • Set Parameters from ExampleSet

  • Performance

    • Performance (AUPRC)

  • Text Processing:

    • Apply Model (Documents) 

    • Extract Sentiment

    • Dictionary-Based Sentiment

    • Extract Topics from Data (LDA) (bugfix in 2.7.0)

    • Extract Topics from Documents (LDA)

    • Filter Tokens Using ExampleSet

    • Split Document into Collection

    • Stem Tokens Using ExampleSet

  • Utility

    • Set Meta Data (new in 2.7.0)

    • Subprocess (Caching)

    • Try (Multiple) (new in 2.7.0)

Version 2.7.0 (2020-09-15)

  • New Operator: Sort (Multiple)
    • Lets you sort ExampleSets by more than one attribute.
  • New Operator: Try (Multiple)
    • Similar to Handle Exception but lets you try multiple variants.
  • New Operator: Fuzzy Matching
    • Similar to Cross Distance but works with Levenshtein Distances.
  • New Operator: Set Meta Data
    • Allows you to change the meta data of an example set manually.
  • New Operator: Generate Aggregation (Advanced)
    • Similar to Generate Aggregation, but has additional functions.
  • Renamed Read Word Files to Read Office Files
    • Read Office Files can read ppt and pptx as well as doc and docx.
  • Bugfixes:
    • Bug in Get Decision Tree Path operator, which created seemingly wrong and random splits for nominal attributes
    • Bug causing an NPE in Extract Topics (Data) when a non-existent text attribute is specified. The operator now throws an error instead.
    • Bug where trailing whitespace in a label class caused misleading error messages in SMOTE.
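
To illustrate what sorting by more than one attribute means, here is a minimal Python sketch. It is not the implementation of Sort (Multiple); the rows and the attribute names `region` and `revenue` are invented for the example.

```python
# Hypothetical rows standing in for an ExampleSet; attribute names are invented.
rows = [
    {"region": "north", "revenue": 120.0},
    {"region": "south", "revenue": 300.0},
    {"region": "north", "revenue": 450.0},
    {"region": "south", "revenue": 80.0},
]

def sort_multiple(examples, keys):
    """Sort by several (attribute, ascending) pairs; the first pair has priority."""
    result = list(examples)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a multi-key ordering.
    for attribute, ascending in reversed(keys):
        result.sort(key=lambda row: row[attribute], reverse=not ascending)
    return result

# Region ascending, then revenue descending within each region.
sorted_rows = sort_multiple(rows, [("region", True), ("revenue", False)])
```

The sketch relies on sort stability rather than a composite key, which makes it easy to mix ascending and descending directions per attribute.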

Version 2.6.0 (2020-06-10)

  • New function in the expression parser:
    • fuzzy_match: The function lets you compare two nominal values using various Levenshtein-based measures
  • Bugfix Build Simulation operator:
    • Fixed a bug that prevented the simulation model from being retrieved after it was stored in the repository
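
For readers unfamiliar with Levenshtein-based measures, the following Python sketch computes the plain edit distance and one possible normalized similarity. The methods that fuzzy_match actually offers are not listed on this page, so `normalized_similarity` is an assumption chosen for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]

def normalized_similarity(a: str, b: str) -> float:
    """Assumed measure: 1.0 for identical strings, 0.0 for maximally different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.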

Version 2.5.0 (2020-06-03)

  • Replaced operators SFTP Download File and SFTP Upload File with Read SFTP and Write SFTP

    • Old processes can still use the old implementation.

    • Read SFTP and Write SFTP support the new connection management framework and proxies and use file objects (purple colored IOObjects) for easier usage.

  • New operator: Apply Association Rules (detailed)

    • Applies association rules and returns more detailed results than the original operator, including all rules that apply to a given example and their respective measures.

  • Enhancement Build Simulation operator

    • Build Simulation can now force some of the generated attributes to be constant, either by specifying a parameter or by providing an ExampleSet

Version 2.4.0 (2020-04-28)

  • New operator: Build Simulation
    • Generate new artificial data with similar statistical properties to a reference data set
  • New operator: Rename by Multiple Examples
    • Rename attributes using multiple examples and specified fill characters
  • New operator: Un-Gzip
    • Unpack gzip compressed files
  • Enhancements:
    • Dictionary-Based Sentiment (Documents):
      • Operator now supports intensifier words (like "very" or "relatively", which strengthen or weaken preceding or succeeding words)
      • Operator can now use weights for the negativity
    • Extract Sentiment
      • Operator now uses the new Connection Management
  • Other:
    • Operator Toolbox Extension 2.4.0 needs at least RM Studio 9.4.1
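
The intensifier support mentioned for Dictionary-Based Sentiment can be pictured with a small Python sketch. This is not the operator's algorithm: the word scores, the multipliers, and the rule that an intensifier only affects the next word are all assumptions made for the example.

```python
# Invented dictionary entries; the real operator takes its dictionary as input.
WORD_SCORES = {"good": 1.0, "bad": -1.0}
# Invented multipliers: values above 1 strengthen, values below 1 weaken.
INTENSIFIERS = {"very": 1.5, "relatively": 0.5}

def score_tokens(tokens):
    """Sum word scores, scaling each word by the intensifiers directly before it."""
    total = 0.0
    multiplier = 1.0
    for token in tokens:
        if token in INTENSIFIERS:
            # Remember the intensifier and apply it to the next scored token.
            multiplier *= INTENSIFIERS[token]
            continue
        total += multiplier * WORD_SCORES.get(token, 0.0)
        multiplier = 1.0  # an intensifier only reaches the next token
    return total
```

With these made-up values, `score_tokens("very good".split())` gives 1.5 and `score_tokens("relatively bad".split())` gives -0.5. Handling intensifiers that follow the word they modify would need a second pass and is left out here.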

Version 2.3.0 (2019-12-18)

  • New operator: Get Holidays
    • Returns every holiday (national and state holidays) in a given country for given days
  • New operator: Subprocess (Caching)
    • A drop-in replacement for the 'Subprocess' operator that can cache results at design time, which allows faster process prototyping
  • Enhancements
    • Sample (Collection): the sample size no longer needs to be smaller than the collection size for bootstrapping, which makes oversampling possible.
    • Extract Macro (Format) can now extract the number of items in a collection.
    • Optimize Threshold and Optimize Threshold (Subprocess) now log their respective performance values.
  • Bugfixes:
    • Fixed a bug in 'SMOTE Upsampling' that treated dates as nominal attributes and caused crashes. Dates are now treated like any other numerical attribute.
    • Fixed a bug where Sample (Collection) took mostly items from the beginning of a collection in Bootstrap mode
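
The reason bootstrapping permits a sample size larger than the collection is that items are drawn with replacement. The Python sketch below shows the idea; the function name, its parameters, and the seeding are assumptions for illustration and do not mirror the operator's parameters.

```python
import random

def sample_collection(items, sample_size, bootstrap=False, seed=None):
    """Draw sample_size items, with replacement if bootstrap is True."""
    rng = random.Random(seed)
    if bootstrap:
        # Each draw picks uniformly from the whole collection, so the same
        # item can appear several times and sample_size may exceed len(items).
        return [rng.choice(items) for _ in range(sample_size)]
    # Without replacement, sample_size must not exceed len(items);
    # random.Random.sample raises ValueError otherwise.
    return rng.sample(items, sample_size)

oversampled = sample_collection(["a", "b", "c"], 10, bootstrap=True, seed=42)
```

Drawing every item uniformly from the full collection also avoids favoring items at the beginning of the collection.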

Version 2.2.0 (2019-09-11)

  • New operator: GLM Contribution
    • calculates the influence of individual attribute values on a GLM prediction
  • New operator: Generate Partial Dependency Plot Data
    • generates Partial Dependency Plot Data for all numeric attributes used in a model. This is a useful technique for understanding the influence of an attribute on the overall prediction of the model
  • New operator: Sample Collection
    • takes an input collection and samples it to a given sample size
  • Enhancement: Apply Model (Documents)
    • Improved meta data propagation when applied on new documents
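
As background on partial dependency data, the usual idea is to force one attribute to a fixed value for every example, average the model's predictions, and repeat over a grid of values. The Python sketch below follows that textbook recipe. It is not taken from the operator; `toy_model`, the attribute names, and the grid are invented.

```python
def partial_dependence(predict, rows, attribute, grid):
    """Return (value, mean prediction) pairs with `attribute` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)         # copy so the input rows stay untouched
            modified[attribute] = value  # override only the attribute of interest
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve

def toy_model(row):
    # Stand-in for a trained model; any callable mapping a row to a number works.
    return 2.0 * row["x"] + row["y"]

data = [{"x": 1.0, "y": 0.0}, {"x": 3.0, "y": 4.0}]
curve = partial_dependence(toy_model, data, "x", [0.0, 1.0, 2.0])
```

For this linear toy model the resulting curve rises by 2.0 per unit of `x`, which matches the coefficient of `x` in `toy_model`.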

Version 2.1.0 (2019-06-27)

  • New operator: Optimize Threshold (Subprocess):
    • works like Optimize Threshold, but lets you define the performance measure manually
  • Improvements of Read Excel Sheet Names operator:
    • added throughput port for the input file object
    • changed the "id" attribute: it is now called "Sheet Number", has an id role, and starts counting at 1 (as the Read Excel operator also starts with 1). The operator is no longer compatible with earlier processes!
    • added the parameter "file name", similar to the Read Excel operator
  • Improvements of the Replace Rare Values operator
    • Operator now cleans the mapping of the selected attributes
    • Removed the "create view" parameter (which had no effect)
    • Added proper Meta Data handling
  • Enhancements
    • Added appropriate Aylien and MeaningCloud icons
    • Fixed an "unknown error message" in Apply Model (Documents)
    • Added a new UserError to the Tukey Test Preprocessing model for the case that the attribute type is wrong.

Version 2.0.1 (2019-04-23)

  • Enhancements:
    • Extract Sentiment now has the option to use custom words in dictionary-based methods, and scores for dictionary-based methods are now normalized so that max(abs(score)) is 1.
  • Bugfixes:
    • Extract Sentiment: Fixed a bug in SentiWordNet where Positivity and Negativity were wrongly calculated
    • Append (Superset): Fixed a bug where matching attributes of mismatched types (Real and Integer) were always converted to polynominal.
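
The score normalization mentioned for Extract Sentiment, where max(abs(score)) becomes 1, amounts to dividing every score by the largest absolute score. A minimal Python sketch follows. Whether the extension normalizes over the dictionary or over the computed results is not stated here, so normalizing a word-to-score dictionary is an assumption, as are the example words.

```python
def normalize_scores(scores):
    """Scale scores so that the largest absolute value becomes 1."""
    if not scores:
        return {}
    largest = max(abs(score) for score in scores.values())
    if largest == 0:
        return dict(scores)  # all scores are zero, nothing to scale
    return {word: score / largest for word, score in scores.items()}

# Invented example dictionary.
normalized = normalize_scores({"great": 3.0, "bad": -2.0, "awful": -4.0})
```

Here `normalized` is `{"great": 0.75, "bad": -0.5, "awful": -1.0}`. Signs are preserved and only the scale changes.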

Version 2.0.0 (2019-03-26)

  • New operator: Optimize Threshold
    • automatically determines the best threshold for a binominal classification problem
  • New operator: Extract Macro (Format)
    • same functionality as Extract Macro, but you can specify the format of numbers and dates
  • New operator: Extract Sentiment
    • extracting sentiment from text inputs using dictionary and API based methods (Vader, Sentinet, Aylien, MeaningCloud)
  • New operator: Append (Superset)
    • same functionality as Append, but you can append ExampleSets with different attributes
  • Enhancements:
    • Restructured the blending operator folder
      • added subfolders 'Tables' and 'Attribute Generation'
      • changed the color of the operators in these subfolders
    • Added several tags for Replace Rare Values
    • Merge operator:
      • Corrected Meta Data
      • Changed the behavior so that when attributes or annotations are renamed, the first occurring attribute/annotation is also renamed
    • Bugfix in Phonetic Encoding and Extract Statistics in rare cases of attribute selection
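
To make the idea of threshold optimization for a binominal problem concrete, the Python sketch below tries every observed confidence as a cut-off and keeps the one with the best score. It is an illustration under assumptions, not the operator's algorithm: accuracy is used as the performance measure, and an example is predicted positive when its confidence is at least the threshold.

```python
def optimize_threshold(confidences, labels):
    """Return (threshold, accuracy) for the candidate threshold with the highest accuracy."""
    best_threshold, best_accuracy = None, -1.0
    # Only observed confidences are tried; values in between give the same predictions.
    for threshold in sorted(set(confidences)):
        correct = sum(
            (confidence >= threshold) == label
            for confidence, label in zip(confidences, labels)
        )
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:  # ties keep the lower threshold
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy

# Invented confidences for the positive class and the true labels.
confidences = [0.1, 0.4, 0.35, 0.8]
labels = [False, False, True, True]
best = optimize_threshold(confidences, labels)
```

For this toy data the result is `(0.35, 0.75)`. A different performance measure can be swapped in by replacing the accuracy computation.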

Version 1.8.0 (2019-02-08)

  • Get Decision Tree Path now also works for regression trees
  • Bugfix for SMOTE operator when not using normalized distances
  • Create ExampleSet is now deprecated because it has been part of RM Studio Core since 9.2.0
  • Moved t-SNE operator and Parametric Probability Estimator to the separate Smile Extension (available on Marketplace)
    • Removed all functionality from the two operators
    • Only a ProcessSetupError is shown when the operators are inserted, and a UserError is thrown when an operator is executed
    • Both operators are deprecated now
    • Smile library dependency is removed

Product Details

Version 2.7.0
File size 15 MB
Downloads 112890 (174 Today)
Vendor RapidMiner Labs
Category Operators
Released 9/15/20
Last Update 9/15/20 3:29 PM
License AGPL