Operator Toolbox

This extension bundles a set of useful additional operators.

It adds a range of new operators to RapidMiner, from utility operators that improve the flexibility and usability of process design, through additional outlier detection algorithms and performance criteria, to advanced analysis methods such as Local Interpretation (LIME) and the SMOTE algorithm.

The extension currently provides the following operators:

  • Blending:

    • Table

      • Collect and Persist

      • Group Into Collection

      • Merge Attributes

      • Append (Superset)

    • Attribute Generation

      • Calculate Overlaps

      • Generate Levenshtein Distance

      • Generate Phonetic Encoding

      • Generate Session ID 

    • Extract Statistics

    • Filter Attributes with Missing Values

    • Filter Examples with Missing Values

    • Replace Rare Values (improved in 2.1.0)

    • SMOTE Upsampling

    • Weight of Evidence

  • Data Access:

    • Read Excel Sheet Names (improved in 2.1.0)

    • SFTP Download File

    • SFTP Upload File

  • Macros:

    • Extract Macro (Format)

    • Extract Last Modifying Operator

    • Set Macros from ExampleSet

    • Set Macro (Real)

  • Models:

    • Check Model Conformance

    • Get Decision Tree Path

    • Local Interpretation (LIME)

    • Optimize Threshold

    • Optimize Threshold (Subprocess) (new in 2.1.0)

    • Random Forest Encoder

  • Outliers:

    • Tukey Test (user error fix in 2.1.0)

  • Parameters:

    • Get Parameters

    • Set Parameters from ExampleSet

  • Performance:

    • Performance (AUPRC)

  • Text Processing:

    • Apply Model (Documents) (user error fix in 2.1.0)

    • Extract Sentiment

    • Dictionary-Based Sentiment

    • Extract Topics from Data (LDA)

    • Extract Topics from Documents (LDA)

    • Filter Tokens Using ExampleSet

    • Split Document into Collection

    • Stem Tokens Using ExampleSet

Version 2.1.0 (2019-06-27)

  • New operator: Optimize Threshold (Subprocess):
    • works like Optimize Threshold, but lets you define the performance measure manually
  • Improvements of Read Excel Sheet Names operator:
    • added throughput port for the input file object
    • renamed the "id" attribute to "Sheet Number", gave it the id role, and made it start counting at 1 (as the Read Excel operator does); the operator is no longer compatible with processes created with earlier versions!
    • added the parameter "file name", similar to the Read Excel operator
  • Improvements of the Replace Rare Values operator:
    • The operator now cleans the mapping of the selected attributes
    • Removed the "create view" parameter (which had no effect)
    • Added proper Meta Data handling
  • Enhancements:
    • Added appropriate Aylien and MeaningCloud icons
    • Fixed an "unknown error" message in Apply Model (Documents)
    • Added a new UserError to the Tukey Test preprocessing model, thrown if the attribute type is wrong

Version 2.0.1 (2019-04-23)

  • Enhancements:
    • Extract Sentiment now has the option to use custom words in dictionary-based methods, and scores for dictionary-based methods are now normalized so that max(abs(score)) is 1.
  • Bugfixes:
    • Extract Sentiment: Fixed a bug in SentiWordNet where Positivity and Negativity were calculated incorrectly
    • Append (Superset): Fixed a bug where matching attributes of mismatched types (Real and Integer) were always converted to polynominal.
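The 2.0.1 note on Extract Sentiment says that dictionary-based scores are normalized so that max(abs(score)) is 1. One plausible reading is sketched below in Python. It assumes the scaling divides every score in a collection by the largest absolute score; the page does not document the exact scaling, and `normalize_scores` is a hypothetical helper, not part of the extension.

```python
from typing import List, Sequence


def normalize_scores(scores: Sequence[float]) -> List[float]:
    """Scale scores so that the largest absolute value becomes 1.

    Hypothetical helper: assumes normalization means dividing by max(abs(score)).
    """
    max_abs = max((abs(s) for s in scores), default=0.0)
    if max_abs == 0:
        # All scores are zero (or there are none): nothing to scale.
        return [0.0 for _ in scores]
    return [s / max_abs for s in scores]


if __name__ == "__main__":
    print(normalize_scores([2.0, -4.0, 1.0]))  # [0.5, -1.0, 0.25]
```

Signs are preserved, so positive and negative sentiment stay distinguishable after scaling.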

Version 2.0.0 (2019-03-26)

  • New operator: Optimize Threshold
    • automatically determines the best threshold for a binominal classification problem
  • New operator: Extract Macro (Format)
    • same functionality as Extract Macro, but you can specify the format of numbers and dates
  • New operator: Extract Sentiment
    • extracts sentiment from text inputs using dictionary- and API-based methods (VADER, SentiWordNet, Aylien, MeaningCloud)
  • New operator: Append (Superset)
    • same functionality as Append, but you can append ExampleSets with different attributes
  • Enhancements:
    • Restructured the blending operator folder
      • added subfolders 'Tables' and 'Attribute Generation'
      • changed the color of the operators in these subfolders
    • Added several tags for Replace Rare Values
    • Merge operator:
      • Corrected Meta Data
      • When attributes or annotations are renamed, the first occurring attribute/annotation is now also renamed
    • Bugfix in Phonetic Encoding and Extract Statistics in rare cases of attribute selection
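The page describes Optimize Threshold only as automatically determining the best threshold for a binominal classification problem. Below is a minimal Python sketch of that idea. It assumes the search sweeps candidate thresholds over the positive-class confidences and keeps the one with the best score. `optimize_threshold` and `accuracy` are hypothetical names. The pluggable `metric` argument only loosely mirrors the manually defined performance measure of Optimize Threshold (Subprocess).

```python
from typing import Callable, Sequence, Tuple


def accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def optimize_threshold(
    confidences: Sequence[float],
    y_true: Sequence[bool],
    metric: Callable[[Sequence[bool], Sequence[bool]], float] = accuracy,
) -> Tuple[float, float]:
    """Return (best_threshold, best_score) for a binominal problem.

    Hypothetical sketch: candidate thresholds are the observed positive-class
    confidences; an example is predicted positive if confidence >= threshold.
    """
    best_threshold, best_score = 0.5, float("-inf")  # 0.5 is only a placeholder
    for threshold in sorted(set(confidences)):
        y_pred = [c >= threshold for c in confidences]
        score = metric(y_true, y_pred)
        if score > best_score:  # strict: ties keep the lower threshold
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


if __name__ == "__main__":
    conf = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2]
    labels = [False, True, False, True, True, False]
    print(optimize_threshold(conf, labels))  # (0.4, 1.0)
```

Only a strictly better score replaces the current best, so ties resolve to the lowest threshold reached. A production implementation may break ties or choose candidates differently.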

Version 1.8.0 (2019-02-08)

  • Get Decision Tree Path now also works for regression trees
  • Bugfix for SMOTE operator when not using normalized distances
  • Create ExampleSet is now deprecated because it has been part of RapidMiner Studio Core since 9.2.0
  • Moved the t-SNE operator and the Parametric Probability Estimator to the separate Smile Extension (available on the Marketplace)
    • Removed all functionality from the two operators
    • Inserting either operator now only raises a ProcessSetupError, and executing it throws a UserError
    • Both operators are deprecated now
    • Smile library dependency is removed

Version 1.7.0 (2018-11-26)

  • LDA changes:
    • New operator Extract Topics from Data (LDA), able to run LDA on ExampleSets
    • Renamed Extract Topics from Document (LDA) to Extract Topics from Documents (LDA)
    • Updated default optimization interval of the hyperparameters in both LDA operators from 50 to 10
    • Added AlphaSum, Beta and BetaSum to the performance vector output.
    • Bugfix for the case that a numerical meta data element is missing
    • Fixed a bug that caused LDA to ignore preprocessing steps such as Filter Stopwords
  • t-SNE operator: fixed a bug in handling special attributes
  • Bugfix in Check Model Conformance: nominal attributes were checked when "fail on error" was true, even if "check nominals" was false

Version 1.6.0 (2018-11-12)

  • New operator Check Model Conformance
  • New operator Filter Attributes with Missing Values
  • New operator Filter Examples with Missing Values

Version 1.5.0 (2018-09-14)

  • Improvement of Dictionary-Based Sentiment
    • Added a symmetric negation window option. If selected, negations also look backwards
    • Added the negation token to the result string
    • Fixed a bug that occurred when two or more negations appear
  • Improvement of SMOTE Upsampling
    • SMOTE now throws a UserError if a wrong label type (numeric or none) is used.
    • Bugfix for label attributes that contain comparison characters.
  • Local Interpretation (LIME)
    • Renamed operator to Local Interpretation (LIME)
    • Improved Meta Data Handling
  • General:
    • Improved parameter descriptions for several operators
    • Improved expert and mandatory parameter settings and removed unused encoding parameters.
    • Added a progress bar for several operators
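The symmetric negation window added to Dictionary-Based Sentiment in 1.5.0 is sketched below in Python. The sketch assumes a token list, a word-to-score lexicon and a set of negation words, all hypothetical. It flips the score of any lexicon word that has a negation within `window` tokens before it, and with `symmetric=True` also after it. The page does not document how the extension combines double negations, so this sketch simply flips the score once.

```python
from typing import Dict, List, Set


def dictionary_sentiment(
    tokens: List[str],
    lexicon: Dict[str, float],
    negations: Set[str],
    window: int = 2,
    symmetric: bool = False,
) -> float:
    """Sum lexicon scores, flipping the sign of words near a negation.

    Hypothetical sketch: a word counts as negated if a negation token occurs
    within `window` tokens before it, or (when symmetric=True) after it.
    Any number of nearby negations flips the sign only once.
    """
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in lexicon:
            continue
        before = tokens[max(0, i - window):i]
        after = tokens[i + 1:i + 1 + window] if symmetric else []
        negated = any(t in negations for t in before + after)
        total += -lexicon[token] if negated else lexicon[token]
    return total


if __name__ == "__main__":
    lexicon, negations = {"good": 1.0}, {"not"}
    text = ["the", "movie", "was", "good", "not"]
    print(dictionary_sentiment(text, lexicon, negations))                  # 1.0
    print(dictionary_sentiment(text, lexicon, negations, symmetric=True))  # -1.0
```

The trailing "not" is only seen once the window also looks backwards from the negation, which is the difference the symmetric option makes.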

Version 1.4.0 (2018-08-27)

  • New operator Calculate Overlaps

Version 1.3.0 (2018-07-27)

  • New operator Set Macro (Real)
  • Additional parameter 'use absolutes' for Generate Session Id
  • Enhancement for meta data of document models
  • Bugfix for Performance (AUPRC)
  • Enhancements of the Extract Topics from Document (LDA) operator
    • The LDA model now provides an overview of various topic diagnostic measures.
      • Accessible via the LDA model table renderer in the result view.
    • Parameters seed, thinning, burnin and iterations for the LDA model can now be set via the application parameters of the Apply Model operator
    • Added perplexity as a performance measure of LDA.
    • Added an option to forward the MALLET logging to the RapidMiner log panel
    • Bugfix for meta data when storing an LDA model
    • LDA now correctly uses the token text of a document, not the display text
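Perplexity, added as an LDA performance measure in 1.3.0, is conventionally defined for a set D of M documents as shown below. Here w_d are the tokens of document d and N_d is their count, and lower values indicate a better fit. The page does not say how the extension evaluates it. In particular, it leaves open whether the underlying MALLET library scores held-out documents and which likelihood estimator it uses.

```latex
\operatorname{perplexity}(D) = \exp\!\left( -\,\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```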

Product Details

Version 2.1.0
File size 14 MB
Downloads 65226
Vendor RapidMiner Labs
Category Operators
Released 6/27/19
Last Update 6/27/19 1:57 PM
License AGPL
Rating 0.0 stars (0 ratings)