Operator Toolbox

This extension bundles a set of useful additional operators.

This extension adds a range of new operators to RapidMiner: utility operators that improve the flexibility and usability of process design, additional outlier detection algorithms and performance criteria, and advanced analysis methods such as Local Interpretation and the SMOTE algorithm.

Since 2.6.0, the extension also adds new functions to the expression parser. These functions can be used in operators such as 'Generate Attributes', 'Filter Examples', and 'Create ExampleSet'.

The extension provides the following additional functions to the expression parser:

  • Fuzzy Matching:

    • fuzzy_match ( Nominal first, Nominal second, Constant method )
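
The changelog describes fuzzy_match as comparing two nominal values with Levenshtein-based measures. The available method constants are not listed here, so the following Python sketch only illustrates the underlying idea: a plain Levenshtein distance plus one possible normalized similarity. The names `levenshtein` and `similarity` are hypothetical and are not part of the extension.

```python
def levenshtein(first: str, second: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop to reduce memory use.
    if len(first) < len(second):
        first, second = second, first
    previous = list(range(len(second) + 1))
    for i, ch_a in enumerate(first, start=1):
        current = [i]
        for j, ch_b in enumerate(second, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(first: str, second: str) -> float:
    """Hypothetical normalized measure: 1.0 means identical strings."""
    longest = max(len(first), len(second))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(first, second) / longest
```

For example, `levenshtein("kitten", "sitting")` evaluates to 3, because two substitutions and one insertion are needed.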

The extension provides the following Operators:

  • Blending

    • Table

      • Append (Superset)

      • Collect and Persist

      • Fuzzy Matching

      • Generate Aggregation (Advanced)

      • Group Into Collection

      • Merge Attributes

      • Sample Collection

      • Sort (Multiple)

    • Attribute Generation

      • Calculate Overlaps

      • Generate Levenshtein Distance

      • Generate Phonetic Encoding

      • Generate Session ID 

      • Generate Power Transform

    • Build Simulation

    • Extract Statistics
    • Filter Attributes with Missing Values

    • Filter Examples with Missing Values

    • Generate Partial Dependency Plot Data

    • Get Holidays

    • Rename by Multiple Examples

    • Replace Rare Values

    • SMOTE Upsampling

    • Weight of Evidence

  • Data Access:

    • Read Excel Sheet Names

    • Read Office File

    • Read SFTP

    • Read Yahoo Finance

    • Write SFTP

    • Un-gzip

  • Data Export

    • Store (Tagged)

  • Feature Selection

    • Select by Weights (Multi)

  • Macros:

    • Extract Macro (Format)

    • Extract Last Modifying Operator

    • Set Macros from ExampleSet

    • Set Macro (Real)

  • Models:

    • Check Model Conformance

    • Get Decision Tree Path

    • GLM Contribution

    • Local Interpretation (LIME)

    • Optimize Threshold

    • Optimize Threshold (Subprocess)

    • Random Forest Encoder

    • Subset Modeling (new in 2.16.0)

  • Outliers:

    • Detect Outliers (Univariate)

    • Isolation Forest

    • Tukey Test

  • Parameters:

    • Get Parameters

    • Set Parameters from ExampleSet

  • Performance

    • Performance (AUPRC)

  • Text Processing:

    • Apply Model (Documents) 

    • Extract Sentiment

    • Dictionary-Based Sentiment

    • Extract Topics from Data (LDA)

    • Extract Topics from Documents (LDA)

    • Filter Tokens Using ExampleSet

    • Split Document into Collection

    • Stem Tokens Using ExampleSet

  • Utility

    • Base64 to Image (new in 2.14.0)

    • Execute Remote Program

    • Image to Base64 (new in 2.14.0)

    • List Repository Objects (enhanced in 2.14.0)

    • Log Operator Runtimes (new in 2.14.0)

    • Loop (While) (new in 2.16.0)

    • Scan Processes

    • Set Meta Data

    • Subprocess (Caching)

    • Subprocess (Parallel)

    • Try (Multiple)

Version 2.17.0 (2023-10-26)

  • Added an option to SMOTE to return only the artificial examples instead of appending them to the original data set.
  • Changed the role of the ID attribute introduced in 2.16 in Generate Session ID from id to metadata, because the id role would drop other IDs. Added compatibility levels so that old processes keep the old behaviour.
  • Added an error message for Generate Session ID if an ID was already present.
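
To illustrate the new SMOTE option, here is a simplified Python sketch of the general SMOTE idea: each artificial example is placed on the line between a minority example and one of its nearest minority neighbours. This is not the operator's implementation. The function name `smote_sketch`, the flag `append_original`, and the defaults are assumptions made for illustration only.

```python
import random


def smote_sketch(minority, n_synthetic, k=2, append_original=True, seed=0):
    """Create synthetic minority points by interpolating towards near neighbours.

    `append_original` is a hypothetical flag name, not the operator's parameter.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # Rank all other minority points by squared Euclidean distance to base.
        others = [p for i, p in enumerate(minority) if i != base_index]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    if append_original:
        return list(minority) + synthetic
    return synthetic


minority_class = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
only_artificial = smote_sketch(minority_class, 5, append_original=False)
combined = smote_sketch(minority_class, 5, append_original=True)
```

With `append_original=False` the result contains only the five artificial points. With `append_original=True` the three original points come first, followed by the artificial ones.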

Version 2.16.0 (2023-08-23)

  • Added a new operator, Subset Modeling, which allows you to build one meta model on subsets of the same data set.
  • Added a new operator, Loop (While), which provides a while loop with an expression-based break criterion.

Version 2.14.0 (2022-08-25)

  • New operators: Base64 to Image and Image to Base64

    • Transform image file objects into Base64-encoded strings and back

  • New operator: Log Operator Runtimes

    • Logs the runtimes of all operator executions within a process

  • Enhancements:

    • List Repository Objects now returns the size of the object and the access
      rights for legacy server repositories

    • Updated the version of the Yahoo Finance library to fix a bug that only appears for US IPs
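
The Base64 to Image and Image to Base64 operators in this release convert between binary image data and Base64 text. Outside RapidMiner, the same round trip can be sketched in Python with the standard library. The function names below are hypothetical, and the eight-byte PNG signature stands in for a real image file.

```python
import base64


def image_bytes_to_base64(data: bytes) -> str:
    """Encode raw image bytes as an ASCII Base64 string."""
    return base64.b64encode(data).decode("ascii")


def base64_to_image_bytes(text: str) -> bytes:
    """Decode a Base64 string back into the original bytes."""
    return base64.b64decode(text)


# The fixed 8-byte signature that starts every PNG file, used as sample data.
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
encoded = image_bytes_to_base64(png_signature)
decoded = base64_to_image_bytes(encoded)
```

The decoded bytes are identical to the input, which is the property that makes Base64 suitable for carrying images inside text columns.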

Version 2.13.0 (2022-02-23)

  • New operator: Generate Power Transform
    • Makes it easy to compute Box-Cox and Yeo-Johnson transformations.
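
For reference, the two transformations named above can be written down in a few lines. The Python sketch below follows the standard textbook definitions for a single value and a given lambda. It does not show how the operator chooses lambda or handles whole attributes, and the function names are hypothetical.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform; defined only for strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson transform; also defined for zero and negative x."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
```

With lambda equal to 1, Yeo-Johnson leaves every value unchanged, which is a convenient sanity check for any implementation.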

Version 2.12.0 (2021-10-06)

  • New Operator: Subprocess (Parallel)
    • Allows the explicit parallelization of subprocesses (parallel execution capabilities are still limited by the license).
  • Improvements:
    • Execute Remote Program now also provides the error output, fails by default if the error code is not 0 and can log the results in the process log file.
  • Bugfixes:
    • Fixed an error in Extract Topics from Data/Documents (LDA) that occurred when using optimize.
    • Fixed a bug where Detect Outliers (Univariate) did not provide proper meta data.
    • Bugfix for the Merge Attributes operator: reduced its general memory allocation after execution.
      • This bug could cause memory leaks for unfinished process instances (and possibly for large parallel executions of the operator).

Version 2.11.0 (2021-06-22)

  • New Operator: Execute Remote Program
    • This operator executes commands via SSH on another computer.

Version 2.10.0 (2021-04-30)

  • New Operator: Isolation Forest

    • Tree Learner for anomaly detection.

  • New Operator: Select by Weights (Multi)

    • This operator combines the Weights by ... operators and the Select by Weights operators into a single more convenient operator.

    • The new operator also supports example weights for the computation of the attribute weights.

  • Enhancements

    • Tukey Test operator now has an option to ignore missing values.

    • The color and icons of all outlier operators were changed to reflect the modelling aspect of these types of operators

    • The scoring of LDA Models (trained by Extract Topics from Data (LDA) and Extract Topics from Documents (LDA)) was adapted, so that there aren't any differences between scoring in the training or the application of the models.

      • Results of LDA can differ insignificantly from previous versions

    • Updated the dependency to the RM Text extension from 7.4.1 to 9.3.1

    • Internal refactoring of the code base
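
The Tukey Test enhancement in this release concerns missing values. The Python sketch below shows the usual Tukey fences (values outside Q1 - k·IQR and Q3 + k·IQR are flagged) with an option to skip missing entries. The quantile method, the factor k = 1.5, and the names `tukey_outliers` and `ignore_missing` are assumptions for illustration and may differ from what the operator does.

```python
import math
import statistics


def is_missing(value) -> bool:
    """Treat None and NaN as missing."""
    return value is None or math.isnan(value)


def tukey_outliers(values, k=1.5, ignore_missing=True):
    """Return one boolean per input value: True if it lies outside the fences."""
    present = [v for v in values if not is_missing(v)]
    if not ignore_missing and len(present) != len(values):
        raise ValueError("missing values present")
    # Quartiles of the non-missing values (inclusive method is an assumption).
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Missing values are never flagged as outliers.
    return [(not is_missing(v)) and (v < lower or v > upper) for v in values]


flags = tukey_outliers([1, 2, 3, 4, 5, None, 100])
```

In this example only the value 100 is flagged, and the missing entry is passed through unflagged.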

Version 2.9.0 (2021-01-22)

  • New Operator: Read Yahoo Finance

    • Pulls stock data for a given time frame from Yahoo Finance.

  • Enhancements Extract Sentiment:
    • Added two new dictionaries: Vader (french) and Vader (german).

    • Changed the default tokenization regex from \W to [\s-!"#$%&'()*+,./:;<=>?@[\]_`{|}~]. Now words like 'großartig' are handled correctly.

  • Enhancements LDA:

    • Added option to use stop-words elimination in several languages. By default English stop-word removal is used.

    • Added a 'show optimization settings' setting to LDA to make it easier to use.

  • Bugfix Read SFTP:

    • Fixed a bug that caused Read SFTP to ignore proxies.
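
The effect of the tokenization change in Extract Sentiment can be reproduced with a small Python sketch. It rests on two assumptions: that the old `\W` pattern treated only ASCII letters, digits and underscore as word characters (emulated here with `re.ASCII`), and that the new pattern can be carried over to Python by escaping the hyphen and the opening bracket. Regex behaviour inside the extension itself may differ in detail.

```python
import re

# Old default: split on anything that is not an ASCII word character.
OLD_SPLIT = re.compile(r"\W", re.ASCII)
# New default, adapted for Python: explicit whitespace and punctuation only.
NEW_SPLIT = re.compile(r"""[\s\-!"#$%&'()*+,./:;<=>?@\[\]_`{|}~]""")


def tokenize(text: str, pattern: re.Pattern) -> list:
    """Split text on the pattern and drop empty tokens."""
    return [token for token in pattern.split(text) if token]


sample = "Das Essen war großartig!"
old_tokens = tokenize(sample, OLD_SPLIT)
new_tokens = tokenize(sample, NEW_SPLIT)
```

Under these assumptions the old pattern breaks 'großartig' into 'gro' and 'artig' at the 'ß', while the new pattern keeps the word intact so it can be found in a sentiment dictionary.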

Version 2.8.1 (2020-12-01)

  • New Operator: Store (Tagged)
    • Tags each storage operation with the current date, process, and Git information (commit hash, author, commit message).
  • New Operator: Detect Outliers (Univariate)
    • Detects univariate outliers in a data set.
  • New Operator: Scan Processes
    • Iterates over a repository and returns a list of the operators used in each process.
  • New Operator: List Repository Objects
    • Lists all objects (Tables, Models, Processes, ...) in a folder structure in a repository.
  • Improved Read and Write SFTP:
    • Added private key option for the SFTP connection objects
  • Bugfix for Set Meta Data
    • Fixed exception if metadata does not exist

Version 2.7.0 (2020-09-15)

  • New Operator: Sort (Multiple)
    • Sorts ExampleSets by more than one attribute.
  • New Operator: Try (Multiple)
    • Similar to Handle Exception but lets you try multiple variants.
  • New Operator: Fuzzy Matching
    • Similar to Cross Distance but works with Levenshtein Distances.
  • New Operator: Set Meta Data
    • Allows you to change the meta data of an example set manually.
  • New Operator: Generate Aggregation (Advanced)
    • Similar to Generate Aggregation, but has additional functions.
  • Renamed Read Word Files to Read Office Files
    • Read Office Files can read ppt and pptx as well as doc and docx.
  • Bugfixes:
    • Fixed a bug in the Get Decision Tree Path operator that created seemingly wrong and random splits for nominal attributes.
    • Fixed a bug causing an NPE in Extract Topics (Data) if a non-existing text attribute was specified. It now throws an error.
    • Fixed a bug where a trailing white space in a label class caused misleading error messages in SMOTE.

Version 2.6.0 (2020-06-10)

  • New function in the expression parser:
    • fuzzy_match: This function allows you to compare two nominal values using various Levenshtein-based measures
  • Bugfix Build Simulation operator:
    • Fixed a bug that prevented the simulation model from being retrieved after it was stored in the repository

Version 2.5.0 (2020-06-03)

  • Replaced operators SFTP Download File and SFTP Upload File with Read SFTP and Write SFTP

    • Old processes can still use the old implementation.

    • Read SFTP and Write SFTP support the new connection management framework and proxies and use file objects (purple colored IOObjects) for easier usage.

  • New operator: Apply Association Rules (detailed)

    • Applies association rules and gives more detailed results than the original operator, including all applicable rules for any given example and their respective measures.

  • Enhancement Build Simulation operator

    • Build Simulation can now force some of the generated attributes to be constant, either by specifying a parameter or by providing an ExampleSet

 


Product Details

Version 2.17.0
File size 126 MB
Downloads 238279 (190 Today)
Vendor RapidMiner Labs
Category Operators
Released 10/26/23
Last Update 10/26/23 1:43 PM
License AGPL