Operator Toolbox

This extension bundles a collection of useful additional operators.

Generate Levenshtein Distance

In text analytics you often face the problem of misspelled words. One of the most common ways to find misspelled words is to compute a distance between two words. The most frequently used distance measure is the Levenshtein distance, defined as the minimum number of single-character edits (insertions, deletions or substitutions) needed to transform one string into another. This can be used to generate a replacement dictionary.
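As an illustration of the idea, here is a minimal Python sketch of the Levenshtein distance and of how it could feed a replacement dictionary. It is not the operator's implementation; the helper build_replacements, its max_distance parameter and the sample words are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def build_replacements(words, dictionary, max_distance=2):
    """Hypothetical helper: map each unknown word to its closest dictionary word."""
    replacements = {}
    for word in words:
        if word in dictionary:
            continue
        best = min(dictionary, key=lambda entry: levenshtein(word, entry))
        if levenshtein(word, best) <= max_distance:
            replacements[word] = best
    return replacements


print(levenshtein("kitten", "sitting"))
print(build_replacements(["recieve", "apple"], ["receive", "apple"]))
```

The max_distance cut-off keeps unrelated words from being mapped onto each other; a suitable value depends on the vocabulary.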

Generate Phonetic Encoding

During text processing you might encounter words that are spelled differently but pronounced the same way. Often you want to map these words to the same string. Good examples are names like Jennie, Jenny and Jenni. Algorithms that perform this kind of encoding are called phonetic encoders. The current version of the operator supports a broad range of algorithms, namely: BeiderMorse, Caverphone2, Cologne Phonetic, Double Metaphone, Metaphone, NYSIIS, Refined Soundex, Soundex.
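To make the idea concrete, the following Python sketch implements classic American Soundex, one of the listed algorithms. It is a plain illustration written for this page, not the code the operator runs, and it only handles the letters A to Z.

```python
# Digit codes of American Soundex; vowels, Y, H and W have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code, or "" if the word has no letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    last_code = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(letter, "")
        if code and code != last_code:
            encoded += code
        last_code = code  # a vowel resets the code, so repeats are kept after it
    return (encoded + "000")[:4]  # pad with zeros, truncate to four characters


for name in ("Jennie", "Jenny", "Jenni"):
    print(name, soundex(name))
```

All three spellings receive the same code, so they can be mapped to one string.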

Tukey Test

"When is a value an outlier?" is one of the most frequently asked questions in anomaly detection. Whether you do univariate outlier detection on single attributes or use RapidMiner's Anomaly Detection extension to generate a multivariate score, you still need to define a threshold. A common technique for this is the Tukey Test (or criterion). It results in an outlier flag as well as a confidence for each example. It can also be applied to several attributes at a time.
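The Tukey criterion is commonly stated as: a value is an outlier if it lies more than k times the interquartile range below the first quartile or above the third quartile, with k = 1.5 as the usual choice. The Python sketch below shows only the flag for a single attribute. How the operator computes its confidence, and which quartile method it uses, is not described here, so neither is reproduced; the function name and sample values are assumptions.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # The "inclusive" quartile method is an assumption; other methods
    # give slightly different fences on small samples.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [value < lower or value > upper for value in values]


measurements = [10, 12, 11, 13, 12, 11, 10, 95]
print(tukey_outliers(measurements))
```

To cover several attributes, the same function can be called once per column.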

Group Into Collection

This operator enables you to split an ExampleSet into several ExampleSets using a Group By. The result is a collection of ExampleSets. This can be used in combination with a Loop Collection to apply arbitrary functions per group. A possible example would be to find the last 3 transactions for each customer in transactional data.
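The group-then-loop pattern can be sketched outside RapidMiner in a few lines of Python. The row layout, the column names customer and day, and both function names are assumptions made for this illustration.

```python
from collections import defaultdict


def group_into_collection(rows, key):
    """Split a list of rows into one list per distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return list(groups.values())


def last_n_per_group(rows, key, order_by, n=3):
    """Loop over the collection and keep the last n rows of every group."""
    result = []
    for group in group_into_collection(rows, key):
        result.extend(sorted(group, key=lambda row: row[order_by])[-n:])
    return result


transactions = [
    {"customer": "A", "day": 1}, {"customer": "A", "day": 2},
    {"customer": "A", "day": 3}, {"customer": "A", "day": 4},
    {"customer": "B", "day": 2}, {"customer": "B", "day": 5},
]
for row in last_n_per_group(transactions, key="customer", order_by="day"):
    print(row)
```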

Get Last Modifying Operator

If you dive a bit deeper into modeling you might want to try different feature selection techniques and treat the choice as a parameter of your modeling process. This can be achieved using a Select Subprocess in an Optimize Parameters Operator. In order to figure out which feature selection technique has won you would need to add at least one additional operator per method. To overcome this it is possible to extract the last modifying Operator for every object. This way you can more easily annotate which feature selection technique was the best.


Generate Date Series

Given a start date and an end date, this operator generates all date times within that range at regular intervals specified by the user. The interval can be year, month, day, hour, minute, second or millisecond.
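A minimal Python sketch of such a series follows. It assumes that both endpoints are included and that the interval has a fixed length (days, hours, minutes, seconds, milliseconds); month and year steps vary in length and would need calendar arithmetic that is left out here. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def generate_date_series(start, end, step):
    """Return all date times from start to end (inclusive) spaced by step."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    series = []
    current = start
    while current <= end:
        series.append(current)
        current += step
    return series


hourly = generate_date_series(
    datetime(2017, 8, 21, 0, 0),
    datetime(2017, 8, 21, 3, 0),
    timedelta(hours=1),
)
for moment in hourly:
    print(moment.isoformat())
```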

Get Decision Tree Path

This operator works very similarly to the Apply Model Operator for a Decision Tree. Instead of returning the prediction and confidences, it returns the path to the leaf.
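The difference between predicting and returning the path can be shown with a toy tree in Python. The nested-dictionary tree layout, the attribute names and the function tree_path are all assumptions for this sketch and do not reflect RapidMiner's model format or the operator's output format.

```python
def tree_path(node, example):
    """Walk a binary decision tree and return (list of split conditions, leaf label)."""
    steps = []
    while "label" not in node:  # inner nodes carry a split, leaves carry a label
        attribute, threshold = node["attribute"], node["threshold"]
        if example[attribute] <= threshold:
            steps.append(f"{attribute} <= {threshold}")
            node = node["left"]
        else:
            steps.append(f"{attribute} > {threshold}")
            node = node["right"]
    return steps, node["label"]


# Hypothetical tree: split on age first, then on income.
tree = {
    "attribute": "age", "threshold": 30,
    "left": {"label": "no"},
    "right": {
        "attribute": "income", "threshold": 50000,
        "left": {"label": "no"},
        "right": {"label": "yes"},
    },
}
path, label = tree_path(tree, {"age": 42, "income": 60000})
print(" & ".join(path), "->", label)
```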

Get Parameters

This Operator extracts all parameters, including expert parameters, of another Operator and converts them to a ParameterSet.

Product Details

Version 0.5.0
File size 154 kB
Downloads 4762 (8 Today)
Vendor RapidMiner Labs
Category Operators
Released 8/21/17
Last Update 8/21/17 9:52 AM
License AGPL
Product web site
Rating 0.0 stars(0)