Category: Machine Learning

New learning algorithms not included in the RapidMiner core.

Anomaly Detection
The Anomaly Detection Extension comprises the best-known unsupervised anomaly detection algorithms, assigning an individual anomaly score to each data row of an example set.
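To show what "assigning individual anomaly scores to data rows" means, here is a minimal sketch of one well-known unsupervised scoring scheme, a k-nearest-neighbour global score (mean distance to the k nearest other rows). The function name and the choice of Euclidean distance are assumptions for illustration, not the extension's operator or API.

```python
import math


def knn_anomaly_scores(rows, k=2):
    """Score each row by its mean Euclidean distance to its k nearest neighbours.

    Higher scores mean the row lies farther from the rest of the data.
    (Hypothetical helper; not the extension's actual operator.)
    """
    if not 1 <= k < len(rows):
        raise ValueError("k must be between 1 and len(rows) - 1")
    scores = []
    for i, row in enumerate(rows):
        # Distances from this row to every other row (the row itself is excluded).
        dists = sorted(math.dist(row, other) for j, other in enumerate(rows) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.2), (8.0, 8.0)]
scores = knn_anomaly_scores(data, k=2)
print(max(range(len(data)), key=scores.__getitem__))  # index of the most anomalous row → 4
```

Every row gets a score rather than a yes/no label, so a threshold (or a top-n cut) can be chosen afterwards.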

Deep Learning
This extension provides Deep Learning capabilities for execution on CPU and GPU.

Edda - Extensions for Binominal Text Classification (Topic Models and Regex)
Evidence in Documents, Discovery, and Analysis (EDDA). Work made possible by US National Library of Medicine, National Institutes of Health grant no. R00LM010943. PI Tanja Bekhuis. Developers Kevin Mitchell and Eugene Tseytlin. All software made available by the EDDA team under the GNU Affero General Public License.

Feature Selection Extension
This RapidMiner plugin consists of operators for feature selection and classification, mainly on high-dimensional (e.g. microarray) data, plus some helper classes and operators.

This extension provides two new operators, Forecast (Univariate) and Forecast (Multivariate), which allow simple but powerful forecasting of time series.

Generative AI
This extension offers two operators to access OpenAI's APIs for generating text and images.

Generative Models
The Generative Models extension (aka Generative AI) offers access to large language models (LLMs) from Hugging Face and OpenAI, as well as fine-tuning of those models. It also offers embedding operators and vector stores and therefore supports Retrieval-Augmented Generation (RAG).

Holt-Winters Filtering
This is a time series forecasting Operator. It applies Holt-Winters filtering to a given time series. Unknown parameters are determined by minimizing the squared prediction error.
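The filtering described above can be sketched as additive Holt-Winters smoothing whose smoothing parameters are chosen by minimizing the sum of squared one-step-ahead prediction errors. This is a minimal sketch, not the operator's implementation: the additive seasonal form, the initialization from the first two seasons, and the coarse 0.1-step grid search (standing in for whatever optimizer the operator actually uses) are all assumptions, and the function names are hypothetical.

```python
from itertools import product


def holt_winters_sse(y, m, alpha, beta, gamma):
    """Run additive Holt-Winters over y with season length m.

    Returns (sse, level, trend, seasonals): the sum of squared one-step-ahead
    prediction errors plus the final smoothing state.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    trend = (second - first) / m
    # Level as of time m-1; seasonals are deviations from that line in season one.
    level = first + trend * (m - 1) / 2
    seasonals = [y[i] - (first + trend * (i - (m - 1) / 2)) for i in range(m)]
    sse = 0.0
    for t in range(m, len(y)):
        s_old = seasonals[t % m]          # s_{t-m}
        forecast = level + trend + s_old  # one-step-ahead prediction
        sse += (y[t] - forecast) ** 2
        prev_level = level
        level = alpha * (y[t] - s_old) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_old
    return sse, level, trend, seasonals


def fit_holt_winters(y, m, grid=None):
    """Pick (alpha, beta, gamma) from a coarse grid by minimizing the SSE."""
    grid = grid or [i / 10 for i in range(1, 10)]
    return min(product(grid, repeat=3), key=lambda p: holt_winters_sse(y, m, *p)[0])


def forecast(y, m, params, h):
    """h-step-ahead forecast from the final smoothing state."""
    _, level, trend, seasonals = holt_winters_sse(y, m, *params)
    return level + h * trend + seasonals[(len(y) - 1 + h) % m]


pattern = [2, -1, 0, -1]
y = [10 + 0.5 * t + pattern[t % 4] for t in range(16)]
params = fit_holt_winters(y, 4)
print(round(forecast(y, 4, params, 1), 6))  # → 20.0
```

On this noise-free trend-plus-season series every parameter triple fits exactly; on real data the grid search is what makes the squared error decide the parameters.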

Information Selection
This extension includes a set of operators for selecting information from the training set for classification and regression problems. These include operators for instance selection (example set selection), instance construction (creation of new examples that represent a set of other instances), clustering, LVQ neural networks, dimensionality reduction, and others. These operators can be used for outlier elimination and training set compression.
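As an illustration of instance selection used for outlier elimination, here is a minimal sketch of Wilson's Edited Nearest Neighbour (ENN), a classic technique of this kind. Whether the extension offers ENN under that name is not stated above; the function name, the default k, and the Euclidean distance are assumptions.

```python
import math
from collections import Counter


def edited_nearest_neighbour(X, y, k=3):
    """Wilson's ENN: keep only examples whose label agrees with the majority
    label of their k nearest neighbours (the example itself excluded).

    Returns the indices of the kept examples. (Hypothetical helper.)
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        neighbours = sorted(
            (math.dist(xi, xj), yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i
        )[:k]
        majority, _ = Counter(label for _, label in neighbours).most_common(1)[0]
        if majority == yi:
            keep.append(i)
    return keep


X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]  # last example is mislabeled noise
print(edited_nearest_neighbour(X, y, k=3))  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

The mislabeled point sitting inside the "a" cluster is dropped, which both removes an outlier and slightly compresses the training set.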

This extension gives you additional operators from the space of interpretation and explainable AI. At the moment it covers LIME, SHAP and Shapley values. Note that this is an alpha version.

Keras Extension
The Keras extension lets you use Keras, a high-level Python library for deep learning that uses TensorFlow, Microsoft Cognitive Toolkit (CNTK), or Theano as its computation backend.

LifeStyle Marketing
LifeStyle Marketing forecasts financial outcomes relative to a control group or average, based on raw transactions and questionnaire/impact data. It auto-generates behavioral and demographic characteristics for all keywords and values, including RFM, and builds the most financially profitable forecasting models (at a given statistical confidence). Analyzing millions of observations on a regular PC takes minutes to hours. The free version runs on Windows and is limited to 100K observations.

This is a project implementing extensions based on MDL (Minimum Description Length). The MDL principle can be applied to obtain a shorter description of a dataset by exploiting its regularities through a suitable compression: the best description is the one that compresses the dataset best. The MDL extension currently includes an operator implementing the KRIMP algorithm, which can be used to prune a set of frequent patterns.

MonkeyLearn is an AI platform that allows companies to easily analyze text with Machine Learning. Customers like Clearbit, Segment and Drift are using MonkeyLearn to turn emails, support tickets, customer feedback, and documents into actionable data.

Prescriptive Analytics
This extension offers an operator for prescriptive optimization: it varies the attribute values of an example to optimize a custom fitness function, which may be derived from a model. Currently supported optimizers:
- Grid
- Evolutionary
- BYOBA
Note: This is a BETA version!
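The idea behind the Grid optimizer can be sketched as follows: enumerate a grid of values for the attributes you are allowed to change, score each modified example with the fitness function, and keep the best. This is a minimal sketch, not the extension's operator; the function name, the evenly spaced grid, and the toy "price times demand" fitness function (standing in for a trained model) are hypothetical.

```python
from itertools import product


def grid_prescribe(example, ranges, fitness, steps=5):
    """Vary selected attributes of `example` over a grid and keep the best.

    example: dict attribute -> value (the starting example)
    ranges:  dict attribute -> (low, high) for the attributes that may change
    fitness: callable taking a dict and returning a score to maximize
    """
    names = list(ranges)
    axes = [[lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
            for lo, hi in ranges.values()]
    best, best_score = dict(example), fitness(example)
    for values in product(*axes):
        # Unchanged attributes are copied from the original example.
        candidate = {**example, **dict(zip(names, values))}
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def profit(ex):
    # Hypothetical stand-in for a model: demand falls linearly with price.
    return ex["price"] * (100 - 2 * ex["price"])


best, score = grid_prescribe({"price": 10.0, "region": "north"}, {"price": (0, 50)}, profit, steps=11)
print(best["price"], score)  # → 25.0 1250.0
```

A grid is exhaustive but grows exponentially with the number of varied attributes, which is presumably why evolutionary alternatives are offered as well.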

Sales Forecasting Model
The Sales Forecasting model developed by Cappius uses a user-defined window to predict future values of a time series with linear regression; neural networks or SVMs can be used as alternative models. Model performance is evaluated through residual analysis.
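A window-based regression forecast of this kind can be sketched as: turn the series into rows of the previous `window` values (features) and the next value (target), fit ordinary least squares, then inspect the residuals. This is a minimal sketch assuming lagged values as the only features and an intercept term; it is not Cappius's implementation, and all function names are hypothetical.

```python
def make_windows(series, window):
    """Previous `window` values become features for predicting the next value."""
    X = [series[t - window:t] for t in range(window, len(series))]
    y = [series[t] for t in range(window, len(series))]
    return X, y


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_linear(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(coef, window_values):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], window_values))


# Toy series following y_t = 1 - 0.2 * y_{t-2} + 0.6 * y_{t-1}.
series = [5.0, 2.0]
for _ in range(10):
    series.append(1 + 0.6 * series[-1] - 0.2 * series[-2])

X, y = make_windows(series, window=2)
coef = fit_linear(X, y)  # [intercept, weight on y_{t-2}, weight on y_{t-1}]
residuals = [t - predict(coef, x) for x, t in zip(X, y)]  # input to residual analysis
next_value = predict(coef, series[-2:])
```

Because the toy series follows an exact linear recurrence, the recovered coefficients are close to [1, -0.2, 0.6] and the residuals are close to zero; on real sales data the residuals show what the linear model misses.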

This extension wraps functionality from the Smile library and provides it as operators.

WhiBo is a framework for the design and evaluation of "white-box" component-based decision tree algorithms and their parts. It is intended for use by data mining practitioners, researchers and algorithm developers, but also for teaching decision tree algorithms. The official web page of the extension is

Word2Vec is a popular algorithm based on "Efficient Estimation of Word Representations in Vector Space" by Mikolov et al. (2013). Trained on a single corpus, the algorithm generates one multidimensional vector for each word. These vectors are known to capture semantic meaning. A commonly used distance measure between them is cosine similarity. This implementation is based on the word2vec port available at:
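Cosine similarity, the measure mentioned above, compares the direction of two word vectors and ignores their length. A minimal sketch follows; the three tiny vectors are made-up numbers for illustration only (real word2vec vectors have many more dimensions and come from training), and the helper names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Made-up toy "embeddings", not output of an actual word2vec model.
vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector has the highest cosine similarity."""
    return max((w for w in vectors if w != word),
               key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(most_similar("king", vectors))  # → queen
```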

XGBoost Extension
This extension embeds the XGBoost eXtreme Gradient Boosting library for use in RapidMiner. It implements a single operator named XGBoost that is compatible with RapidMiner's built-in learners.