Category: Machine Learning

New learning algorithms not included in the RapidMiner core.

Anomaly Detection
The Anomaly Detection Extension comprises the best-known unsupervised anomaly detection algorithms, which assign individual anomaly scores to the data rows of example sets.
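To illustrate what a per-row anomaly score can look like, here is a minimal Python sketch of one common unsupervised approach: scoring each row by its mean distance to its k nearest neighbors. It is not taken from the extension; the function name `knn_anomaly_scores`, the parameter `k`, and the sample rows are assumptions made for illustration only.

```python
import math


def knn_anomaly_scores(rows, k=2):
    """Score each row by its mean Euclidean distance to its k nearest neighbors.

    Hypothetical helper for illustration; higher scores mean more isolated rows.
    """
    scores = []
    for i, row in enumerate(rows):
        # Distances from this row to every other row, smallest first.
        dists = sorted(
            math.dist(row, other) for j, other in enumerate(rows) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Four rows close together and one row far away (made-up data).
rows = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.0, 1.1), (8.0, 8.0)]
scores = knn_anomaly_scores(rows, k=2)
most_anomalous = scores.index(max(scores))
```

In this toy data the isolated row receives the largest score, which is the behavior one would expect from any distance-based scoring scheme.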

Edda - Extensions for Binominal Text Classification (Topic Models and Regex)
Evidence in Documents, Discovery, and Analysis (EDDA). Work made possible by US National Library of Medicine, National Institutes of Health grant no. R00LM010943. PI Tanja Bekhuis. Developers Kevin Mitchell and Eugene Tseytlin. All software made available by the EDDA team under the GNU Affero General Public License.

Feature Selection Extension
This RapidMiner plugin consists of operators for feature selection and classification, mainly on high-dimensional (microarray) data, together with some helper classes and operators.

Information Selection
This extension includes a set of operators for information selection from the training set for classification and regression problems. These are operators for instance selection (example set selection), instance construction (creation of new examples that represent a set of other instances), clustering, LVQ neural networks, dimensionality reduction, and others. These operators can be used for outlier elimination and training set compression.

LifeStyle Marketing
LifeStyle Marketing forecasts financial outcomes relative to a control group or average, based on raw transactions and questionnaire/impact data. It auto-generates behavioral and demographic characteristics for all keywords and values, including RFM, and builds the most financially profitable forecasting models at a given statistical confidence. Analysis of millions of observations on a regular PC takes minutes to hours. The free version runs on Windows and is limited to 100K observations.

This is a project for the implementation of MDL (Minimum Description Length) based extensions. The MDL principle can be applied to obtain a shorter description of a dataset by exploiting its regularities with a suitable compression scheme. The best description is therefore the one that compresses the dataset best. The MDL extension currently includes an operator that implements the KRIMP algorithm, which can be used to prune a set of frequent patterns.
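The idea of "the best description is the one that compresses best" can be sketched in Python as follows. This is a simplified, KRIMP-style code-table encoding written for illustration, not the extension's operator: the function names `cover` and `total_size`, the ordering of the code table (larger itemsets first), and the sample transactions are all assumptions. The real KRIMP algorithm uses specific candidate and cover orders plus pruning that are omitted here.

```python
import math
from collections import Counter


def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping itemsets from the table."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


def total_size(transactions, patterns):
    """Two-part description length in bits: code table cost plus encoded data cost."""
    items = Counter(item for t in transactions for item in t)
    n_items = sum(items.values())
    # Baseline code length of each single item, from its relative frequency.
    st_len = {item: -math.log2(c / n_items) for item, c in items.items()}

    # Code table: candidate patterns plus all singletons, larger itemsets first.
    # The singletons guarantee that every transaction can be covered completely.
    table = sorted(
        {frozenset(p) for p in patterns} | {frozenset([i]) for i in items},
        key=lambda s: (-len(s), sorted(s)),
    )
    usage = Counter(code for t in transactions for code in cover(t, table))
    total = sum(usage.values())
    code_len = {code: -math.log2(u / total) for code, u in usage.items()}

    data_bits = sum(u * code_len[code] for code, u in usage.items())
    # Each used entry costs its own code plus its items spelled out in baseline codes.
    model_bits = sum(code_len[code] + sum(st_len[i] for i in code) for code in usage)
    return data_bits + model_bits


# Made-up transactions in which "a" and "b" always occur together.
transactions = [{"a", "b", "c"}] * 4 + [{"a", "b"}] * 2 + [{"c"}] * 2
baseline_bits = total_size(transactions, [])
with_pattern_bits = total_size(transactions, [{"a", "b"}])
```

Because the items "a" and "b" always co-occur in this toy data, encoding them with a single pattern yields a shorter total description than encoding every item separately, so an MDL-based selection would keep that pattern.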

Sales Forecasting Model
The Sales Forecasting model developed by Cappius uses a user-defined window to predict future values of a time series using linear regression. Neural networks or SVMs could also be used as the model. Model performance is evaluated by residual analysis.
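A window-based linear regression forecast with residuals can be sketched in Python as below. This is one possible reading of the description, not the Cappius implementation: it assumes the window holds the most recent observations, fits an ordinary least-squares line against the position within the window, and extrapolates one step ahead. The names `fit_line`, `forecast_next`, and `residuals`, and the sample series, are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_next(series, window):
    """Predict the next value from a line fitted to the last `window` points."""
    recent = series[-window:]
    xs = list(range(len(recent)))
    slope, intercept = fit_line(xs, recent)
    # Extrapolate one step past the end of the window.
    return slope * len(recent) + intercept


def residuals(series, window):
    """One-step-ahead forecast errors (actual minus predicted)."""
    return [
        series[t] - forecast_next(series[:t], window)
        for t in range(window, len(series))
    ]


# Made-up, perfectly linear sales figures.
sales = [10, 12, 14, 16, 18, 20]
next_value = forecast_next(sales, window=4)
errors = residuals(sales, window=4)
```

The window must contain at least two points for the line fit to be defined. On a perfectly linear series the residuals are zero; on real data, patterns in the residuals (trend, seasonality) suggest the window or model is missing structure.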

WhiBo is a framework for the design and evaluation of "white-box" component-based decision tree algorithms and their parts. It is intended for data mining practitioners, researchers, and algorithm developers, and also for teaching decision tree algorithms. The official web page of the extension is