This extension wraps functionality from the Smile library ( and provides it as operators.

Smile is a fast and comprehensive machine learning engine. It focuses on speed, ease of use, comprehensiveness, natural language processing, and mathematics and statistics.

Currently, the extension provides the following operators:

  • Anomaly:
    • Gaussian Mixture
  • Blending:
    • t-SNE
  • Cleansing:
    • Probabilistic Principal Component Analysis (PPCA) 
  • Clustering:
    • G-Means
  • Models:
    • Parametric Probability Estimator
  • Learner:
    • Lasso Regression
    • Random Forest (Smile) (now with classification in 0.4.0)
    • Gradient Boosted Tree (Smile) (now with classification in 0.4.0)
  • Statistics:
    • Compare Distributions (enhanced in 0.4.0)

Version 0.6.0 (2021-09-10)

  • GMM throws a proper exception if there are missing values.
  • GMM now normalizes the data before fitting. This reduces numerical issues that may otherwise occur.
  • GMM now offers several ways of calculating its anomaly score. The default is now the negative log-likelihood instead of 1/likelihood.
  • Changed how confidences are calculated in GMM to avoid numerical instability.
  • GMM now reports its BIC (Bayesian Information Criterion) as a performance measure.
  • Fixed a bug where a GMM model could be applied to a data set with a different schema and return missing values. A proper error message is now thrown instead.
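To illustrate why the switch from 1/likelihood to negative log-likelihood helps, the following minimal sketch scores values under a one-dimensional Gaussian mixture with fixed, made-up parameters. It is not the extension's or Smile's code: all function names are hypothetical, the mixture is 1-D for brevity, and the BIC uses the textbook form BIC = p·ln(n) − 2·ln(L) with p = 3K − 1 free parameters. Working in log space via log-sum-exp keeps −log p(x) finite even when p(x) itself underflows to 0, which is exactly where 1/p(x) breaks.

```python
import math
from typing import List, Sequence

def log_sum_exp(terms: Sequence[float]) -> float:
    """Compute log(sum(exp(t))) without overflow or underflow."""
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def component_log_terms(x: float, weights, means, variances) -> List[float]:
    """log(w_k) + log N(x | mu_k, var_k) for every mixture component k."""
    return [
        math.log(w) - 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for w, mu, var in zip(weights, means, variances)
    ]

def log_likelihood(x: float, weights, means, variances) -> float:
    """log p(x) for the whole mixture, computed stably in log space."""
    return log_sum_exp(component_log_terms(x, weights, means, variances))

def anomaly_score(x: float, weights, means, variances) -> float:
    """Negative log-likelihood: the larger the value, the more anomalous x is."""
    return -log_likelihood(x, weights, means, variances)

def bic(data: Sequence[float], weights, means, variances) -> float:
    """BIC = p * ln(n) - 2 * ln(L), with p = 3K - 1 free parameters in 1-D."""
    n = len(data)
    total_log_l = sum(log_likelihood(x, weights, means, variances) for x in data)
    n_params = 3 * len(weights) - 1  # (K-1) weights + K means + K variances
    return n_params * math.log(n) - 2 * total_log_l

# Hypothetical two-component mixture (parameters are illustrative only).
weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [0.25, 0.25]
far_score = anomaly_score(50.0, weights, means, variances)
print(far_score)               # large but finite
print(math.exp(-far_score))    # p(x) underflows to 0.0, so 1/p(x) would fail
print(bic([-1.0, -0.9, 0.9, 1.1], weights, means, variances))
```

For a point far from both components, `math.exp(-far_score)` evaluates to `0.0`, so a 1/likelihood score would divide by zero, while the negative log-likelihood stays a large finite number. Lower BIC values indicate a better trade-off between fit and model size.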


Version 0.5 (2021-08-11)

  • Reworked the GMM operator. It now:
    • provides a cluster model instead of a custom model
    • provides cluster assignments and confidences
    • provides a score, which is 1/p by default, with a setting to change whether the score is inverted
    • provides scores for each component of the mixture
    • includes all information about the model in the model's text output
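The outputs listed for 0.5 can be sketched for a single value and a 1-D mixture as follows. This is a hypothetical illustration, not the operator's implementation: the confidences are assumed to be posterior component probabilities, the per-component scores are assumed to be weighted component densities, and the `invert` flag stands in for the score setting. It works directly in probability space, which is the formulation that can run into the numerical issues addressed in 0.6.0.

```python
import math
from typing import List, NamedTuple

class GmmOutput(NamedTuple):
    cluster: int                   # index of the most probable component
    confidences: List[float]       # posterior probability of each component
    component_scores: List[float]  # weighted density w_k * N(x | mu_k, var_k)
    score: float                   # 1/p(x) if invert, else p(x)

def gaussian_pdf(x: float, mu: float, var: float) -> float:
    """Density of a 1-D normal distribution with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def apply_gmm(x: float, weights, means, variances, invert: bool = True) -> GmmOutput:
    """Score one value against a fitted 1-D mixture (hypothetical helper)."""
    component_scores = [
        w * gaussian_pdf(x, mu, var) for w, mu, var in zip(weights, means, variances)
    ]
    p = sum(component_scores)  # mixture likelihood p(x); can underflow far from all components
    confidences = [c / p for c in component_scores]
    cluster = max(range(len(confidences)), key=lambda k: confidences[k])
    score = 1.0 / p if invert else p
    return GmmOutput(cluster, confidences, component_scores, score)

out = apply_gmm(0.9, [0.5, 0.5], [-1.0, 1.0], [0.25, 0.25])
print(out.cluster, out.confidences, out.score)
```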

Version 0.4.1 (2021-04-08)

  • Fixed a bug where GMM could not handle one-class or unlabeled data even though it supports such data.

Version 0.4.0 (2019-12-18)

  • Random Forest (Smile) and Gradient Boosted Trees (Smile) now support classification.
    • Random Forest Regression (Smile) was renamed to Random Forest (Smile).
  • Compare Distributions:
    • Added Kullback-Leibler (KL) and Jensen-Shannon (JS) divergence as options to compare distributions. Both are computed on a normalized, binned version of the distributions.
    • Binning for Chi-Square, KL, and JS is done on the combined data (i.e. min/max are determined over both data sets together).
    • A proper error message is now thrown if Compare Distributions is used on data with missing values, which are not supported.
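The binning scheme described above can be sketched as follows: both samples are binned over a shared min/max range, each histogram is normalized to sum to 1, and KL and JS divergences are computed on the result. The function names, the number of bins, the natural logarithm, and the treatment of empty bins are assumptions for illustration, not the extension's actual settings.

```python
import math
from typing import List, Sequence, Tuple

def shared_histograms(a: Sequence[float], b: Sequence[float],
                      bins: int = 10) -> Tuple[List[float], List[float]]:
    """Bin both samples over the min/max of their combined values and normalize."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / bins or 1.0  # avoid division by zero if all values are equal

    def normalized_hist(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)  # the maximum lands in the last bin
            counts[idx] += 1
        return [c / len(values) for c in counts]

    return normalized_hist(a), normalized_hist(b)

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(P || Q) in nats; infinite if P has mass where Q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: average KL to the midpoint distribution M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p, q = shared_histograms([0.0, 0.0, 0.0, 1.0], [5.0, 5.0, 6.0, 6.0], bins=2)
print(p, q)                  # [1.0, 0.0] [0.0, 1.0]
print(kl_divergence(p, q))   # inf
print(js_divergence(p, q))   # ln 2, about 0.693
```

KL becomes infinite as soon as one distribution has mass in a bin the other leaves empty, whereas JS (in nats) stays bounded by ln 2, which is one reason to offer both.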

Version 0.3.0 (2019-09-11)

  • New operator: Compare Distributions
    • Tests the compatibility of two ExampleSets.
  • New operator: Gradient Boosted Tree (Smile)
    • Trains a gradient boosted tree for regression (classification was not yet supported).
  • Renamed the Regression operator folder to Learner.
  • Major internal code refactoring. Previously trained models may no longer be applicable.


Version 0.2.0 (2019-07-30)

  • Added new operator Random Forest Regression (Smile)
  • Added the corresponding Random Forest Model

Version 0.1.0 (2019-02-08)

  • Extension release
  • New operator Gaussian Mixture
  • New operator G-Means 
  • New operator Probabilistic Principal Component Analysis 
  • New operator Lasso Regression
  • Operator t-SNE copied from Operator Toolbox Extension
  • Operator Parametric Probability Estimator copied from Operator Toolbox Extension

Product Details

Version: 0.6.0
File size: 1.3 MB
Downloads: 10661
Vendor: RapidMiner Labs
Category: Machine Learning
Released: 2021-09-10
Last Update: 2021-09-10 11:34 AM
License: AGPL