mlinsights.mlmodel (trees)

Note about potential issues

The main estimator PiecewiseTreeRegressor is based on the implementation of a new criterion. It relies on a non-public API of scikit-learn and is therefore more likely to break. The unit tests are unstable: they only pass when scikit-learn and this package are compiled with the same set of tools. If the package was installed from PyPI, you can check which versions were used at compilation time.

<<<

from mlinsights._config import (
    CYTHON_VERSION,
    NUMPY_VERSION,
    SCIPY_VERSION,
    SKLEARN_VERSION,
)

print(f"CYTHON_VERSION: {CYTHON_VERSION}")
print(f"NUMPY_VERSION: {NUMPY_VERSION}")
print(f"SCIPY_VERSION: {SCIPY_VERSION}")
print(f"SKLEARN_VERSION: {SKLEARN_VERSION}")

>>>

    CYTHON_VERSION: 3.0.5
    NUMPY_VERSION: 1.26.1
    SCIPY_VERSION: 1.11.3
    SKLEARN_VERSION: 1.4.dev0
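Comparing those compile-time versions with the ones installed in the current environment quickly reveals a mismatch. The helper below is a minimal sketch, not part of mlinsights: `compare_versions` is a hypothetical name, it only relies on `importlib.metadata` from the standard library, and it assumes that an exact match of the version strings is what matters. Cython is left out because it is usually only needed at build time.

```python
from importlib.metadata import PackageNotFoundError, version


def compare_versions(compiled):
    """Return {distribution: (compiled_version, installed_version)} for mismatches.

    ``compiled`` maps a distribution name to the version used at build time.
    ``installed_version`` is None when the distribution is not installed.
    """
    mismatches = {}
    for name, built in compiled.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Exact string comparison: "1.4.dev0" and "1.4.0" count as different.
        if installed != built:
            mismatches[name] = (built, installed)
    return mismatches


if __name__ == "__main__":
    try:
        from mlinsights._config import NUMPY_VERSION, SCIPY_VERSION, SKLEARN_VERSION
    except ImportError:
        print("mlinsights is not installed")
    else:
        print(compare_versions({
            "numpy": NUMPY_VERSION,
            "scipy": SCIPY_VERSION,
            "scikit-learn": SKLEARN_VERSION,
        }))
```

An empty dictionary means the runtime versions match the ones recorded at compilation time.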

The signature of method impurity_improvement changed in scikit-learn 0.24. Supporting two versions of scikit-learn is usually easy, even for a method overloaded in a class, but this method is implemented in Cython. It must be overloaded with exactly the same signature, so tricks such as *args or **kwargs cannot be used. The way this was handled is implemented in PR 88.
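To see what is lost, here is the *args trick as it works in pure Python. The base classes below are hypothetical stand-ins that only mimic a method whose argument list changed between versions; they are not scikit-learn's classes and their signatures are illustrative assumptions.

```python
class OldStyleBase:
    """Hypothetical base class with a one-argument method."""

    def impurity_improvement(self, impurity):
        return -impurity


class NewStyleBase:
    """Hypothetical base class after the signature changed."""

    def impurity_improvement(self, impurity_parent, impurity_left, impurity_right):
        return impurity_parent - impurity_left - impurity_right


def make_criterion(base):
    """Build a subclass of ``base`` whose override accepts either signature."""

    class CountingCriterion(base):
        def __init__(self):
            self.n_calls = 0

        # *args forwards whatever the base class expects, so the same
        # override works with both signatures. A Cython cdef method must
        # declare its exact typed signature, which rules this trick out.
        def impurity_improvement(self, *args):
            self.n_calls += 1
            return super().impurity_improvement(*args)

    return CountingCriterion


old = make_criterion(OldStyleBase)()
new = make_criterion(NewStyleBase)()
print(old.impurity_improvement(2.0), new.impurity_improvement(5.0, 1.0, 2.0))
```

In Cython, the equivalent override has to be written once per signature, which is why a dedicated workaround was needed.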

Estimators

PiecewiseTreeRegressor

class mlinsights.mlmodel.piecewise_tree_regression.PiecewiseTreeRegressor(criterion='mselin', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0)

Implements a kind of piecewise linear regression by modifying the criterion used by the algorithm that builds a decision tree. See sklearn.tree.DecisionTreeRegressor for the meaning of the parameters, except criterion, which can be:

  • mselin: optimizes for a piecewise linear regression

  • simple: optimizes for a stepwise regression (equivalent to mse)

If the file does not compile or crashes, some explanations are given in Note about potential issues.
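The difference between the two criteria shows up in the choice of a split. The NumPy sketch below is not the Cython implementation: it scores every candidate split of a one-dimensional, sorted dataset by exhaustive search, approximating each child either by its mean (the idea behind simple) or by a least-squares line (the idea behind mselin). The function names and the V-shaped data are illustrative assumptions.

```python
import numpy as np


def sse_constant(x, y):
    """Squared error when a node predicts the mean of its targets (like 'simple')."""
    return float(np.sum((y - y.mean()) ** 2))


def sse_linear(x, y):
    """Squared error when a node predicts with a least-squares line (like 'mselin')."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ coef) ** 2))


def best_split(x, y, node_error, min_samples_leaf=2):
    """Exhaustive search for the split of sorted ``x`` minimizing the children's error."""
    best_pos, best_err = None, np.inf
    for pos in range(min_samples_leaf, len(x) - min_samples_leaf + 1):
        err = node_error(x[:pos], y[:pos]) + node_error(x[pos:], y[pos:])
        if err < best_err:
            best_pos, best_err = pos, err
    return best_pos, best_err


x = np.arange(10, dtype=float)
y = np.abs(x - 4.5)  # V shape with a kink between x=4 and x=5
print("constant:", best_split(x, y, sse_constant))
print("linear:  ", best_split(x, y, sse_linear))
```

On this target, the linear approximation finds the kink (split before index 5) with an error close to zero, while no single split lets two constants reach zero error.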

fit(X, y, sample_weight=None, check_input=True)

Replaces the string stored in criterion with an instance of a criterion class before fitting the tree.

predict(X, check_input=True)

Overloads method predict. Falls back to the predict method of the decision tree if criterion is mse, mae or simple. Computes the predictions from linear regression if the criterion is mselin.
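With mselin, each leaf holds its own linear model. The sketch below assumes the coefficients and intercepts are stored as one row per leaf and that each observation's leaf position is already known (for instance from predict_leaves); `predict_piecewise_linear`, `coefs` and `intercepts` are hypothetical names, not attributes of the estimator.

```python
import numpy as np


def predict_piecewise_linear(X, leaves, coefs, intercepts):
    """Predict with one linear model per leaf.

    X: (n_samples, n_features); leaves: leaf position of each row;
    coefs: (n_leaves, n_features); intercepts: (n_leaves,).
    """
    X = np.asarray(X, dtype=float)
    leaves = np.asarray(leaves, dtype=np.int64)
    # Select each row's leaf coefficients, then take a row-wise dot product.
    return np.einsum("ij,ij->i", X, coefs[leaves]) + intercepts[leaves]


X = [[1.0], [2.0], [10.0]]
leaves = [0, 0, 1]
coefs = np.array([[2.0], [-1.0]])
intercepts = np.array([0.0, 20.0])
print(predict_piecewise_linear(X, leaves, coefs, intercepts))
```

The first two rows use the line 2x, the last one uses -x + 20.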

predict_leaves(X)

Returns the leaf index for each observation of X.

Parameters:

X – array

Returns:

array of leaf indices in self.leaves_index_
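One way to obtain such positions is sketched below, under two assumptions: leaves_index_ lists the node ids of the leaves in increasing order, and the node id reached by each observation is known (scikit-learn trees expose it through apply). In scikit-learn's tree structure a leaf has children_left equal to -1. Both helper names are hypothetical and the tree is hand-written.

```python
import numpy as np


def leaf_node_ids(children_left):
    """Node ids of the leaves: scikit-learn marks a leaf with children_left == -1."""
    return np.flatnonzero(np.asarray(children_left) == -1)


def leaf_positions(node_ids, leaves_index):
    """Position of each observation's leaf inside ``leaves_index``."""
    mapping = {int(node): pos for pos, node in enumerate(leaves_index)}
    return np.array([mapping[int(node)] for node in node_ids], dtype=np.int64)


# Hand-written tree: node 0 splits into 1 and 2, node 2 splits into 3 and 4.
children_left = [1, -1, 3, -1, -1]
leaves_index = leaf_node_ids(children_left)
print(leaf_positions([4, 1, 3, 4], leaves_index))  # [2 0 1 2]
```

Positions are contiguous (0 to n_leaves - 1), which makes them convenient row indices for per-leaf arrays.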

set_fit_request(*, check_input: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → PiecewiseTreeRegressor

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

check_input : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for check_input parameter in fit.

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns

self : object

The updated object.
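The four request values listed above can be summarized by a toy model of what a meta-estimator forwards for a single parameter. This is a simplified sketch of the documented behavior, not scikit-learn's routing code; `route` is a hypothetical helper.

```python
def route(name, request, provided):
    """Toy model of what a meta-estimator forwards for one metadata parameter.

    name: parameter name expected by the sub-estimator, e.g. "sample_weight".
    request: True, False, None or an alias string, as set by set_fit_request.
    provided: keyword arguments the user passed to the meta-estimator.
    Returns the keyword arguments forwarded to the sub-estimator.
    """
    if request is True:
        # Requested: forwarded when provided, silently skipped otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested, and providing it is an error.
        if name in provided:
            raise ValueError(f"{name} was passed but its routing is not set")
        return {}
    if isinstance(request, str):
        # The user passes the value under the alias; the sub-estimator gets it as `name`.
        return {name: provided[request]} if request in provided else {}
    raise TypeError(f"unsupported request {request!r}")


print(route("sample_weight", "w", {"w": [1.0, 2.0]}))
```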

set_predict_request(*, check_input: bool | None | str = '$UNCHANGED$') → PiecewiseTreeRegressor

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

check_input : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for check_input parameter in predict.

Returns

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → PiecewiseTreeRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self : object

The updated object.
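As a scikit-learn regressor, the estimator's score method is expected to return the coefficient of determination R². When sample_weight is routed to score, each squared error is weighted, and so is the mean used in the denominator. A NumPy sketch of that formula follows; `weighted_r2` is a hypothetical name.

```python
import numpy as np


def weighted_r2(y_true, y_pred, sample_weight=None):
    """Coefficient of determination, optionally weighted."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.ones_like(y_true) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    residual = np.sum(w * (y_true - y_pred) ** 2)
    # Deviations are measured from the weighted mean of the targets.
    total = np.sum(w * (y_true - np.average(y_true, weights=w)) ** 2)
    return 1.0 - residual / total


print(weighted_r2([1, 2, 3], [1, 2, 4]))                           # 0.5
print(weighted_r2([1, 2, 3], [1, 2, 4], sample_weight=[1, 1, 0]))  # 1.0
```

Giving a zero weight to the only badly predicted observation brings the score to 1.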

Criteria

The following classes require scikit-learn >= 1.3.0; otherwise they are not compiled. Section Note about potential issues explains why execution may crash.

SimpleRegressorCriterion

class mlinsights.mlmodel.piecewise_tree_regression_criterion.SimpleRegressorCriterion

Implements the mean squared error criterion. This implementation is deliberately not efficient. The code was inspired by hellinger_distance_criterion.pyx, Cython example of exposing C-computed arrays in Python without data copies, and _criterion.pyx. It adds the features to the class.

If the file does not compile or crashes, some explanations are given in Note about potential issues.
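The quantity such a criterion evaluates for each candidate split can be sketched in NumPy, assuming the targets are already sorted by the feature being split. This straightforward version recomputes both means from scratch at every split, which illustrates why a naive approach is slow; it is an illustration of the idea, not the Cython code, and the function names are hypothetical.

```python
import numpy as np


def node_sse(y):
    """Sum of squared deviations from the node mean."""
    return float(np.sum((y - y.mean()) ** 2))


def naive_children_sse(y):
    """Children error for every split of ``y`` (already sorted by the feature).

    Entry k corresponds to left = y[:k + 1] and right = y[k + 1:].
    Each split recomputes both means: O(n) per split, O(n^2) overall.
    """
    y = np.asarray(y, dtype=float)
    return np.array([node_sse(y[:pos]) + node_sse(y[pos:]) for pos in range(1, len(y))])


print(naive_children_sse([1, 1, 5, 5]))
```

The middle split separates the two plateaus and has zero error.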

printd(self)

Prints debugging information.

SimpleRegressorCriterionFast

A similar design but a much faster implementation close to what scikit-learn implements.

class mlinsights.mlmodel.piecewise_tree_regression_criterion_fast.SimpleRegressorCriterionFast

Criterion which computes the mean squared error assuming that the points falling into one node are approximated by a constant. The implementation follows the same design as SimpleRegressorCriterion. It is faster because it computes cumulative sums and avoids loops when computing intermediate gains.

If the file does not compile or crashes, some explanations are given in Note about potential issues.
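The cumulative-sum idea can be sketched in NumPy with the identity SSE = Σy² − (Σy)²/n: prefix sums of y and y² give the error of every left child at once, and subtracting them from the totals gives every right child. This is an illustration of the technique under the same assumption as before (targets sorted by the feature), not the Cython implementation; `fast_children_sse` is a hypothetical name. This identity can lose precision when values are large compared with their spread.

```python
import numpy as np


def fast_children_sse(y):
    """Children error for every split of ``y`` (already sorted by the feature).

    Entry k corresponds to left = y[:k + 1] and right = y[k + 1:].
    Uses SSE = sum(y^2) - sum(y)^2 / n with prefix sums: O(n) overall.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.cumsum(y)          # csum[k] = y[0] + ... + y[k]
    csum2 = np.cumsum(y * y)
    n_left = np.arange(1, n)     # left sizes 1 .. n - 1
    n_right = n - n_left
    s_left, s2_left = csum[:-1], csum2[:-1]
    s_right, s2_right = csum[-1] - s_left, csum2[-1] - s2_left
    return (s2_left - s_left ** 2 / n_left) + (s2_right - s_right ** 2 / n_right)


print(fast_children_sse([1, 1, 5, 5]))
```

It returns the same values as a split-by-split recomputation, without the inner loop.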

LinearRegressorCriterion

The next criterion optimizes the mean squared error assuming that the points falling into one node of the tree are approximated by a line: the error is the one made by a linear regressor instead of a constant. The documentation will be completed later.
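For a single feature, the error of a least-squares line can also be obtained from running sums, extending the trick used by the constant criterion: with centered sums Cxx, Cxy and Cyy, the residual error is Cyy − Cxy²/Cxx. The sketch below applies this to every split of sorted x. It is an illustration under that one-feature assumption and not necessarily how LinearRegressorCriterion computes its error; both function names are hypothetical.

```python
import numpy as np


def _side_sse(n, sx, sy, sxx, sxy, syy, eps=1e-12):
    """SSE of a least-squares line computed from the sums of one side."""
    cxx = sxx - sx * sx / n  # n * Var(x)
    cxy = sxy - sx * sy / n  # n * Cov(x, y)
    cyy = syy - sy * sy / n  # n * Var(y)
    safe = np.where(cxx > eps, cxx, 1.0)
    # When x is constant on a side, the best line is the mean: SSE = cyy.
    sse = np.where(cxx > eps, cyy - cxy ** 2 / safe, cyy)
    return np.maximum(sse, 0.0)  # clip tiny negative rounding errors


def linear_children_sse(x, y):
    """Children error with one line per side, for every split of sorted ``x``.

    Entry k corresponds to left = [:k + 1] and right = [k + 1:].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sums = np.cumsum(np.stack([np.ones_like(x), x, y, x * x, x * y, y * y]), axis=1)
    left = sums[:, :-1]
    right = sums[:, -1:] - left
    return _side_sse(*left) + _side_sse(*right)


x = np.arange(10, dtype=float)
y = np.abs(x - 4.5)  # V shape with a kink between x=4 and x=5
print(int(np.argmin(linear_children_sse(x, y))))  # 4
```

Entry 4 (left = x[:5]) is the split at the kink, where both lines fit exactly.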

mlinsights.mlmodel.piecewise_tree_regression_criterion_linear.LinearRegressorCriterion

mlinsights.mlmodel.piecewise_tree_regression_criterion_linear_fast.SimpleRegressorCriterionFast