mlinsights.mlmodel (trees)¶
Note about potential issues¶
The main estimator PiecewiseTreeRegressor is based on the implementation of a new criterion. It relies on a non-public API of scikit-learn and as such is more likely to break. The unit tests are unstable: they work when scikit-learn and this package are compiled with the same set of tools. If the package was installed from PyPI, you can check which versions were used at compilation time.
<<<
from mlinsights._config import (
    CYTHON_VERSION,
    NUMPY_VERSION,
    SCIPY_VERSION,
    SKLEARN_VERSION,
)
print(f"CYTHON_VERSION: {CYTHON_VERSION}")
print(f"NUMPY_VERSION: {NUMPY_VERSION}")
print(f"SCIPY_VERSION: {SCIPY_VERSION}")
print(f"SKLEARN_VERSION: {SKLEARN_VERSION}")
>>>
CYTHON_VERSION: 3.0.11
NUMPY_VERSION: 2.1.1
SCIPY_VERSION: 1.14.1
SKLEARN_VERSION: 1.6.dev0
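A mismatch between the versions recorded at compilation time and the ones installed locally is a plausible first thing to look at. The sketch below compares two version strings on their major and minor numbers only. `major_minor` and `same_major_minor` are hypothetical helpers, not part of mlinsights, and the "installed" versions in the usage lines are made up for the illustration.

```python
import re


def major_minor(version: str) -> tuple:
    """Extracts (major, minor) from strings such as '1.6.dev0' or '2.1.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unable to parse version {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_major_minor(compiled: str, installed: str) -> bool:
    """Tells if both versions share the same major and minor numbers."""
    return major_minor(compiled) == major_minor(installed)


# Version used at compilation time (value printed above).
compiled_sklearn = "1.6.dev0"
# Hypothetical locally installed versions, for illustration only.
print(same_major_minor(compiled_sklearn, "1.6.1"))  # True
print(same_major_minor(compiled_sklearn, "1.5.2"))  # False
```

In practice the second argument would be `sklearn.__version__`. Matching version numbers do not guarantee binary compatibility; the check only helps rule out the most obvious mismatch.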
The signature of method impurity_improvement changed in scikit-learn 0.24. Supporting two versions of scikit-learn is usually easy, even for a method overloaded in a class, except that this method is implemented in Cython. The method must be overloaded the same way, with the same signature; tricks such as *args or **kwargs cannot be used. The way it was handled is implemented in PR 88.
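When a flexible signature is not an option, one possible workaround is to write one class per signature and to select the right one from the installed version. The pure Python sketch below only illustrates that selection and is not necessarily what PR 88 does. The class names, the `select_criterion` helper and the two signatures shown are assumptions made for the example.

```python
import re


class CriterionBefore024:
    """Stand-in for a criterion written against the older signature (assumed)."""

    def impurity_improvement(self, impurity):
        return ("old", impurity)


class CriterionFrom024:
    """Stand-in for a criterion written against the newer signature (assumed)."""

    def impurity_improvement(self, impurity_parent, impurity_left, impurity_right):
        return ("new", impurity_parent, impurity_left, impurity_right)


def select_criterion(sklearn_version: str):
    """Picks the class whose signature matches the given scikit-learn version."""
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"Unable to parse version {sklearn_version!r}")
    version = int(match.group(1)), int(match.group(2))
    return CriterionFrom024 if version >= (0, 24) else CriterionBefore024


print(select_criterion("0.23.2").__name__)  # CriterionBefore024
print(select_criterion("1.6.dev0").__name__)  # CriterionFrom024
```

With Cython the same choice has to be made when the extension is built, since each compiled class exposes exactly one signature.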
Estimators¶
PiecewiseTreeRegressor¶
- class mlinsights.mlmodel.piecewise_tree_regression.PiecewiseTreeRegressor(criterion='mselin', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0)[source]¶
Implements a kind of piecewise linear regression by modifying the criterion used by the algorithm which builds a decision tree. See sklearn.tree.DecisionTreeRegressor to get the meaning of the parameters except criterion:
- mselin: optimizes for a piecewise linear regression
- simple: optimizes for a stepwise regression (equivalent to mse)
If the file does not compile or crashes, some explanations are given in Note about potential issues.
- fit(X, y, sample_weight=None, check_input=True)[source]¶
Replaces the string stored in criterion with an instance of the corresponding criterion class.
- predict(X, check_input=True)[source]¶
Overloads method predict. Falls back on the predict method of a decision tree if criterion is mse, mae or simple. Computes the predictions from a linear regression if criterion is mselin.
- predict_leaves(X)[source]¶
Returns the leaf index for each observation of X.
- Parameters:
X – array
- Returns:
array of leaf indices in self.leaves_index_
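To make the link between predict_leaves and predict with criterion mselin more concrete, the toy sketch below works on a single feature: it looks up the leaf of each observation, then applies the linear model attached to that leaf. The thresholds, the coefficients and the functions `toy_predict_leaves` and `toy_predict` are invented for the illustration and do not reproduce the library's code.

```python
from bisect import bisect_left

# Hypothetical tree on one feature: two thresholds define three leaves.
thresholds = [0.0, 2.0]
# One (slope, intercept) pair per leaf, as a piecewise linear model could store.
leaf_models = [(0.0, 1.0), (2.0, 1.0), (-1.0, 7.0)]


def toy_predict_leaves(xs):
    """Returns the leaf index of each value (x <= threshold goes to the left)."""
    return [bisect_left(thresholds, x) for x in xs]


def toy_predict(xs):
    """Applies the linear model of the leaf each value falls into."""
    preds = []
    for x, leaf in zip(xs, toy_predict_leaves(xs)):
        slope, intercept = leaf_models[leaf]
        preds.append(slope * x + intercept)
    return preds


xs = [-1.0, 1.0, 3.0]
print(toy_predict_leaves(xs))  # [0, 1, 2]
print(toy_predict(xs))  # [1.0, 3.0, 4.0]
```

With a stepwise criterion, every slope would be zero and the prediction would reduce to one constant per leaf.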
- set_fit_request(*, check_input: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') PiecewiseTreeRegressor ¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- check_input : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for check_input parameter in fit.
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
Returns¶
- self : object
The updated object.
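The four request options can be summarized with a toy model of how a meta-estimator might decide what to forward to fit. This is a sketch of the rules listed above, not scikit-learn's implementation; `resolve_request` is a hypothetical function and the error type is an arbitrary choice.

```python
def resolve_request(name, request, provided):
    """Returns the keyword arguments forwarded to the method for one parameter.

    name: parameter name expected by the method (e.g. 'sample_weight')
    request: True, False, None or an alias (str)
    provided: metadata given by the user to the meta-estimator
    """
    if request is True:
        # Requested: forwarded only if the user provided it.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested: providing it is an error.
        if name in provided:
            raise ValueError(f"{name!r} was provided but not requested")
        return {}
    # str: the metadata is looked up under the alias and passed under the original name.
    return {name: provided[request]} if request in provided else {}


user_metadata = {"sample_weight": [1.0, 2.0], "my_weights": [0.5, 0.5]}
print(resolve_request("sample_weight", True, user_metadata))
print(resolve_request("sample_weight", False, user_metadata))
print(resolve_request("sample_weight", "my_weights", user_metadata))
```

With scikit-learn itself, the request would be declared with a call such as `est.set_fit_request(sample_weight=True)` after `sklearn.set_config(enable_metadata_routing=True)`.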
- set_predict_request(*, check_input: bool | None | str = '$UNCHANGED$') PiecewiseTreeRegressor ¶
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- check_input : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for check_input parameter in predict.
Returns¶
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PiecewiseTreeRegressor ¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters¶
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns¶
- self : object
The updated object.
Criterions¶
The following classes require scikit-learn >= 1.3.0; otherwise, they are not compiled. Section Note about potential issues explains why the execution may crash.
SimpleRegressorCriterion¶
- class mlinsights.mlmodel.piecewise_tree_regression_criterion.SimpleRegressorCriterion¶
Implements the mean square error criterion in an inefficient way. The code was inspired by hellinger_distance_criterion.pyx, Cython example of exposing C-computed arrays in Python without data copies, and _criterion.pyx. This implementation is not efficient, but it was written that way on purpose. It adds the features to the class.
If the file does not compile or crashes, some explanations are given in Note about potential issues.
- printd(self)¶
debug print
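As an illustration of what an inefficient but readable mean square error criterion does, the sketch below evaluates every split position of a target vector and recomputes both child errors from scratch with explicit loops. It is pure Python, the function names are hypothetical, and it does not mirror the Cython code.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_naive(y):
    """Tries every split y[:i] / y[i:] and returns (position, total error).

    Each candidate recomputes both errors from scratch:
    O(n) per split, O(n^2) overall.
    """
    best_pos, best_err = None, float("inf")
    for i in range(1, len(y)):
        err = sse(y[:i]) + sse(y[i:])
        if err < best_err:
            best_pos, best_err = i, err
    return best_pos, best_err


# Targets assumed to be already sorted by the feature being split.
y = [1.0, 1.0, 1.0, 5.0, 5.0]
print(best_split_naive(y))  # (3, 0.0)
```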
SimpleRegressorCriterionFast¶
A similar design but a much faster implementation close to what scikit-learn implements.
- class mlinsights.mlmodel.piecewise_tree_regression_criterion_fast.SimpleRegressorCriterionFast¶
Criterion which computes the mean square error assuming points falling into one node are approximated by a constant. The implementation follows the same design used in SimpleRegressorCriterion. This implementation is faster as it computes cumulated sums and avoids loops to compute intermediate gains.
If the file does not compile or crashes, some explanations are given in Note about potential issues.
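The idea of cumulated sums can be sketched in pure Python as follows. The squared error of a side is obtained from the sums of y and y² on that side, and both sums are read from cumulated sums instead of being recomputed for every candidate split. `best_split_cumsum` is a hypothetical name and the code is only an illustration of the principle, not the Cython implementation.

```python
from itertools import accumulate


def best_split_cumsum(y):
    """Finds the split y[:i] / y[i:] minimizing the total squared error in O(n).

    Uses SSE = sum(y^2) - (sum(y))^2 / n on each side, with no inner loop.
    """
    n = len(y)
    cum_y = list(accumulate(y))
    cum_y2 = list(accumulate(v * v for v in y))
    total_y, total_y2 = cum_y[-1], cum_y2[-1]
    best_pos, best_err = None, float("inf")
    for i in range(1, n):
        left_y, left_y2 = cum_y[i - 1], cum_y2[i - 1]
        right_y, right_y2 = total_y - left_y, total_y2 - left_y2
        err = (left_y2 - left_y ** 2 / i) + (right_y2 - right_y ** 2 / (n - i))
        if err < best_err:
            best_pos, best_err = i, err
    return best_pos, best_err


# Targets assumed to be already sorted by the feature being split.
y = [1.0, 1.0, 1.0, 5.0, 5.0]
print(best_split_cumsum(y))  # (3, 0.0)
```

The trade-off is numerical: subtracting large sums can lose precision on badly scaled targets, which a from-scratch computation around the mean avoids.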
LinearRegressorCriterion¶
The next one implements a criterion which optimizes the mean square error assuming the points falling into one node of the tree are approximated by a line. The mean square error is the error made with a linear regressor and not a constant anymore. The documentation will be completed later.
mlinsights.mlmodel.piecewise_tree_regression_criterion_linear.LinearRegressorCriterion
mlinsights.mlmodel.piecewise_tree_regression_criterion_linear_fast.SimpleRegressorCriterionFast
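Until the documentation is completed, the principle can be sketched on a single feature: the error of a node is the residual sum of squares of a least squares line, and the split minimizes the sum of that error over both children. The functions `line_sse` and `best_split_linear`, the restriction to one feature and the minimum of two points per side are assumptions made for this illustration; they do not describe the Cython classes listed above.

```python
def line_sse(xs, ys):
    """Residual sum of squares of the least squares line fitted on (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # constant fit if x does not vary
    intercept = my - slope * mx
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def best_split_linear(xs, ys, min_leaf=2):
    """Returns (position, error) minimizing the summed linear fit error of both sides."""
    best_pos, best_err = None, float("inf")
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        err = line_sse(xs[:i], ys[:i]) + line_sse(xs[i:], ys[i:])
        if err < best_err:
            best_pos, best_err = i, err
    return best_pos, best_err


# Data following y = x on the first three points and y = 8 - x afterwards.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 1.0, 2.0, 5.0, 4.0, 3.0]
print(best_split_linear(xs, ys))  # (3, 0.0)
```

On this data each side is exactly a line, so the linear error drops to zero at the split, whereas a constant approximation would leave a nonzero error on both sides.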