mlinsights.mlmodel#

Helpers#

model_featurizer#

mlinsights.mlmodel.ml_featurizer.model_featurizer(model, **params)[source]#

Converts a machine-learned model into a function that maps a vector to the features produced by the model. The features can be the model's output itself or intermediate results. The model can come from scikit-learn or torch.

Parameters:
  • model – model

  • params – additional parameters

Returns:

function
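The idea of turning a model into a featurizing function can be sketched as follows. This is only an illustration, not the implementation of model_featurizer: the class TinyLinearModel and the helper make_featurizer are hypothetical names introduced here, and the sketch assumes the model exposes a decision_function method.

```python
import numpy as np


class TinyLinearModel:
    """Hypothetical stand-in for a trained model exposing decision_function."""

    def __init__(self, coef, intercept):
        self.coef_ = np.asarray(coef, dtype=float)
        self.intercept_ = np.asarray(intercept, dtype=float)

    def decision_function(self, X):
        return X @ self.coef_.T + self.intercept_


def make_featurizer(model):
    """Returns a function mapping one vector (or a batch) to the model outputs."""

    def featurize(x):
        x = np.asarray(x, dtype=float)
        # A single vector is promoted to a batch of one row.
        batch = x.reshape(1, -1) if x.ndim == 1 else x
        return model.decision_function(batch)

    return featurize


model = TinyLinearModel(coef=[[1.0, -1.0], [0.5, 0.5]], intercept=[0.0, 1.0])
featurize = make_featurizer(model)
features = featurize([2.0, 1.0])
print(features)
```

The returned function hides the model behind a plain callable, which is convenient when the features feed another estimator.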

Clustering#

ConstraintKMeans#

class mlinsights.mlmodel.kmeans_constraint.ConstraintKMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=500, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='lloyd', balanced_predictions=False, strategy='gain', kmeans0=True, learning_rate=1.0, history=False)[source]#

Defines a constrained k-means. Clusters are modified to have equal sizes. The algorithm is initialized with a regular KMeans and continues with a modified version of it.

Computing the predictions offers a choice. The first option keeps the predictions of the regular k-means algorithm but uses the balanced clusters. The second computes balanced predictions over the test set, which implies that the prediction for a given observation may change depending on the set it belongs to.

The parameter strategy determines how observations are assigned to a cluster. The value can be:

  • 'distance': observations are ranked by their distance to a cluster; the algorithm deals first with the furthest point and maps it to the closest center, unless that cluster has already reached its maximum size

  • 'gain': follows the algorithm described in Same-size k-Means Variation

  • 'weights': estimates a weight attached to each cluster and uses it to weight the distance to that cluster in order to balance the number of points mapped to every cluster; this strategy uses a learning rate

The first two strategies cannot reach a good compromise without the function _switch_clusters, which tries every switch between clusters: two points exchange their clusters. A switch keeps the number of points in each cluster unchanged and the function checks that the inertia is reduced.
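A size-constrained assignment in the spirit of the 'distance' strategy can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the code used by ConstraintKMeans: the helper balanced_assign is hypothetical, the maximum cluster size is assumed to be ceil(n / k), and no switching step is applied afterwards.

```python
import numpy as np


def balanced_assign(X, centers):
    """Greedy size-constrained assignment (illustrative, not mlinsights code)."""
    n, k = X.shape[0], centers.shape[0]
    capacity = int(np.ceil(n / k))  # assumed maximum number of points per cluster
    # Euclidean distance from every point to every center, shape (n, k).
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Handle first the points that are furthest from their closest center.
    order = np.argsort(-dist.min(axis=1))
    labels = np.full(n, -1)
    counts = np.zeros(k, dtype=int)
    for i in order:
        for c in np.argsort(dist[i]):  # closest center first
            if counts[c] < capacity:
                labels[i] = c
                counts[c] += 1
                break
    return labels


rng = np.random.default_rng(0)
# Unbalanced data: 30 points around (0, 0) and 10 points around (4, 4).
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (10, 2))])
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
labels = balanced_assign(X, centers)
counts = np.bincount(labels, minlength=2)
print(counts)
```

Both clusters end up with 20 points even though the data is unbalanced, which is why some points are mapped to a center that is not their closest one.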

Parameters:
  • n_clusters – number of clusters

  • init – used by k-means

  • n_init – used by k-means

  • max_iter – used by k-means

  • tol – used by k-means

  • verbose – used by k-means

  • random_state – used by k-means

  • copy_x – used by k-means

  • algorithm – used by k-means

  • balanced_predictions – produces balanced predictions if True, the regular ones otherwise

  • strategy – strategy or algorithm used to abide by the constraint

  • kmeans0 – if True, applies k-means algorithm first

  • history – keeps centers across iterations

  • learning_rate – learning rate, used by strategy ‘weights’

cluster_edges()[source]#

Computes edges between clusters based on a Delaunay graph.

constraint_kmeans(X, sample_weight=None, state=None, learning_rate=1.0, history=False)[source]#

Completes the constraint k-means.

Parameters:
  • X – features

  • sample_weight – sample weight

  • state – state

  • learning_rate – learning rate

  • history – keeps evolution of centers

fit(X, y=None, sample_weight=None)[source]#

Compute k-means clustering.

Parameters:
  • X – array-like or sparse matrix, shape=(n_samples, n_features) Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous.

  • y – Ignored

  • sample_weight – sample weight

predict(X)[source]#

Computes the predictions.

Parameters:

X – features.

Returns:

prediction

score(X, y=None, sample_weight=None)[source]#

Returns the distances to all clusters.

Parameters:
  • X – features

  • y – unused

  • sample_weight – sample weight

Returns:

distances

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ConstraintKMeans#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ConstraintKMeans#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

transform(X)[source]#

Computes the predictions.

Parameters:

X – features.

Returns:

prediction

KMeansL1L2#

class mlinsights.mlmodel.kmeans_l1.KMeansL1L2(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='lloyd', norm='L2')[source]#

K-Means clustering with either norm L1 or L2. See notebook KMeans with norm L1 for an example.

Parameters:
  • n_clusters – int, default=8 The number of clusters to form as well as the number of centroids to generate.

  • init

    {‘k-means++’, ‘random’} or ndarray of shape (n_clusters, n_features), default=’k-means++’ Method for initialization, defaults to ‘k-means++’:

    ’k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details.

    ’random’: choose k observations (rows) at random from data for the initial centroids.

    If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.

  • n_init – int, default=10 Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.

  • max_iter – int, default=300 Maximum number of iterations of the k-means algorithm for a single run.

  • tol – float, default=1e-4 Relative tolerance with regards to inertia to declare convergence.

  • verbose – int, default=0 Verbosity mode.

  • random_state – int, RandomState instance, default=None Determines random number generation for centroid initialization. Use an int to make the randomness deterministic.

  • copy_x – bool, default=True When pre-computing distances it is more numerically accurate to center the data first. If copy_x is True (default), then the original data is not modified, ensuring X is C-contiguous. If False, the original data is modified, and put back before the function returns, but small numerical differences may be introduced by subtracting and then adding the data mean, in this case it will also not ensure that data is C-contiguous which may cause a significant slowdown.

  • algorithm – {“lloyd”, “elkan”}, default=”lloyd” K-means algorithm to use. The classical EM-style algorithm is “lloyd”. The “elkan” variation is more efficient by using the triangle inequality, but currently doesn’t support sparse data.

  • norm – {“L1”, “L2”} The norm L2 is identical to KMeans. Norm L1 uses a completely different code path.

Fitted attributes:

  • cluster_centers_: ndarray of shape (n_clusters, n_features)

    Coordinates of cluster centers. If the algorithm stops before fully converging (see tol and max_iter), these will not be consistent with labels_.

  • labels_: ndarray of shape (n_samples,)

    Labels of each point

  • inertia_: float

    Sum of squared distances of samples to their closest cluster center.

  • n_iter_: int

    Number of iterations run.
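To make the difference between the two norms concrete, here is a minimal sketch of Lloyd-style iterations with the L1 norm. It is not the implementation of KMeansL1L2: the helper kmeans_l1 is hypothetical, and it assumes the usual L1 variant in which centers are updated with the coordinate-wise median instead of the mean.

```python
import numpy as np


def kmeans_l1(X, centers, n_iter=10):
    """Lloyd-style iterations with the L1 norm (illustrative sketch)."""
    centers = centers.astype(float).copy()
    for _ in range(n_iter):
        # Manhattan distance from every point to every center, shape (n, k).
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(centers.shape[0]):
            members = X[labels == c]
            if len(members) > 0:
                # The coordinate-wise median minimizes the sum of L1 distances.
                centers[c] = np.median(members, axis=0)
    return centers, labels


X = np.array(
    [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11], [10, 40]],
    dtype=float,
)
init = np.array([[0.0, 0.0], [10.0, 10.0]])
centers, labels = kmeans_l1(X, init)
print(centers)
print(labels)
```

The outlier (10, 40) barely moves the second center, which stays at (10, 10.5), whereas the mean of the same four points would be (10.25, 17.75).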

fit(X, y=None, sample_weight=None)[source]#

Computes k-means clustering.

Parameters:
  • X – array-like or sparse matrix, shape=(n_samples, n_features) Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous.

  • y – Ignored. Not used, present here for API consistency by convention.

  • sample_weight – array-like, shape (n_samples,), optional The weights for each observation in X. If None, all observations are assigned equal weight (default: None).

Returns:

self Fitted estimator.

predict(X, sample_weight=None)[source]#

Predicts the closest cluster each sample in X belongs to.

In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters:
  • X – {array-like, sparse matrix} of shape (n_samples, n_features) New data to predict.

  • sample_weight – array-like, shape (n_samples,), optional The weights for each observation in X. If None, all observations are assigned equal weight (default: None). Unused here.

Returns:

labels : array, shape [n_samples,] Index of the cluster each sample belongs to.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KMeansL1L2#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_predict_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KMeansL1L2#

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in predict.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KMeansL1L2#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

transform(X)[source]#

Transforms X to a cluster-distance space.

In the new space, each dimension is the distance to the cluster centers. Note that even if X is sparse, the array returned by transform will typically be dense.

Parameters:

X – {array-like, sparse matrix} of shape (n_samples, n_features) New data to transform.

Returns:

X_new : array, shape [n_samples, k] X transformed in the new space.
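The cluster-distance space can be illustrated with a few lines of NumPy. The helper cluster_distances is a hypothetical name used only for this sketch; it computes, for each row of X, the distance to every center with either norm.

```python
import numpy as np


def cluster_distances(X, centers, norm="L2"):
    """Distance from each row of X to each center, shape (n_samples, k)."""
    diff = X[:, None, :] - centers[None, :, :]
    if norm == "L1":
        return np.abs(diff).sum(axis=2)
    return np.sqrt((diff ** 2).sum(axis=2))


centers = np.array([[0.0, 0.0], [3.0, 4.0]])
X = np.array([[0.0, 0.0], [3.0, 0.0]])
X_l2 = cluster_distances(X, centers, norm="L2")
X_l1 = cluster_distances(X, centers, norm="L1")
print(X_l2)  # [[0. 5.] [3. 4.]]
print(X_l1)  # [[0. 7.] [3. 4.]]
```

Taking the argmin of each row gives the index of the closest center, which is what predict returns.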

Trainers#

ClassifierAfterKMeans#

class mlinsights.mlmodel.classification_kmeans.ClassifierAfterKMeans(estimator=None, clus=None, **kwargs)[source]#

Applies a k-means (see sklearn.cluster.KMeans) for each class, then adds the distance to each cluster as a feature for a classifier. See example LogisticRegression and Clustering.

Parameters:
  • estimator – sklearn.linear_model.LogisticRegression by default

  • clus – clustering applied on each class, by default k-means with two clusters

  • kwargs – sent to set_params, see its documentation to understand how to specify parameters
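The principle (one clustering per class, then distances to the cluster centers used as extra features) can be sketched with scikit-learn alone. This is an assumption-laden illustration rather than the code of ClassifierAfterKMeans: the helper extend is hypothetical, and concatenating the original features with the distances is a choice made for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

# One k-means per class, fitted only on the rows of that class.
clusterings = {
    label: KMeans(n_clusters=2, n_init=10, random_state=0).fit(X[y == label])
    for label in np.unique(y)
}


def extend(X):
    """Appends the distances to every cluster center of every class."""
    # KMeans.transform returns the distance to each cluster center.
    distances = [clusterings[label].transform(X) for label in sorted(clusterings)]
    return np.hstack([X] + distances)


X_ext = extend(X)
clf = LogisticRegression(max_iter=1000).fit(X_ext, y)
print(X_ext.shape)
```

With two classes and two clusters per class, four distance columns are added to the two original features.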

decision_function(X)[source]#

Calls decision_function.

fit(X, y, sample_weight=None)[source]#

Runs a k-means on each class then trains a classifier on the extended set of features.

Parameters:
  • X – numpy array or sparse matrix of shape [n_samples,n_features] Training data

  • y – numpy array of shape [n_samples, n_targets] Target values. Will be cast to X’s dtype if necessary

  • sample_weight – numpy array of shape [n_samples] Individual weights for each sample

Returns:

self : returns an instance of self.

Fitted attributes:

  • labels_: dictionary of clustering models

  • clus_: array of clustering models

  • estimator_: trained classifier

get_params(deep=True)[source]#

Returns the parameters for both the clustering and the classifier.

Parameters:

deep – unused here

Returns:

dict

See set_params for the naming pattern the parameter names follow.

predict(X)[source]#

Runs the predictions.

predict_proba(X)[source]#

Converts predictions into probabilities.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ClassifierAfterKMeans#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_params(**values)[source]#

Sets the parameters before training. Every parameter prefixed by 'e_' is an estimator parameter, every parameter prefixed by 'c_' is for the sklearn.cluster.KMeans.

Parameters:

values – parameter values

Returns:

dict
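The prefix convention can be illustrated with a small pure-Python helper. The function split_params is hypothetical and only mirrors the naming rule stated above; raising an error for names without a known prefix is an assumption of this sketch, not documented behaviour.

```python
def split_params(values):
    """Splits prefixed parameters into estimator and clustering groups."""
    estimator_params, clustering_params = {}, {}
    for name, value in values.items():
        if name.startswith("e_"):
            estimator_params[name[2:]] = value
        elif name.startswith("c_"):
            clustering_params[name[2:]] = value
        else:
            # Assumption of this sketch: unknown prefixes are rejected.
            raise ValueError(f"Unexpected parameter name: {name!r}")
    return estimator_params, clustering_params


est_params, clus_params = split_params(
    {"e_C": 0.5, "c_n_clusters": 3, "c_init": "random"}
)
print(est_params)   # {'C': 0.5}
print(clus_params)  # {'n_clusters': 3, 'init': 'random'}
```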

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ClassifierAfterKMeans#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

transform_features(X)[source]#

Applies all the clustering objects to every observation and extends the list of features.

Parameters:

X – features

Returns:

extended features

CustomizedMultilayerPerceptron#

class mlinsights.mlmodel.quantile_mlpregressor.CustomizedMultilayerPerceptron(hidden_layer_sizes, activation, solver, alpha, batch_size, learning_rate, learning_rate_init, power_t, max_iter, loss, shuffle, random_state, tol, verbose, warm_start, momentum, nesterovs_momentum, early_stopping, validation_fraction, beta_1, beta_2, epsilon, n_iter_no_change, max_fun)[source]#

Customized multilayer perceptron based on BaseMultilayerPerceptron.

IntervalRegressor#

class mlinsights.mlmodel.interval_regressor.IntervalRegressor(estimator=None, n_estimators=10, n_jobs=None, alpha=1.0, verbose=False)[source]#

Trains multiple regressors to provide a confidence interval on the prediction. It only works for single-target regression. Every estimator is trained on a new sample of the training data; the parameter alpha lets the user choose the size of this sample. A smaller alpha increases the variance of the predictions. The current implementation draws samples at random but keeps the weight associated with each of them. Another way could be to draw a weighted sample and give its elements uniform weights.

Parameters:
  • estimator – predictor trained on every bucket

  • n_estimators – number of estimators to train

  • n_jobs – number of parallel jobs (for training and predicting)

  • alpha – proportion of samples resampled for each training

  • verbose – boolean or use 'tqdm' to use tqdm to fit the estimators
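The resampling idea can be sketched with NumPy only. This is not the implementation of IntervalRegressor: the base estimator is replaced by an ordinary least-squares fit, and drawing the samples with replacement is an assumption made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 1))
y = 3 * X[:, 0] + rng.normal(0, 0.1, 100)

n_estimators, alpha = 10, 0.8
A = np.hstack([X, np.ones((len(X), 1))])  # add an intercept column
coefs = []
for _ in range(n_estimators):
    # Each regressor sees its own random sample of size alpha * n.
    idx = rng.choice(len(X), size=int(alpha * len(X)), replace=True)
    coefs.append(np.linalg.lstsq(A[idx], y[idx], rcond=None)[0])

X_test = np.array([[0.5]])
A_test = np.hstack([X_test, np.ones((1, 1))])
# One column per estimator, sorted for each observation.
sorted_pred = np.sort(A_test @ np.array(coefs).T, axis=1)
lower, upper = sorted_pred[:, 0], sorted_pred[:, -1]
print(lower, upper)
```

The extreme columns of the sorted predictions bound the interval for each observation, and intermediate columns can be used as empirical quantiles.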

fit(X, y, sample_weight=None)[source]#

Trains the binner and an estimator on every bucket.

Parameters:
  • X – features, X is converted into an array if X is a dataframe

  • y – target

  • sample_weight – sample weights

Returns:

self: returns an instance of self.

Fitted attributes:

  • binner_: binner

  • estimators_: dictionary of estimators, each of them mapped to a leaf of the tree

  • mean_estimator_: estimator trained on the whole dataset, used in case the binner cannot find a bucket for a new observation

  • dim_: dimension of the output

  • mean_: average targets

property n_estimators_#

Returns the number of estimators, which is the number of buckets the data was split into.

predict(X)[source]#

Computes the average predictions.

Parameters:

X – features, X is converted into an array if X is a dataframe

Returns:

predictions

predict_all(X)[source]#

Computes the predictions for all estimators.

Parameters:

X – features, X is converted into an array if X is a dataframe

Returns:

predictions

predict_sorted(X)[source]#

Computes the predictions for all estimators. Sorts them for all observations.

Parameters:

X – features, X is converted into an array if X is a dataframe

Returns:

predictions sorted for each observation

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') IntervalRegressor#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') IntervalRegressor#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

ApproximateNMFPredictor#

class mlinsights.mlmodel.anmf_predictor.ApproximateNMFPredictor(force_positive=False, **kwargs)[source]#

Converts sklearn.decomposition.NMF into a predictor so that the prediction does not involve training, even for new observations. The class uses a sklearn.decomposition.TruncatedSVD of the components found by the sklearn.decomposition.NMF. The prediction projects the test data into the vector space of the components and maps the result back into the original space. The drawback is that the output is not guaranteed to contain only positive values, as it would with sklearn.decomposition.NMF, unless the parameter force_positive is True.

<<<

import numpy
from mlinsights.mlmodel.anmf_predictor import ApproximateNMFPredictor

train = numpy.array(
    [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
    dtype=numpy.float64,
)
train[: train.shape[1], :] += numpy.identity(train.shape[1])

model = ApproximateNMFPredictor(n_components=2, force_positive=True)
model.fit(train)

test = numpy.array([[1, 1, 1, 0]], dtype=numpy.float64)
pred = model.predict(test)
print(pred)

>>>

    [[1.155 0.84  0.05  0.05 ]]

fit(X, y=None)[source]#

Trains a sklearn.decomposition.NMF then a multi-output regressor.

get_params(deep=True)[source]#

Returns the parameters of the estimator as a dictionary.

predict(X)[source]#

Predicts based on the multi-output regressor. The output has the same dimension as X.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ApproximateNMFPredictor#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

PiecewiseClassifier#

class mlinsights.mlmodel.piecewise_estimator.PiecewiseClassifier(binner=None, estimator=None, n_jobs=None, random_state=None, verbose=False)[source]#

Uses a decision tree to split the space of features into buckets and trains a logistic regression (default) on each of them. The second estimator is usually a sklearn.linear_model.LogisticRegression. It can also be sklearn.dummy.DummyClassifier to just get the average on each bucket.

Parameters:
  • binner – transformer or predictor which creates the buckets

  • estimator – predictor trained on every bucket

  • n_jobs – number of parallel jobs (for training and predicting)

  • random_state – to pick up random examples when buckets do not contain enough examples of each class

  • verbose – boolean or use 'tqdm' to use tqdm to fit the estimators

binner allows the following values:

estimator allows the following values:

The main issue with the PiecewiseClassifier is that every bucket must contain at least one example of each class, which may not happen. To avoid that, the training picks random examples from other buckets to make sure every class is represented.
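The completion of a bucket with examples borrowed from other buckets can be sketched as follows. The helper complete_bucket is a hypothetical name, and borrowing exactly one example per missing class is an assumption of this sketch rather than a description of the library's code.

```python
import numpy as np


def complete_bucket(y, bucket_idx, rng):
    """Adds one random outside example for every class missing in a bucket."""
    present = set(y[bucket_idx].tolist())
    extra = []
    for label in np.unique(y):
        if label not in present:
            # Candidates are the rows of that class located in other buckets.
            candidates = np.setdiff1d(np.flatnonzero(y == label), bucket_idx)
            extra.append(rng.choice(candidates))
    return np.concatenate([bucket_idx, np.array(extra, dtype=int)])


y = np.array([0, 0, 0, 1, 1, 2])
bucket_idx = np.array([0, 1, 3])  # this bucket holds classes 0 and 1 only
rng = np.random.default_rng(0)
completed = complete_bucket(y, bucket_idx, rng)
print(completed)
```

Class 2 is missing from the bucket and only row 5 carries it, so that row is appended. The random generator plays the role of the random_state parameter.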

decision_function(X)[source]#

Computes the prediction probabilities.

Parameters:

X – features, X is converted into an array if X is a dataframe

Returns:

prediction probabilities

predict(X)[source]#

Computes the predictions.

Parameters:

X – features, X is converted into an array if X is a dataframe

Returns:

predictions

predict_proba(X)[source]#

Computes the prediction probabilities.

Parameters:

X – features, X is converted into an array if X is a dataframe

Returns:

prediction probabilities

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PiecewiseClassifier#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PiecewiseClassifier#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

PiecewiseRegressor#

class mlinsights.mlmodel.piecewise_estimator.PiecewiseRegressor(binner=None, estimator=None, n_jobs=None, verbose=False)[source]#

Uses a decision tree to split the space of features into buckets and trains a linear regression (default) on each of them. The second estimator is usually a sklearn.linear_model.LinearRegression. It can also be sklearn.dummy.DummyRegressor to just get the average on each bucket.
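The mechanism can be reproduced by hand with scikit-learn, which may help to understand what the class automates. This is a sketch under stated assumptions and not the implementation of PiecewiseRegressor: the buckets are the leaves returned by DecisionTreeRegressor.apply, and the function predict below is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 1))
# Target with a different linear trend on each side of 0.
y = np.where(X[:, 0] < 0, 2 * X[:, 0], 5 - 3 * X[:, 0]) + rng.normal(0, 0.05, 200)

# The binner: a shallow tree whose leaves define the buckets.
binner = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)
leaves = binner.apply(X)

# One linear regression per bucket.
estimators = {
    leaf: LinearRegression().fit(X[leaves == leaf], y[leaves == leaf])
    for leaf in np.unique(leaves)
}


def predict(X_new):
    """Routes every row to its bucket and applies the matching estimator."""
    leaves_new = binner.apply(X_new)
    pred = np.empty(len(X_new))
    for leaf, est in estimators.items():
        mask = leaves_new == leaf
        if mask.any():
            pred[mask] = est.predict(X_new[mask])
    return pred


pred = predict(X)
print(len(estimators), pred.shape)
```

A tree of depth 1 produces two buckets, so two linear regressions are trained, one on each side of the split.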

Parameters:
  • binner – transformer or predictor which creates the buckets

  • estimator – predictor trained on every bucket

  • n_jobs – number of parallel jobs (for training and predicting)

  • verbose – boolean or use 'tqdm' to use tqdm to fit the estimators

binner allows the following values:

estimator allows the following values:

predict(X)[source]#

Computes the predictions.

Parameters:

X – features, X is converted into an array if X is a dataframe

Returns:

predictions

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PiecewiseRegressor#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PiecewiseRegressor#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

QuantileMLPRegressor#

class mlinsights.mlmodel.quantile_mlpregressor.QuantileMLPRegressor(hidden_layer_sizes=(100,), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, **kwargs)[source]#

Quantile MLP regression, that is, a neural network regression trained with the L1 norm. This class inherits from sklearn.neural_network.MLPRegressor. The model optimizes the absolute loss using LBFGS or stochastic gradient descent. See CustomizedMultilayerPerceptron and absolute_loss.

Parameters:
  • hidden_layer_sizes – tuple, length = n_layers - 2, default (100,) The ith element represents the number of neurons in the ith hidden layer.

  • activation – {‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default ‘relu’ Activation function for the hidden layer. ‘identity’, no-op activation, useful to implement linear bottleneck, returns f(x) = x, ‘logistic’, the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)). ‘tanh’, the hyperbolic tan function, returns f(x) = tanh(x). ‘relu’, the rectified linear unit function, returns f(x) = \max(0, x).

  • solver – {'lbfgs', 'sgd', 'adam'}, default ‘adam’ The solver for weight optimization. ‘lbfgs’ is an optimizer in the family of quasi-Newton methods. ‘sgd’ refers to stochastic gradient descent. ‘adam’ refers to a stochastic gradient-based optimizer proposed by Kingma, Diederik, and Jimmy Ba. Note: the default solver ‘adam’ works well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, ‘lbfgs’ can converge faster and perform better.

  • alpha – float, optional, default 0.0001 L2 penalty (regularization term) parameter.

  • batch_size – int, optional, default ‘auto’ Size of minibatches for stochastic optimizers. If the solver is ‘lbfgs’, the model will not use minibatches. When set to ‘auto’, batch_size=min(200, n_samples).

  • learning_rate – {‘constant’, ‘invscaling’, ‘adaptive’}, default ‘constant’ Learning rate schedule for weight updates. ‘constant’ is a constant learning rate given by ‘learning_rate_init’, ‘invscaling’ gradually decreases the learning rate learning_rate_ at each time step ‘t’ using an inverse scaling exponent of ‘power_t’. effective_learning_rate = learning_rate_init / pow(t, power_t), ‘adaptive’ keeps the learning rate constant to ‘learning_rate_init’ as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least tol, or fail to increase validation score by at least tol if ‘early_stopping’ is on, the current learning rate is divided by 5. Only used when solver=’sgd’.

  • learning_rate_init – double, optional, default 0.001 The initial learning rate used. It controls the step-size in updating the weights. Only used when solver=’sgd’ or ‘adam’.

  • power_t – double, optional, default 0.5 The exponent for inverse scaling learning rate. It is used in updating effective learning rate when the learning_rate is set to ‘invscaling’. Only used when solver=’sgd’.

  • max_iter – int, optional, default 200 Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

  • shuffle – bool, optional, default True Whether to shuffle samples in each iteration. Only used when solver=’sgd’ or ‘adam’.

  • random_state – int, RandomState instance or None, optional, default None If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • tol – float, optional, default 1e-4 Tolerance for the optimization. When the loss or score is not improving by at least tol for n_iter_no_change consecutive iterations, unless learning_rate is set to ‘adaptive’, convergence is considered to be reached and training stops.

  • verbose – bool, optional, default False Whether to print progress messages to stdout.

  • warm_start – bool, optional, default False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.

  • momentum – float, default 0.9 Momentum for gradient descent update. Should be between 0 and 1. Only used when solver=’sgd’.

  • nesterovs_momentum – boolean, default True Whether to use Nesterov’s momentum. Only used when solver=’sgd’ and momentum > 0.

  • early_stopping – bool, default False Whether to use early stopping to terminate training when validation score is not improving. If set to true, it will automatically set aside 10% of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs. Only effective when solver=’sgd’ or ‘adam’

  • validation_fraction – float, optional, default 0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True

  • beta_1 – float, optional, default 0.9 Exponential decay rate for estimates of first moment vector in adam, should be in [0, 1). Only used when solver=’adam’

  • beta_2 – float, optional, default 0.999 Exponential decay rate for estimates of second moment vector in adam, should be in [0, 1). Only used when solver=’adam’

  • epsilon – float, optional, default 1e-8 Value for numerical stability in adam. Only used when solver=’adam’

  • n_iter_no_change – int, optional, default 10 Maximum number of epochs to not meet tol improvement. Only effective when solver=’sgd’ or ‘adam’

  • kwargs – additional parameters sent to the constructor of the parent
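Two of the formulas quoted in the parameter list, the ‘invscaling’ schedule and the ‘auto’ batch size, can be checked with a few lines of plain Python. The function names `invscaling_rate` and `auto_batch_size` are hypothetical helpers written for this illustration; they are not part of the class.

```python
def invscaling_rate(learning_rate_init, t, power_t=0.5):
    """effective_learning_rate = learning_rate_init / pow(t, power_t)."""
    return learning_rate_init / pow(t, power_t)


def auto_batch_size(n_samples):
    """batch_size='auto' resolves to min(200, n_samples)."""
    return min(200, n_samples)


# The rate shrinks as the time step t grows.
for t in (1, 4, 100):
    print(t, invscaling_rate(0.001, t))

# Small datasets use one batch; large ones are capped at 200 samples per batch.
print(auto_batch_size(50), auto_batch_size(10_000))
```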

Fitted attributes:

  • loss_: float The current loss computed with the loss function.

  • coefs_: list, length n_layers - 1 The ith element in the list represents the weight matrix corresponding to layer i.

  • intercepts_: list, length n_layers - 1 The ith element in the list represents the bias vector corresponding to layer i + 1.

  • n_iter_: int, The number of iterations the solver has run.

  • n_layers_: int Number of layers.

  • n_outputs_: int Number of outputs.

  • out_activation_: string Name of the output activation function.

predict(X)[source]#

Predicts using the multi-layer perceptron model.

Parameters:

X – {array-like, sparse matrix}, shape (n_samples, n_features) The input data.

Returns:

y : array-like, shape (n_samples, n_outputs) The predicted values.

score(X, y, sample_weight=None)[source]#

Returns mean absolute error regression loss.

Parameters:
  • X – array-like, shape = (n_samples, n_features) Test samples.

  • y – array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X.

  • sample_weight – array-like, shape = [n_samples], optional Sample weights.

Returns:

score, float mean absolute error regression loss
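For reference, a weighted mean absolute error can be written in a few lines of plain Python. This is a sketch of the metric only, under the assumption that weights enter as a weighted average; the function name is hypothetical and the library's own computation may differ in details.

```python
def mean_absolute_error(y_true, y_pred, sample_weight=None):
    """Weighted average of |y_true - y_pred|; uniform weights by default."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total = sum(w * abs(t - p) for t, p, w in zip(y_true, y_pred, sample_weight))
    return total / sum(sample_weight)


y_true = [1.0, 2.0, 4.0]
y_pred = [1.0, 3.0, 2.0]
print(mean_absolute_error(y_true, y_pred))
print(mean_absolute_error(y_true, y_pred, sample_weight=[1.0, 1.0, 2.0]))
```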

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') QuantileMLPRegressor#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

QuantileLinearRegression#

class mlinsights.mlmodel.quantile_regression.QuantileLinearRegression(fit_intercept=True, copy_X=True, n_jobs=1, delta=0.0001, max_iter=10, quantile=0.5, positive=False, verbose=False)[source]#

Quantile linear regression, that is, a linear regression trained with the L1 norm. This class inherits from sklearn.linear_model.LinearRegression. See example Quantile Regression.

Norm L1 is chosen if quantile=0.5, otherwise, for quantile=\rho, the following error is optimized:

\sum_i \rho |f(X_i) - Y_i|^- + (1-\rho) |f(X_i) - Y_i|^+

where |f(X_i) - Y_i|^- = \max(Y_i - f(X_i), 0) and |f(X_i) - Y_i|^+ = \max(f(X_i) - Y_i, 0). f(X_i) is the prediction, Y_i the expected value.
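The error above translates directly into plain Python. The sketch below follows the formula term by term; the function name `quantile_error` is a hypothetical helper and `rho` plays the role of the quantile parameter.

```python
def quantile_error(predictions, targets, rho=0.5):
    """Sum over i of rho * |.|^- + (1 - rho) * |.|^+ as defined above."""
    total = 0.0
    for f_x, y in zip(predictions, targets):
        under = max(y - f_x, 0.0)  # |f(X_i) - Y_i|^-, prediction too low
        over = max(f_x - y, 0.0)   # |f(X_i) - Y_i|^+, prediction too high
        total += rho * under + (1.0 - rho) * over
    return total


predictions = [1.0, 2.0, 3.0]
targets = [2.0, 2.0, 1.0]
print(quantile_error(predictions, targets, rho=0.5))  # half the L1 error
print(quantile_error(predictions, targets, rho=0.9))  # under-predictions cost more
```

With rho=0.5 both terms carry the same weight, so the error is proportional to the L1 norm, which is why that value corresponds to the median.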

Parameters:
  • fit_intercept – boolean, optional, default True whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered).

  • copy_X – boolean, optional, default True If True, X will be copied; else, it may be overwritten.

  • n_jobs – int, optional, default 1 The number of jobs to use for the computation. If -1 all CPUs are used. This will only provide speedup for n_targets > 1 and sufficiently large problems.

  • max_iter – int, optional, default 10 The number of iterations to do at training time. This parameter is specific to the quantile regression.

  • delta – float, optional, default 0.0001 Used to ensure matrices have an inverse (M + delta*I).

  • quantile – float, by default 0.5, determines which quantile to use to estimate the regression.

  • positive – when set to True, forces the coefficients to be positive.

  • verbose – bool, optional, default False Prints error at each iteration of the optimisation.

fit(X, y, sample_weight=None)[source]#

Fits a linear model with the L1 norm, which is equivalent to a quantile regression. The implementation is not the most efficient as it calls the method fit from sklearn.linear_model.LinearRegression multiple times. Data gets checked and rescaled each time. The optimization follows the algorithm Iteratively reweighted least squares. It is described in French at Régression quantile.

Parameters:
  • X – numpy array or sparse matrix of shape [n_samples,n_features] Training data

  • y – numpy array of shape [n_samples, n_targets] Target values. Will be cast to X’s dtype if necessary

  • sample_weight – numpy array of shape [n_samples] Individual weights for each sample

Returns:

self, returns an instance of self.
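The iteratively reweighted least squares scheme mentioned in the description of fit can be sketched in plain Python for one feature and the median case. This is an illustration under stated assumptions, not the library's code: the weights 1 / max(delta, |residual|) are the textbook choice for the L1 norm, and the helper names are hypothetical.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = slope * x + intercept."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def l1_line_fit(xs, ys, max_iter=10, delta=1e-4):
    """Repeats a weighted fit, down-weighting points with large residuals."""
    ws = [1.0] * len(xs)  # first pass is an ordinary least squares fit
    for _ in range(max_iter):
        slope, intercept = weighted_line_fit(xs, ys, ws)
        # delta keeps the weights finite when a residual is close to zero
        ws = [1.0 / max(delta, abs(y - (slope * x + intercept)))
              for x, y in zip(xs, ys)]
    return slope, intercept


xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[9] = 100.0  # a single outlier

ols_slope, _ = weighted_line_fit(xs, ys, [1.0] * len(xs))
l1_slope, _ = l1_line_fit(xs, ys)
print(ols_slope, l1_slope)
```

The least squares slope is pulled far from 2 by the outlier, while the reweighted fit moves back toward the line followed by the other nine points.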

Fitted attributes:

  • coef_: array, shape (n_features, ) or (n_targets, n_features)

    Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

  • intercept_: array

    Independent term in the linear model.

  • n_iter_: int

    Number of iterations at training time.

score(X, y, sample_weight=None)[source]#

Returns Mean absolute error regression loss.

Parameters:
  • X – array-like, shape = (n_samples, n_features) Test samples.

  • y – array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X.

  • sample_weight – array-like, shape = [n_samples], optional Sample weights.

Returns:

score : float mean absolute error regression loss

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') QuantileLinearRegression#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') QuantileLinearRegression#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

TransformedTargetClassifier2#

class mlinsights.mlmodel.target_predictors.TransformedTargetClassifier2(classifier=None, transformer=None)[source]#

Meta-estimator to classify on a transformed target. Useful for applying permutation transformation in classification problems.

Parameters:
  • classifier – object, default=LogisticRegression() Classifier object such as derived from ClassifierMixin. This classifier will automatically be cloned each time prior to fitting.

  • transformer – str or object of type BaseReciprocalTransformer Transforms the features.

<<<

import numpy
from sklearn.linear_model import LogisticRegression
from mlinsights.mlmodel import TransformedTargetClassifier2

tt = TransformedTargetClassifier2(
    classifier=LogisticRegression(), transformer="permute"
)
X = numpy.arange(4).reshape(-1, 1)
y = numpy.array([0, 1, 0, 1])
print(tt.fit(X, y))
print(tt.score(X, y))
print(tt.classifier_.coef_)

>>>

    TransformedTargetClassifier2(classifier=LogisticRegression(),
                                 transformer='permute')
    0.5
    [[-0.453]]

See example Transformed Target for a more complete example.

The class holds two attributes: classifier_, the fitted classifier, and transformer_, the transformer used in fit, predict, decision_function and predict_proba.

property classes_#

Returns the classes.

decision_function(X)[source]#

Predicts using the base classifier, applying inverse.

Parameters:

X – {array-like, sparse matrix}, shape = (n_samples, n_features) Samples.

Returns:

raw score : array, shape = (n_samples, ?)

fit(X, y, sample_weight=None)[source]#

Fits the model according to the given training data.

Parameters:
  • X – {array-like, sparse matrix}, shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y – array-like, shape (n_samples,) Target values.

  • sample_weight – array-like, shape (n_samples,) optional Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns:

self, object

predict(X)[source]#

Predicts using the base classifier, applying inverse.

Parameters:

X – {array-like, sparse matrix}, shape = (n_samples, n_features) Samples.

Returns:

y_hat, array, shape = (n_samples,) Predicted values.

predict_proba(X)[source]#

Predicts using the base classifier, applying inverse.

Parameters:

X – {array-like, sparse matrix}, shape = (n_samples, n_features) Samples.

Returns:

predict probabilities, array, shape = (n_samples, n_classes) Predicted values.

score(X, y, sample_weight=None)[source]#

Scores the model with sklearn.metrics.accuracy_score.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TransformedTargetClassifier2#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TransformedTargetClassifier2#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

TransformedTargetRegressor2#

class mlinsights.mlmodel.target_predictors.TransformedTargetRegressor2(regressor=None, transformer=None)[source]#

Meta-estimator to regress on a transformed target. Useful for applying a non-linear transformation in regression problems.

Parameters:
  • regressor – object, default=LinearRegression() Regressor object such as derived from RegressorMixin. This regressor will automatically be cloned each time prior to fitting.

  • transformer – str or object of type BaseReciprocalTransformer

<<<

import numpy
from sklearn.linear_model import LinearRegression
from mlinsights.mlmodel import TransformedTargetRegressor2

tt = TransformedTargetRegressor2(regressor=LinearRegression(), transformer="log")
X = numpy.arange(4).reshape(-1, 1)
y = numpy.exp(2 * X).ravel()
print(tt.fit(X, y))
print(tt.score(X, y))
print(tt.regressor_.coef_)

>>>

    TransformedTargetRegressor2(regressor=LinearRegression(), transformer='log')
    1.0
    [2.]

See example Transformed Target for a more complete example.

The class holds two attributes: regressor_, the fitted regressor, and transformer_, the transformer used in fit and predict.

fit(X, y, sample_weight=None)[source]#

Fits the model according to the given training data.

Parameters:
  • X – {array-like, sparse matrix}, shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y – array-like, shape (n_samples,) Target values.

  • sample_weight – array-like, shape (n_samples,) optional Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns:

self, object

predict(X)[source]#

Predicts using the base regressor, applying inverse.

Parameters:

X – {array-like, sparse matrix}, shape = (n_samples, n_features) Samples.

Returns:

y_hat : array, shape = (n_samples,) Predicted values.

score(X, y, sample_weight=None)[source]#

Scores the model with sklearn.metrics.r2_score.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TransformedTargetRegressor2#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TransformedTargetRegressor2#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

Transforms#

NGramsMixin#

class mlinsights.mlmodel.sklearn_text.NGramsMixin[source]#

Overloads method _word_ngrams to get tuples instead of strings in member vocabulary_ of TfidfVectorizer or CountVectorizer. It contains the list of n-grams used to process documents. See TraceableCountVectorizer and TraceableTfidfVectorizer for examples.

BaseReciprocalTransformer#

class mlinsights.mlmodel.sklearn_transform_inv.BaseReciprocalTransformer[source]#

Base class for transforms which transform the features and the targets at the same time. It must also return another transform which transforms the target back to what it was.

get_fct_inv()[source]#

Returns a trained transform which reverses the target after a predictor.

transform(X, y)[source]#

Transforms X and y. Returns transformed X and y.
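The contract described above, a transform of (X, y) paired with a transform that undoes it, can be sketched in plain Python. The two classes below are hypothetical stand-ins that only mirror the method names transform and get_fct_inv; they do not inherit from BaseReciprocalTransformer.

```python
import math


class LogTargetTransform:
    """Hypothetical example: leaves X untouched, takes the log of y."""

    def transform(self, X, y):
        return X, [math.log(v) for v in y]

    def get_fct_inv(self):
        # The reverse transform maps the target back to its original scale.
        return ExpTargetTransform()


class ExpTargetTransform:
    """Hypothetical reverse of LogTargetTransform."""

    def transform(self, X, y):
        return X, [math.exp(v) for v in y]

    def get_fct_inv(self):
        return LogTargetTransform()


X = [[0.0], [1.0], [2.0]]
y = [1.0, math.e, math.e ** 2]

forward = LogTargetTransform()
_, y_log = forward.transform(X, y)
_, y_back = forward.get_fct_inv().transform(X, y_log)
print(y_log)
print(y_back)
```

A predictor would be trained on y_log, and the reverse transform applied to its predictions before scoring.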

CategoriesToIntegers#

class mlinsights.mlmodel.categories_to_integers.CategoriesToIntegers(columns=None, remove=None, skip_errors=False, single=False)[source]#

Does something similar to what DictVectorizer does but in a transformer. The method fit retains all categories, the method transform transforms categories into integers. Categories are sorted by columns. If the method transform tries to convert a category which was not seen by method fit, it can raise an exception or ignore it and replace it by zero.

Parameters:
  • columns – specify a columns selection

  • remove – modalities to remove

  • skip_errors – skip when a new category appears (no 1)

  • single – use a single column per category, do not multiply them for each value
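The handling of unseen categories can be illustrated with a small plain-Python sketch. It is a simplified model, not the class's logic: rows are dictionaries instead of a dataframe, the helper names are hypothetical, and a skipped category simply gets no indicator set to 1.

```python
def fit_categories(rows):
    """Collects the sorted categories seen in each column."""
    seen = {}
    for row in rows:
        for col, val in row.items():
            seen.setdefault(col, set()).add(val)
    return {col: sorted(vals) for col, vals in seen.items()}


def transform_categories(rows, categories, skip_errors=False):
    """Encodes each known (column, value) pair as an indicator named col=value."""
    encoded_rows = []
    for row in rows:
        encoded = {}
        for col, val in row.items():
            if val in categories.get(col, ()):
                encoded[f"{col}={val}"] = 1.0
            elif not skip_errors:
                raise ValueError(f"unknown category {val!r} in column {col!r}")
            # with skip_errors=True the unseen value is ignored (no 1 is set)
        encoded_rows.append(encoded)
    return encoded_rows


train = [{"cat": "a"}, {"cat": "b"}]
categories = fit_categories(train)
print(categories)
print(transform_categories(train, categories))
print(transform_categories([{"cat": "z"}], categories, skip_errors=True))
```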

The logging function displays a message when a new dense and big matrix is created when it should be sparse. A sparse matrix should be allocated instead.

DictVectorizer or CategoriesToIntegers

Example which transforms text into integers:

<<<

import pandas
from mlinsights.mlmodel import CategoriesToIntegers

df = pandas.DataFrame([{"cat": "a"}, {"cat": "b"}])
trans = CategoriesToIntegers()
trans.fit(df)
newdf = trans.transform(df)
print(newdf)

>>>

       cat=a  cat=b
    0    1.0    NaN
    1    NaN    1.0

fit(X, y=None, **fit_params)[source]#

Makes the list of all categories in input X. X must be a dataframe.

Parameters:
  • X – iterable Training data

  • y – iterable, default=None Training targets.

  • fit_params – additional fit params

Returns:

self

fit_transform(X, y=None, **fit_params)[source]#

Fits and transforms categories in numerical features based on the list of categories found by method fit. X must be a dataframe. The function does not preserve the order of the columns.

Parameters:
  • X – iterable Training data

  • y – iterable, default=None Training targets.

  • fit_params – additional fitting parameters

Returns:

DataFrame, X with categories.

transform(X, y=None)[source]#

Transforms categories in numerical features based on the list of categories found by method fit. X must be a dataframe. The function does not preserve the order of the columns.

Parameters:
  • X – iterable Training data

  • y – iterable, default=None Training targets.

Returns:

DataFrame, X with categories.

ExtendedFeatures#

class mlinsights.mlmodel.extended_features.ExtendedFeatures(kind='poly', poly_degree=2, poly_interaction_only=False, poly_include_bias=True)[source]#

Generates extended features such as polynomial features.

Parameters:
  • kind – string 'poly' for polynomial features, 'poly-slow' for polynomial features in scikit-learn 0.20.2

  • poly_degree – integer The degree of the polynomial features. Default = 2.

  • poly_interaction_only – boolean If true, only interaction features are produced: features that are products of at most degree distinct input features (so not x[1] ** 2, x[0] * x[2] ** 3, etc.).

  • poly_include_bias – boolean If True (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones - acts as an intercept term in a linear model).

Fitted attributes:

  • n_input_features_: int

    The total number of input features.

  • n_output_features_: int

    The total number of polynomial output features. The number of output features is computed by iterating over all suitably sized combinations of input features.
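The count of output features follows from enumerating combinations of input features, which the standard library can do directly. The sketch below is a plain-Python illustration of that enumeration, not the class's implementation; `polynomial_terms` and `expand` are hypothetical helper names.

```python
from itertools import combinations, combinations_with_replacement


def polynomial_terms(n_features, degree=2, interaction_only=False, include_bias=True):
    """Lists each output feature as a tuple of input feature indices."""
    comb = combinations if interaction_only else combinations_with_replacement
    start = 0 if include_bias else 1  # the empty tuple is the bias column
    return [c for d in range(start, degree + 1) for c in comb(range(n_features), d)]


def expand(row, terms):
    """Computes the product of the selected input features for every term."""
    out = []
    for term in terms:
        value = 1.0
        for i in term:
            value *= row[i]
        out.append(value)
    return out


terms = polynomial_terms(2, degree=2)
print(terms)                     # bias, x0, x1, x0^2, x0*x1, x1^2
print(expand([2.0, 3.0], terms))
print(len(polynomial_terms(2, degree=2, interaction_only=True)))
```

With interaction_only=True, squared terms such as x0^2 disappear because each input feature can appear at most once in a product.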

fit(X, y=None)[source]#

Computes the number of output features.

Parameters:
  • X – array-like, shape (n_samples, n_features) The data.

  • y – targets

Returns:

self : instance

get_feature_names_out(input_features=None)[source]#

Returns feature names for output features.

Parameters:

input_features – list of string, length n_features, optional String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.

Returns:

output_feature_names : list of string, length n_output_features

transform(X)[source]#

Transforms data to extended features.

Parameters:

X – array-like, shape [n_samples, n_features] The data to transform, row by row.

FunctionReciprocalTransformer#

class mlinsights.mlmodel.sklearn_transform_inv_fct.FunctionReciprocalTransformer(fct, fct_inv=None)[source]#

The transform is used to apply a function to the target, predict, then transform the target back before scoring. The transform implements a series of predefined functions:

Parameters:
  • fct – function name or numerical function

  • fct_inv – optional if fct is a function name, reciprocal function otherwise

<<<

import pprint
from mlinsights.mlmodel.sklearn_transform_inv_fct import FunctionReciprocalTransformer

pprint.pprint(FunctionReciprocalTransformer.available_fcts())

>>>

    {'exp': (<ufunc 'exp'>, 'log'),
     'exp(x)-1': (<function FunctionReciprocalTransformer.available_fcts.<locals>.<lambda> at 0x7fc72559d6c0>,
                  'log'),
     'expm1': (<ufunc 'expm1'>, 'log1p'),
     'log': (<ufunc 'log'>, 'exp'),
     'log(1+x)': (<function FunctionReciprocalTransformer.available_fcts.<locals>.<lambda> at 0x7fc72559cf70>,
                  'exp(x)-1'),
     'log1p': (<ufunc 'log1p'>, 'expm1')}

static available_fcts()[source]#

Returns the list of predefined functions.

fit(X=None, y=None, sample_weight=None)[source]#

Just defines fct and fct_inv.

get_fct_inv()[source]#

Returns a trained transform which reverses the target after a predictor.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') FunctionReciprocalTransformer#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

transform(X, y)[source]#

Transforms X and y. Returns transformed X and y. If y is None, the returned value for y is None as well.
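The round trip described for this class (transform the target, train, map predictions back) can be sketched with the standard library alone. Everything below is illustrative: the table `PAIRS` and the helpers `transform_target`, `inverse_target` and `fit_line` are hypothetical names, only loosely modelled on the pairs listed by available_fcts().

```python
import math

# Hypothetical table of (function, reciprocal) pairs; not the library's own structure.
PAIRS = {
    "log": (math.log, math.exp),
    "log1p": (math.log1p, math.expm1),
}


def transform_target(X, y, name):
    """Returns X unchanged and the transformed target (None stays None)."""
    fct, _ = PAIRS[name]
    return X, (None if y is None else [fct(v) for v in y])


def inverse_target(y, name):
    """Maps predictions back to the original target scale."""
    _, fct_inv = PAIRS[name]
    return [fct_inv(v) for v in y]


def fit_line(x, y):
    """Ordinary least squares for y = a * x + b with a single feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    a = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return a, my - a * mx


X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(0.5 * v) for v in X]  # exponential target

X_t, y_t = transform_target(X, y, "log")  # the target becomes linear in X
a, b = fit_line(X_t, y_t)                 # train on the transformed target
pred = inverse_target([a * v + b for v in X], "log")  # back to the original scale
```

The linear model is only a good fit because it is trained on log(y); scoring then happens on `pred`, which lives on the original scale.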

PermutationReciprocalTransformer#

class mlinsights.mlmodel.sklearn_transform_inv_fct.PermutationReciprocalTransformer(random_state=None, closest=False)[source]#

The transform is used to permute targets, predict, then permute the target back before scoring. nan values remain nan values. Once fitted, the transform has attribute permutation_ which keeps track of the permutation to apply.

Parameters:
  • random_state – random state

  • closest – if True, finds the closest permuted element

fit(X=None, y=None, sample_weight=None)[source]#

Defines a random permutation over the targets.
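A minimal sketch of the idea, assuming the permutation is defined over the distinct non-nan target values and stored as a mapping. The helpers `fit_permutation`, `apply_permutation` and `invert` are hypothetical and only mimic the role of the permutation_ attribute.

```python
import math
import random


def fit_permutation(y, seed=0):
    """Builds a random one-to-one mapping between the distinct target values."""
    values = sorted({v for v in y if not math.isnan(v)})
    shuffled = values[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(values, shuffled))


def apply_permutation(y, permutation):
    """Permutes the targets; nan values are left untouched."""
    return [v if math.isnan(v) else permutation[v] for v in y]


def invert(permutation):
    """Reverses the mapping so that predictions can be permuted back."""
    return {new: old for old, new in permutation.items()}


y = [0.0, 1.0, 2.0, 1.0, float("nan")]
perm = fit_permutation(y)
y_perm = apply_permutation(y, perm)
y_back = apply_permutation(y_perm, invert(perm))
```

Applying the inverse mapping recovers the original targets, which is what makes scoring on the original labels possible.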

get_fct_inv()[source]#

Returns a trained transform which reverses the target after a predictor.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PermutationReciprocalTransformer#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

transform(X, y)[source]#

Transforms X and y. Returns transformed X and y. If y is None, the returned value for y is None as well.

PredictableTSNE#

class mlinsights.mlmodel.predictable_tsne.PredictableTSNE(normalizer=None, transformer=None, estimator=None, normalize=True, keep_tsne_outputs=False)[source]#

t-SNE is an interesting transform which can only be used to study data, as there is no way to apply it to new observations once it is fitted. That’s why the class TSNE does not have any method transform, only fit_transform. This class proposes a way to train a machine learned model which approximates the outputs of a TSNE transformer. Example Predictable t-SNE gives an example on how to use this class.

Parameters:
  • normalizer – None by default

  • transformer – sklearn.manifold.TSNE by default

  • estimator – sklearn.neural_network.MLPRegressor by default

  • normalize – normalizes the outputs, centers and normalizes the output of the t-SNE and applies that same normalization to the prediction of the estimator

  • keep_tsne_outputs – if True, the raw outputs of TSNE are stored in member tsne_outputs_

fit(X, y, sample_weight=None)[source]#

Trains a TSNE then trains an estimator to approximate its outputs.

Parameters:
  • X – numpy array or sparse matrix of shape [n_samples,n_features] Training data

  • y – numpy array of shape [n_samples, n_targets] Target values. Will be cast to X’s dtype if necessary

  • sample_weight – numpy array of shape [n_samples] Individual weights for each sample

Returns:

self, returns an instance of self.

Fitted attributes:

  • normalizer_: trained normalizer

  • transformer_: trained transformer

  • estimator_: trained regressor

  • tsne_outputs_: t-SNE outputs if keep_tsne_outputs is True

  • mean_: average of the t-SNE output on each dimension

  • inv_std_: inverse of the standard deviation of the t-SNE output on each dimension

  • loss_: loss (sklearn.metrics.mean_squared_error) between the predictions and the outputs of t-SNE
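To make the roles of mean_, inv_std_ and loss_ more concrete, here is a small standard-library sketch. It assumes the normalization uses the population standard deviation per dimension, which is a guess rather than something stated above; the helper names and the toy data are hypothetical.

```python
from statistics import fmean, pstdev


def fit_normalization(outputs):
    """Computes the mean and the inverse standard deviation of each dimension."""
    columns = list(zip(*outputs))
    mean = [fmean(c) for c in columns]
    inv_std = [1.0 / pstdev(c) for c in columns]
    return mean, inv_std


def normalize(outputs, mean, inv_std):
    """Centers and scales every dimension."""
    return [[(v - m) * s for v, m, s in zip(row, mean, inv_std)] for row in outputs]


def mean_squared_error(expected, predicted):
    """Average squared difference over all rows and dimensions."""
    return fmean(
        (e - p) ** 2 for re, rp in zip(expected, predicted) for e, p in zip(re, rp)
    )


# Toy 2D embedding standing in for the raw t-SNE outputs.
tsne_outputs = [[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]]
mean_, inv_std_ = fit_normalization(tsne_outputs)
target = normalize(tsne_outputs, mean_, inv_std_)

# A perfect regressor would reproduce the normalized target exactly.
loss_ = mean_squared_error(target, target)
```

In this sketch the regressor would be trained against `target`, so a loss close to zero means the approximation of the embedding is good on the training data.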

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PredictableTSNE#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

transform(X)[source]#

Runs the predictions.

Parameters:

X – numpy array or sparse matrix of shape [n_samples,n_features] Data to transform

Returns:

transformed X

TransferTransformer#

class mlinsights.mlmodel.transfer_transformer.TransferTransformer(estimator, method=None, copy_estimator=True, trainable=False)[source]#

Wraps a predictor or a transformer in a transformer. This model is frozen: it cannot be trained and only computes the predictions.

Parameters:
  • estimator – estimator to wrap in a transformer, it is cloned with the training data (deep copy) when fitted

  • method – if None, guess what method should be called, transform for a transformer, predict_proba for a classifier, decision_function if found, predict otherwise

  • copy_estimator – copy the model instead of taking a reference

  • trainable – the transferred model must be trained
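The way a method could be guessed when method is None can be sketched as follows. The lookup order simply follows the parameter description above; `guess_method`, `FrozenWrapper` and `ToyClassifier` are hypothetical names used for illustration, not the library's implementation.

```python
def guess_method(estimator):
    """Picks the first available method, in the order given in the description."""
    for name in ("transform", "predict_proba", "decision_function", "predict"):
        if hasattr(estimator, name):
            return name
    raise AttributeError(f"No usable method found on {type(estimator).__name__}.")


class FrozenWrapper:
    """Hypothetical stand-in for a transformer wrapping an already trained model."""

    def __init__(self, estimator, method=None):
        self.estimator = estimator
        self.method = method or guess_method(estimator)

    def fit(self, X=None, y=None):
        return self  # nothing to train, the wrapped model is frozen

    def transform(self, X):
        # Delegates to the selected method of the wrapped model.
        return getattr(self.estimator, self.method)(X)


class ToyClassifier:
    """Tiny rule-based classifier: class 1 for positive inputs."""

    def predict(self, X):
        return [int(x > 0) for x in X]

    def predict_proba(self, X):
        return [[0.0, 1.0] if x > 0 else [1.0, 0.0] for x in X]


wrapped = FrozenWrapper(ToyClassifier()).fit()
probas = wrapped.transform([-1.0, 2.0])  # uses predict_proba for a classifier
labels = FrozenWrapper(ToyClassifier(), method="predict").transform([-1.0, 2.0])
```

Passing method explicitly overrides the guess, which is useful when class labels are wanted instead of probabilities.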

fit(X=None, y=None, sample_weight=None)[source]#

The function does nothing.

Parameters:
  • X – unused

  • y – unused

  • sample_weight – unused

Returns:

self: returns an instance of self.

Fitted attributes:

  • estimator_: already trained estimator

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TransferTransformer#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

transform(X)[source]#

Runs the predictions.

Parameters:

X – numpy array or sparse matrix of shape [n_samples,n_features] Data to transform

Returns:

transformed X

TraceableCountVectorizer#

class mlinsights.mlmodel.sklearn_text.TraceableCountVectorizer(*, input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, stop_words=None, token_pattern='(?u)\\b\\w\\w+\\b', ngram_range=(1, 1), analyzer='word', max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.int64'>)[source]#

Inherits from NGramsMixin which overloads method _word_ngrams to keep more information about n-grams but still produces the same outputs as CountVectorizer.

<<<

import numpy
from sklearn.feature_extraction.text import CountVectorizer
from mlinsights.mlmodel.sklearn_text import TraceableCountVectorizer
from pprint import pformat

corpus = numpy.array(
    [
        "This is the first document.",
        "This document is the second document.",
        "Is this the first document?",
        "",
    ]
).reshape((4,))

print("CountVectorizer from scikit-learn")
mod1 = CountVectorizer(ngram_range=(1, 2))
mod1.fit(corpus)
print(mod1.transform(corpus).todense()[:2])
print(pformat(mod1.vocabulary_)[:100])

print("TraceableCountVectorizer from scikit-learn")
mod2 = TraceableCountVectorizer(ngram_range=(1, 2))
mod2.fit(corpus)
print(mod2.transform(corpus).todense()[:2])
print(pformat(mod2.vocabulary_)[:100])

>>>

    CountVectorizer from scikit-learn
    [[1 0 1 1 1 1 0 0 0 1 1 0 1 0 1 0]
     [2 1 0 0 1 1 0 1 1 1 0 1 1 1 0 0]]
    {'document': 0,
     'document is': 1,
     'first': 2,
     'first document': 3,
     'is': 4,
     'is the': 5,
     'is t
    TraceableCountVectorizer from scikit-learn
    [[1 0 1 1 1 1 0 0 0 1 1 0 1 0 1 0]
     [2 1 0 0 1 1 0 1 1 1 0 1 1 1 0 0]]
    {('document',): 0,
     ('document', 'is'): 1,
     ('first',): 2,
     ('first', 'document'): 3,
     ('is',): 4,

A weirder example with TraceableTfidfVectorizer shows more differences.
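The difference between the two vocabularies above, joined strings versus tuples of tokens, can be sketched with the standard library. The function `word_ngrams` is a hypothetical helper, not the library's _word_ngrams; it only shows why keeping tuples preserves information that joined strings can lose.

```python
import re


def word_ngrams(tokens, ngram_range=(1, 2)):
    """Returns every n-gram as a tuple of tokens, shortest n-grams first."""
    low, high = ngram_range
    return [
        tuple(tokens[i:i + n])
        for n in range(low, high + 1)
        for i in range(len(tokens) - n + 1)
    ]


# Same default token pattern as in the class signature above.
tokens = re.findall(r"(?u)\b\w\w+\b", "This is the first document.".lower())
traced = word_ngrams(tokens)            # tuples keep the token boundaries
joined = [" ".join(g) for g in traced]  # plain strings, as in CountVectorizer

# When tokens may contain spaces, two different n-grams collapse to one string.
ambiguous = " ".join(("a b", "c")) == " ".join(("a", "b c"))
```

With the default token pattern both representations carry the same information; the ambiguity only appears with patterns that allow spaces inside tokens.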

set_fit_request(*, raw_documents: bool | None | str = '$UNCHANGED$') TraceableCountVectorizer#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

raw_documents : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for raw_documents parameter in fit.

Returns#

self : object

The updated object.

set_transform_request(*, raw_documents: bool | None | str = '$UNCHANGED$') TraceableCountVectorizer#

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

raw_documents : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for raw_documents parameter in transform.

Returns#

self : object

The updated object.

TraceableTfidfVectorizer#

class mlinsights.mlmodel.sklearn_text.TraceableTfidfVectorizer(*, input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, analyzer='word', stop_words=None, token_pattern='(?u)\\b\\w\\w+\\b', ngram_range=(1, 1), max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.float64'>, norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False)[source]#

Inherits from NGramsMixin which overloads method _word_ngrams to keep more information about n-grams but still produces the same outputs as TfidfVectorizer.

<<<

import numpy
from sklearn.feature_extraction.text import TfidfVectorizer
from mlinsights.mlmodel.sklearn_text import TraceableTfidfVectorizer
from pprint import pformat

corpus = numpy.array(
    [
        "This is the first document.",
        "This document is the second document.",
        "Is this the first document?",
        "",
    ]
).reshape((4,))

print("TfidfVectorizer from scikit-learn")
mod1 = TfidfVectorizer(ngram_range=(1, 2), token_pattern="[a-zA-Z ]{1,4}")
mod1.fit(corpus)
print(mod1.transform(corpus).todense()[:2])
print(pformat(mod1.vocabulary_)[:100])

print("TraceableTfidfVectorizer from scikit-learn")
mod2 = TraceableTfidfVectorizer(ngram_range=(1, 2), token_pattern="[a-zA-Z ]{1,4}")
mod2.fit(corpus)
print(mod2.transform(corpus).todense()[:2])
print(pformat(mod2.vocabulary_)[:100])

>>>

    TfidfVectorizer from scikit-learn
    [[0.    0.    0.329 0.329 0.    0.    0.    0.    0.26  0.26  0.    0.
      0.26  0.26  0.    0.    0.    0.    0.    0.26  0.    0.    0.26  0.26
      0.    0.    0.26  0.26  0.26  0.    0.329 0.    0.   ]
     [0.245 0.245 0.    0.    0.245 0.245 0.245 0.245 0.    0.    0.245 0.245
      0.    0.    0.    0.    0.    0.    0.245 0.    0.245 0.245 0.    0.
      0.245 0.245 0.    0.    0.193 0.245 0.    0.245 0.245]]
    {' doc': 0,
     ' doc umen': 1,
     ' is ': 2,
     ' is  the ': 3,
     ' sec': 4,
     ' sec ond ': 5,
     ' the': 6,
     
    TraceableTfidfVectorizer from scikit-learn
    [[0.    0.    0.329 0.329 0.    0.    0.    0.    0.26  0.26  0.    0.
      0.26  0.26  0.    0.    0.    0.    0.    0.26  0.    0.    0.26  0.26
      0.    0.    0.26  0.26  0.26  0.    0.329 0.    0.   ]
     [0.245 0.245 0.    0.    0.245 0.245 0.245 0.245 0.    0.    0.245 0.245
      0.    0.    0.    0.    0.    0.    0.245 0.    0.245 0.245 0.    0.
      0.245 0.245 0.    0.    0.193 0.245 0.    0.245 0.245]]
    {(' doc',): 0,
     (' doc', 'umen'): 1,
     (' is ',): 2,
     (' is ', 'the '): 3,
     (' sec',): 4,
     (' sec', '

set_fit_request(*, raw_documents: bool | None | str = '$UNCHANGED$') TraceableTfidfVectorizer#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

raw_documents : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for raw_documents parameter in fit.

Returns#

self : object

The updated object.

set_transform_request(*, raw_documents: bool | None | str = '$UNCHANGED$') TraceableTfidfVectorizer#

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

raw_documents : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for raw_documents parameter in transform.

Returns#

self : object

The updated object.

Exploration in C#

Losses#

mlinsights.mlmodel.quantile_mlpregressor.absolute_loss(y_true, y_pred)[source]#

Computes the absolute loss for regression.

Parameters:
  • y_true – array-like or label indicator matrix Ground truth (correct) values.

  • y_pred – array-like or label indicator matrix Predicted values, as returned by a regression estimator.

Returns:

loss, float The degree to which the samples are correctly predicted.
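For intuition, a plain-Python version of an absolute loss is shown below. It assumes the loss is the mean absolute deviation between targets and predictions; the exact reduction used by the library is not stated here, and `absolute_loss_sketch` is a hypothetical name.

```python
def absolute_loss_sketch(y_true, y_pred):
    """Mean of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


loss = absolute_loss_sketch([1.0, 2.0, 3.0], [1.0, 1.0, 5.0])  # (0 + 1 + 2) / 3
```

Unlike the squared loss, each error contributes linearly, which makes the loss less sensitive to outliers.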

Hidden API#

_switch_clusters#

mlinsights.mlmodel._kmeans_constraint_._switch_clusters(labels, distances)[source]#

Tries to switch clusters. Modifies labels in place.

Parameters:
  • labels – labels

  • distances – distances
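The switching idea (two points exchange their clusters whenever this lowers the total cost, so cluster sizes stay the same) can be sketched as below. This is only an illustration under assumptions: `switch_clusters_sketch` is a hypothetical function, `distances[i][k]` is taken to be the cost of assigning point i to cluster k, and the returned count is not claimed to match the library's return value.

```python
def switch_clusters_sketch(labels, distances):
    """Swaps the clusters of two points while it reduces the total cost.

    labels is modified in place; the number of swaps is returned.
    """
    n_switches = 0
    improved = True
    while improved:  # stops once a full pass finds no improving swap
        improved = False
        for i in range(len(labels)):
            for j in range(i + 1, len(labels)):
                ci, cj = labels[i], labels[j]
                if ci == cj:
                    continue
                current = distances[i][ci] + distances[j][cj]
                swapped = distances[i][cj] + distances[j][ci]
                if swapped < current:
                    # Exchanging the two labels keeps every cluster size unchanged.
                    labels[i], labels[j] = cj, ci
                    n_switches += 1
                    improved = True
    return n_switches


labels = [0, 0, 1, 1]
distances = [
    [0.0, 5.0],  # point 0 is close to cluster 0
    [4.0, 1.0],  # point 1 is closer to cluster 1
    [1.0, 4.0],  # point 2 is closer to cluster 0
    [5.0, 0.0],  # point 3 is close to cluster 1
]
n_switches = switch_clusters_sketch(labels, distances)
```

Points 1 and 2 trade places, so both clusters still contain two points while the total cost drops from 8 to 2.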