mlinsights.mlmodel#
Helpers#
model_featurizer#
- mlinsights.mlmodel.ml_featurizer.model_featurizer(model, **params)[source]#
Converts a machine learned model into a function which converts a vector into features produced by the model. The features can be the output itself or intermediate results. The model can come from scikit-learn or torch.
- Parameters:
model – model
params – additional parameters
- Returns:
function
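The idea of wrapping a model into a featurizer can be pictured with a small closure. This is only a sketch: make_featurizer, hidden_layer and the matrix W are hypothetical names introduced for illustration, not part of mlinsights, and the real model_featurizer may handle inputs differently.

```python
import numpy as np


def make_featurizer(batch_fn):
    """Wrap a batch function so that it also accepts a single vector."""

    def featurize(x):
        x = np.asarray(x, dtype=np.float64)
        # A single vector becomes a batch of one row.
        batch = x.reshape(1, -1) if x.ndim == 1 else x
        return batch_fn(batch)

    return featurize


# Hypothetical intermediate result of a model: one ReLU layer.
W = np.array([[1.0, -1.0], [0.5, 2.0], [0.0, 1.0]])  # (n_features, n_hidden)


def hidden_layer(batch):
    return np.maximum(batch @ W, 0.0)


featurizer = make_featurizer(hidden_layer)
features = featurizer([1.0, 2.0, 3.0])
print(features)
```

The returned function always produces a 2D array, one row of features per input vector.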
Clustering#
ConstraintKMeans#
- class mlinsights.mlmodel.kmeans_constraint.ConstraintKMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=500, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='lloyd', balanced_predictions=False, strategy='gain', kmeans0=True, learning_rate=1.0, history=False)[source]#
Defines a constraint k-means. Clusters are modified to have an equal size. The algorithm is initialized with a regular KMeans and continues with a modified version of it.
Computing the predictions offers a choice. The first option is to keep the predictions from the regular k-means algorithm but with the balanced clusters. The second is to compute balanced predictions over the test set. That implies that the prediction for the same observation might change depending on the set it belongs to.
The parameter strategy determines how observations should be assigned to a cluster. The value can be:
'distance': observations are ranked by distance to a cluster, the algorithm assigns each point to the closest center unless that center has reached the maximum size; it deals first with the furthest point and maps it to the closest center
'gain': follows the algorithm described at
'weights': estimates weights attached to each cluster, it weights the distance to each cluster in order to balance the number of points mapped to every cluster; the strategy uses a learning rate.
The first two strategies cannot reach a good compromise without using function _switch_clusters, which tries every switch between clusters: two points change clusters. It keeps the number of points and checks that the inertia is reduced.
- Parameters:
n_clusters – number of clusters
init – used by k-means
n_init – used by k-means
max_iter – used by k-means
tol – used by k-means
verbose – used by k-means
random_state – used by k-means
copy_x – used by k-means
algorithm – used by k-means
balanced_predictions – produce balanced predictions or the regular ones
strategy – strategy or algorithm used to abide by the constraint
kmeans0 – if True, applies k-means algorithm first
history – keeps centers across iterations
learning_rate – learning rate, used by strategy ‘weights’
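The 'distance' strategy and the switching step described above can be sketched with numpy. This is a simplified illustration under stated assumptions, not the library's implementation: balanced_assign and switch_points are hypothetical helpers, the centers are kept fixed, and the maximum cluster size is assumed to be ceil(n_samples / n_clusters).

```python
import numpy as np


def balanced_assign(X, centers):
    """Greedy assignment: furthest points first, each cluster capped in size."""
    n, k = X.shape[0], centers.shape[0]
    capacity = int(np.ceil(n / k))
    # Squared Euclidean distance of every point to every center, shape (n, k).
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Points furthest from their closest center are handled first.
    order = np.argsort(-sq.min(axis=1))
    labels = np.full(n, -1, dtype=int)
    counts = np.zeros(k, dtype=int)
    for i in order:
        for c in np.argsort(sq[i]):
            if counts[c] < capacity:
                labels[i] = c
                counts[c] += 1
                break
    return labels, sq


def switch_points(labels, sq):
    """Swap two points between clusters whenever it lowers the inertia."""
    labels = labels.copy()
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            for j in range(i + 1, len(labels)):
                li, lj = labels[i], labels[j]
                if li != lj and sq[i, lj] + sq[j, li] < sq[i, li] + sq[j, lj]:
                    labels[i], labels[j] = lj, li
                    improved = True
    return labels


rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))
centers = X[:3].copy()
labels, sq = balanced_assign(X, centers)
final = switch_points(labels, sq)
rows = np.arange(len(X))
inertia_before = sq[rows, labels].sum()
inertia_after = sq[rows, final].sum()
print(np.bincount(final, minlength=3), inertia_before, inertia_after)
```

A swap exchanges the labels of two points, so cluster sizes never change; only the inertia can go down.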
- constraint_kmeans(X, sample_weight=None, state=None, learning_rate=1.0, history=False)[source]#
Completes the constraint k-means.
- Parameters:
X – features
sample_weight – sample weight
state – state
learning_rate – learning rate
history – keeps evolution of centers
- fit(X, y=None, sample_weight=None)[source]#
Compute k-means clustering.
- Parameters:
X – array-like or sparse matrix, shape=(n_samples, n_features) Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous.
y – Ignored
sample_weight – sample weight
- score(X, y=None, sample_weight=None)[source]#
Returns the distances to all clusters.
- Parameters:
X – features
y – unused
sample_weight – sample weight
- Returns:
distances
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ConstraintKMeans #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in fit.
- Returns:
self – object The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ConstraintKMeans #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in score.
- Returns:
self – object The updated object.
KMeansL1L2#
- class mlinsights.mlmodel.kmeans_l1.KMeansL1L2(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='lloyd', norm='L2')[source]#
K-Means clustering with either norm L1 or L2. See notebook KMeans with norm L1 for an example.
- Parameters:
n_clusters – int, default=8 The number of clusters to form as well as the number of centroids to generate.
init –
{‘k-means++’, ‘random’} or ndarray of shape (n_clusters, n_features), default=’k-means++’ Method for initialization, defaults to ‘k-means++’:
’k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details.
’random’: choose k observations (rows) at random from data for the initial centroids.
If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
n_init – int, default=10 Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
max_iter – int, default=300 Maximum number of iterations of the k-means algorithm for a single run.
tol – float, default=1e-4 Relative tolerance with regards to inertia to declare convergence.
verbose – int, default=0 Verbosity mode.
random_state – int, RandomState instance, default=None Determines random number generation for centroid initialization. Use an int to make the randomness deterministic.
copy_x – bool, default=True When pre-computing distances it is more numerically accurate to center the data first. If copy_x is True (default), then the original data is not modified, ensuring X is C-contiguous. If False, the original data is modified, and put back before the function returns, but small numerical differences may be introduced by subtracting and then adding the data mean, in this case it will also not ensure that data is C-contiguous which may cause a significant slowdown.
algorithm – {“lloyd”, “elkan”}, default=”lloyd” K-means algorithm to use. The classical EM-style algorithm is “lloyd”. The “elkan” variation is more efficient by using the triangle inequality, but currently doesn’t support sparse data.
norm – {“L1”, “L2”} The norm L2 is identical to KMeans. Norm L1 uses a completely different code path.
Fitted attributes:
- cluster_centers_: ndarray of shape (n_clusters, n_features)
Coordinates of cluster centers. If the algorithm stops before fully converging (see tol and max_iter), these will not be consistent with labels_.
- labels_: ndarray of shape (n_samples,)
Labels of each point
- inertia_: float
Sum of squared distances of samples to their closest cluster center.
- n_iter_: int
Number of iterations run.
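To illustrate how norm L1 changes the algorithm, here is one Lloyd-style iteration written with numpy. It is a sketch under an assumption: centers are updated with the coordinate-wise median (the usual k-medians update), which the text above does not confirm for this class. l1_kmeans_step is a hypothetical helper.

```python
import numpy as np


def l1_kmeans_step(X, centers):
    """One Lloyd-style iteration using the L1 (Manhattan) distance."""
    # L1 distance of every point to every center, shape (n_samples, n_clusters).
    dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
    labels = dist.argmin(axis=1)
    new_centers = centers.copy()
    for c in range(centers.shape[0]):
        members = X[labels == c]
        if len(members) > 0:
            # The coordinate-wise median minimizes the sum of L1 distances.
            new_centers[c] = np.median(members, axis=0)
    return labels, new_centers


X = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [50.0, 10.0]]
)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
labels, centers = l1_kmeans_step(X, centers)
print(labels)
print(centers)
```

With the median update, the outlier at (50, 10) does not pull the second center away from (10, 10), whereas a mean update would.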
- fit(X, y=None, sample_weight=None)[source]#
Computes k-means clustering.
- Parameters:
X – array-like or sparse matrix, shape=(n_samples, n_features) Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous.
y – Ignored Not used, present here for API consistency by convention.
sample_weight – array-like, shape (n_samples,), optional The weights for each observation in X. If None, all observations are assigned equal weight (default: None).
- Returns:
self Fitted estimator.
- predict(X, sample_weight=None)[source]#
Predicts the closest cluster each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
- Parameters:
X – {array-like, sparse matrix} of shape (n_samples, n_features) New data to predict.
sample_weight – array-like, shape (n_samples,), optional The weights for each observation in X. If None, all observations are assigned equal weight (default: None), unused here
- Returns:
labels : array, shape [n_samples,] Index of the cluster each sample belongs to.
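The code book view of predict can be written in a few lines of numpy. This sketch uses the L2 norm and a hand-written code_book standing in for cluster_centers_; the variable names are illustrative only.

```python
import numpy as np

# Plays the role of cluster_centers_ (the code book).
code_book = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X_new = np.array([[1.0, 1.0], [9.0, 1.0], [4.0, 6.0]])

# Squared L2 distance of every sample to every code, shape (n_samples, n_codes).
sq = ((X_new[:, None, :] - code_book[None, :, :]) ** 2).sum(axis=2)
codes = sq.argmin(axis=1)      # index of the closest code for each sample
quantized = code_book[codes]   # each sample replaced by its closest code
print(codes)
```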
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KMeansL1L2 #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in fit.
- Returns:
self – object The updated object.
- set_predict_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KMeansL1L2 #
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in predict.
- Returns:
self – object The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KMeansL1L2 #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in score.
- Returns:
self – object The updated object.
- transform(X)[source]#
Transforms X to a cluster-distance space.
In the new space, each dimension is the distance to the cluster centers. Note that even if X is sparse, the array returned by transform will typically be dense.
- Parameters:
X – {array-like, sparse matrix} of shape (n_samples, n_features) New data to transform.
- Returns:
X_new : array, shape [n_samples, k] X transformed in the new space.
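The cluster-distance space returned by transform can be sketched for both norms with numpy. cluster_distances is a hypothetical helper written for this illustration; it is not the library's code.

```python
import numpy as np


def cluster_distances(X, centers, norm="L2"):
    """Distance of every sample to every center, shape (n_samples, k)."""
    diff = X[:, None, :] - centers[None, :, :]
    if norm == "L1":
        return np.abs(diff).sum(axis=2)
    if norm == "L2":
        return np.sqrt((diff ** 2).sum(axis=2))
    raise ValueError(f"Unexpected norm {norm!r}, expected 'L1' or 'L2'.")


centers = np.array([[0.0, 0.0], [3.0, 4.0]])
X = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]])
d2 = cluster_distances(X, centers, "L2")
d1 = cluster_distances(X, centers, "L1")
print(d2)
print(d1)
```

Each row has one entry per cluster, so the output is dense even when the input is sparse.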
Trainers#
ClassifierAfterKMeans#
- class mlinsights.mlmodel.classification_kmeans.ClassifierAfterKMeans(estimator=None, clus=None, **kwargs)[source]#
Applies a k-means (see sklearn.cluster.KMeans) for each class, then adds the distance to each cluster as a feature for a classifier. See example LogisticRegression and Clustering.
- Parameters:
estimator – sklearn.linear_model.LogisticRegression by default
clus – clustering applied on each class, by default k-means with two clusters
kwargs – sent to set_params, see its documentation to understand how to specify parameters
- fit(X, y, sample_weight=None)[source]#
Runs a k-means on each class then trains a classifier on the extended set of features.
- Parameters:
X – numpy array or sparse matrix of shape [n_samples,n_features] Training data
y – numpy array of shape [n_samples, n_targets] Target values. Will be cast to X’s dtype if necessary
sample_weight – numpy array of shape [n_samples] Individual weights for each sample
- Returns:
self : returns an instance of self.
Fitted attributes:
- labels_: dictionary of clustering models
- clus_: array of clustering models
- estimator_: trained classifier
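The feature extension performed before training the classifier can be pictured as follows. This is a deliberately simplified numpy sketch: a single center per class (the class mean) stands in for the per-class k-means, and the distances are appended to the original features, which is an assumption about how the extended set is built.

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
y = np.array([0, 0, 1, 1])

# Stand-in for the per-class clustering: one center (the mean) per class.
centers = np.vstack([X[y == cls].mean(axis=0) for cls in np.unique(y)])

# Distance of every sample to every center, appended as extra features.
dist = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
X_ext = np.hstack([X, dist])
print(X_ext.shape)
```

Any classifier can then be trained on X_ext instead of X.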
- get_params(deep=True)[source]#
Returns the parameters for both the clustering and the classifier.
- Parameters:
deep – unused here
- Returns:
dict
set_params describes the pattern parameter names follow.
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ClassifierAfterKMeans #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in fit.
- Returns:
self – object The updated object.
- set_params(**values)[source]#
Sets the parameters before training. Every parameter prefixed by 'e_' is an estimator parameter, every parameter prefixed by 'c_' is for the sklearn.cluster.KMeans.
- Parameters:
values – parameter values
- Returns:
dict
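The prefix convention can be illustrated with a small routing function. split_params is a hypothetical helper written for this sketch; how the class itself stores the values, and how it treats names without a prefix, is not described here, so raising an error in that case is an assumption.

```python
def split_params(**values):
    """Route 'e_' parameters to the estimator and 'c_' parameters to the clustering."""
    estimator_params, clustering_params = {}, {}
    for name, value in values.items():
        if name.startswith("e_"):
            estimator_params[name[2:]] = value
        elif name.startswith("c_"):
            clustering_params[name[2:]] = value
        else:
            # Assumption: names without a known prefix are rejected.
            raise ValueError(f"Unexpected parameter name {name!r}.")
    return estimator_params, clustering_params


est, clus = split_params(e_C=0.5, e_max_iter=200, c_n_clusters=3)
print(est, clus)
```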
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ClassifierAfterKMeans #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in score.
- Returns:
self – object The updated object.
CustomizedMultilayerPerceptron#
- class mlinsights.mlmodel.quantile_mlpregressor.CustomizedMultilayerPerceptron(hidden_layer_sizes, activation, solver, alpha, batch_size, learning_rate, learning_rate_init, power_t, max_iter, loss, shuffle, random_state, tol, verbose, warm_start, momentum, nesterovs_momentum, early_stopping, validation_fraction, beta_1, beta_2, epsilon, n_iter_no_change, max_fun)[source]#
Customized multilayer perceptron based on BaseMultilayerPerceptron.
IntervalRegressor#
- class mlinsights.mlmodel.interval_regressor.IntervalRegressor(estimator=None, n_estimators=10, n_jobs=None, alpha=1.0, verbose=False)[source]#
Trains multiple regressors to provide a confidence interval on predictions. It only works for single regression. Every training is made with a new sample of the training data; parameter alpha lets the user choose the size of this sample. A smaller alpha increases the variance of the predictions. The current implementation draws samples at random but keeps the weight associated with each of them. Another way could be to draw a weighted sample but give the observations uniform weights.
- Parameters:
estimator – predictor trained on every bucket
n_estimators – number of estimators to train
n_jobs – number of parallel jobs (for training and predicting)
alpha – proportion of samples resampled for each training
verbose – boolean or use 'tqdm' to use tqdm to fit the estimators
- fit(X, y, sample_weight=None)[source]#
Trains the binner and an estimator on every bucket.
- Parameters:
X – features, X is converted into an array if X is a dataframe
y – target
sample_weight – sample weights
- Returns:
self: returns an instance of self.
Fitted attributes:
- binner_: binner
- estimators_: dictionary of estimators, each of them mapped to a leaf of the tree
- mean_estimator_: estimator trained on the whole dataset in case the binner cannot find a bucket for a new observation
- dim_: dimension of the output
- mean_: average targets
- property n_estimators_#
Returns the number of estimators = the number of buckets the data was split in.
- predict(X)[source]#
Computes the average predictions.
- Parameters:
X – features, X is converted into an array if X is a dataframe
- Returns:
predictions
- predict_all(X)[source]#
Computes the predictions for all estimators.
- Parameters:
X – features, X is converted into an array if X is a dataframe
- Returns:
predictions
- predict_sorted(X)[source]#
Computes the predictions for all estimators. Sorts them for all observations.
- Parameters:
X – features, X is converted into an array if X is a dataframe
- Returns:
predictions sorted for each observation
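The relation between predict, predict_all and predict_sorted can be sketched with numpy and ordinary least squares. This is not the library's implementation: fit_linear and predict_linear are hypothetical helpers, and sampling with replacement is an assumption about how each training sample is drawn.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=50)


def fit_linear(X, y):
    """Least squares fit with an intercept, returns the coefficients."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ coef


n_estimators, alpha = 10, 1.0
size = int(alpha * X.shape[0])  # alpha controls the size of each sample
models = []
for _ in range(n_estimators):
    idx = rng.integers(0, X.shape[0], size=size)  # assumption: with replacement
    models.append(fit_linear(X[idx], y[idx]))

# One column per estimator, shape (n_samples, n_estimators).
pred_all = np.column_stack([predict_linear(c, X) for c in models])
pred_mean = pred_all.mean(axis=1)        # analogue of predict
pred_sorted = np.sort(pred_all, axis=1)  # analogue of predict_sorted
lower, upper = pred_sorted[:, 0], pred_sorted[:, -1]
print(pred_all.shape, lower[:3], upper[:3])
```

Taking columns of the sorted predictions gives empirical bounds for each observation; here the first and last columns bracket the average prediction.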
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') IntervalRegressor #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in fit.
- Returns:
self – object The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') IntervalRegressor #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in score.
- Returns:
self – object The updated object.
ApproximateNMFPredictor#
- class mlinsights.mlmodel.anmf_predictor.ApproximateNMFPredictor(force_positive=False, **kwargs)[source]#
Converts sklearn.decomposition.NMF into a predictor so that the prediction does not involve training even for new observations. The class uses a sklearn.decomposition.TruncatedSVD of the components found by the sklearn.decomposition.NMF. The prediction projects the test data into the components vector space and maps them back into their original space. The issue is that it does not necessarily produce only positive values, as the sklearn.decomposition.NMF would do, unless parameter force_positive is True.
<<<
import numpy
from mlinsights.mlmodel.anmf_predictor import ApproximateNMFPredictor

train = numpy.array(
    [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
    dtype=numpy.float64,
)
train[: train.shape[1], :] += numpy.identity(train.shape[1])

model = ApproximateNMFPredictor(n_components=2, force_positive=True)
model.fit(train)

test = numpy.array([[1, 1, 1, 0]], dtype=numpy.float64)
pred = model.predict(test)
print(pred)
>>>
[[1.155 0.84 0.05 0.05 ]]
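The projection step described above can be pictured with plain numpy. This is a rough analogue, not the class's implementation: the components matrix H is written by hand instead of being learned by an NMF, numpy.linalg.svd replaces TruncatedSVD, and approx_predict is a hypothetical helper.

```python
import numpy as np

# Hypothetical NMF components, shape (n_components, n_features).
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0]])

# Orthonormal basis of the space spanned by the components.
_, s, Vt = np.linalg.svd(H, full_matrices=False)
basis = Vt[s > 1e-12]


def approx_predict(X, force_positive=False):
    """Project X onto the components space and map it back."""
    coords = X @ basis.T   # coordinates in the components space
    back = coords @ basis  # back in the original feature space
    return np.maximum(back, 0.0) if force_positive else back


inside = approx_predict(np.array([[1.0, 1.0, 1.0, 0.0]]), force_positive=True)
outside = approx_predict(np.array([[0.0, 0.0, 0.0, 1.0]]), force_positive=True)
print(inside, outside)
```

A vector already in the span of the components is recovered unchanged, while a vector orthogonal to it is mapped to zero. No model is refit at prediction time.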
- fit(X, y=None)[source]#
Trains a sklearn.decomposition.NMF then a multi-output regressor.
- predict(X)[source]#
Predicts based on the multi-output regressor. The output has the same dimension as X.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ApproximateNMFPredictor #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in score.
- Returns:
self – object The updated object.
PiecewiseClassifier#
- class mlinsights.mlmodel.piecewise_estimator.PiecewiseClassifier(binner=None, estimator=None, n_jobs=None, random_state=None, verbose=False)[source]#
Uses a decision tree to split the space of features into buckets and trains a logistic regression (default) on each of them. The second estimator is usually a sklearn.linear_model.LogisticRegression. It can also be sklearn.dummy.DummyClassifier to just get the average on each bucket.
- Parameters:
binner – transformer or predictor which creates the buckets
estimator – predictor trained on every bucket
n_jobs – number of parallel jobs (for training and predicting)
random_state – to pick up random examples when buckets do not contain enough examples of each class
verbose – boolean or use 'tqdm' to use tqdm to fit the estimators
binner allows the following values:
'tree': the model is sklearn.tree.DecisionTreeClassifier
'bins': the model is sklearn.preprocessing.KBinsDiscretizer
any instantiated model
estimator allows the following values:
None: the model is sklearn.linear_model.LogisticRegression
any instantiated model
The main issue with the PiecewiseClassifier is that each piece requires at least one example of each class in each bucket, which may not happen. To avoid that, the training picks random examples from other buckets to ensure this case does not happen.
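The workaround for buckets missing a class can be sketched as follows. complete_bucket is a hypothetical helper written for this illustration, and borrowing exactly one example per missing class is an assumption; the library may pick more.

```python
import numpy as np


def complete_bucket(y, bucket, b, rng):
    """Indices used to train bucket b, with one borrowed example per missing class."""
    idx = np.flatnonzero(bucket == b)
    missing = np.setdiff1d(np.unique(y), y[idx])
    extra = [
        rng.choice(np.flatnonzero((y == cls) & (bucket != b))) for cls in missing
    ]
    return np.concatenate([idx, np.array(extra, dtype=int)])


rng = np.random.default_rng(0)
y = np.array([0, 0, 1, 1, 1])
bucket = np.array([0, 0, 1, 1, 0])  # bucket 1 only contains class 1

train_idx = complete_bucket(y, bucket, 1, rng)
print(train_idx, y[train_idx])
```

Bucket 1 now sees both classes, so a classifier can be fitted on it.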
- decision_function(X)[source]#
Computes the prediction probabilities.
- Parameters:
X – features, X is converted into an array if X is a dataframe
- Returns:
prediction probabilities
- predict(X)[source]#
Computes the predictions.
- Parameters:
X – features, X is converted into an array if X is a dataframe
- Returns:
predictions
- predict_proba(X)[source]#
Computes the prediction probabilities.
- Parameters:
X – features, X is converted into an array if X is a dataframe
- Returns:
prediction probabilities
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PiecewiseClassifier #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in fit.
- Returns:
self – object The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PiecewiseClassifier #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in score.
- Returns:
self – object The updated object.
PiecewiseRegressor#
- class mlinsights.mlmodel.piecewise_estimator.PiecewiseRegressor(binner=None, estimator=None, n_jobs=None, verbose=False)[source]#
Uses a decision tree to split the space of features into buckets and trains a linear regression (default) on each of them. The second estimator is usually a sklearn.linear_model.LinearRegression. It can also be sklearn.dummy.DummyRegressor to just get the average on each bucket.
- Parameters:
binner – transformer or predictor which creates the buckets
estimator – predictor trained on every bucket
n_jobs – number of parallel jobs (for training and predicting)
verbose – boolean or use 'tqdm' to use tqdm to fit the estimators
binner allows the following values:
'tree': the model is sklearn.tree.DecisionTreeRegressor
'bins': the model is sklearn.preprocessing.KBinsDiscretizer
any instantiated model
estimator allows the following values:
None: the model is sklearn.linear_model.LinearRegression
any instantiated model
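The principle of a piecewise regressor can be sketched with numpy alone. This is an illustration under stated assumptions, not the class's implementation: a fixed split at 0 (the hypothetical edges array) stands in for the binner, and numpy.polyfit stands in for the linear regression trained on each bucket.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, size=200))
y = np.abs(x) + rng.normal(scale=0.01, size=x.size)

# Stand-in for the binner: bucket 0 when x < 0, bucket 1 otherwise.
edges = np.array([0.0])
bucket = np.digitize(x, edges)

# One linear model per bucket.
models = {}
for b in np.unique(bucket):
    mask = bucket == b
    slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
    models[int(b)] = (slope, intercept)


def predict(x_new):
    """Route every observation to its bucket and apply that bucket's model."""
    x_new = np.asarray(x_new, dtype=np.float64)
    b_new = np.digitize(x_new, edges)
    out = np.empty_like(x_new)
    for b, (slope, intercept) in models.items():
        mask = b_new == b
        out[mask] = slope * x_new[mask] + intercept
    return out


pred = predict([-0.5, 0.5])
print(models, pred)
```

A single linear model cannot fit y = |x|, while two linear pieces fit it almost exactly.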
- predict(X)[source]#
Computes the predictions.
- Parameters:
X – features, X is converted into an array if X is a dataframe
- Returns:
predictions
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PiecewiseRegressor #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED Metadata routing for sample_weight parameter in fit.
- Returns:
self – object The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PiecewiseRegressor #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns#
- self : object
The updated object.
QuantileMLPRegressor#
- class mlinsights.mlmodel.quantile_mlpregressor.QuantileMLPRegressor(hidden_layer_sizes=(100,), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, **kwargs)[source]#
Quantile MLP Regression, or neural network regression trained with the L1 norm. This class inherits from sklearn.neural_network.MLPRegressor. This model optimizes the absolute loss using LBFGS or stochastic gradient descent. See CustomizedMultilayerPerceptron and absolute_loss.
- Parameters:
hidden_layer_sizes – tuple, length = n_layers - 2, default (100,) The ith element represents the number of neurons in the ith hidden layer.
activation – {‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default ‘relu’ Activation function for the hidden layer. ‘identity’, no-op activation, useful to implement linear bottleneck, returns f(x) = x. ‘logistic’, the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)). ‘tanh’, the hyperbolic tan function, returns f(x) = tanh(x). ‘relu’, the rectified linear unit function, returns f(x) = max(0, x).
solver – {'lbfgs', 'sgd', 'adam'}, default ‘adam’ The solver for weight optimization. ‘lbfgs’ is an optimizer in the family of quasi-Newton methods. ‘sgd’ refers to stochastic gradient descent. ‘adam’ refers to a stochastic gradient-based optimizer proposed by Kingma, Diederik, and Jimmy Ba. Note: The default solver ‘adam’ works pretty well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, ‘lbfgs’ can converge faster and perform better.
alpha – float, optional, default 0.0001 L2 penalty (regularization term) parameter.
batch_size – int, optional, default ‘auto’ Size of minibatches for stochastic optimizers. If the solver is ‘lbfgs’, the model will not use minibatches. When set to “auto”, batch_size=min(200, n_samples)
learning_rate – {‘constant’, ‘invscaling’, ‘adaptive’}, default ‘constant’ Learning rate schedule for weight updates. ‘constant’ is a constant learning rate given by ‘learning_rate_init’. ‘invscaling’ gradually decreases the learning rate learning_rate_ at each time step ‘t’ using an inverse scaling exponent of ‘power_t’: effective_learning_rate = learning_rate_init / pow(t, power_t). ‘adaptive’ keeps the learning rate constant to ‘learning_rate_init’ as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least tol, or fail to increase validation score by at least tol if ‘early_stopping’ is on, the current learning rate is divided by 5. Only used when solver=’sgd’.
learning_rate_init – double, optional, default 0.001 The initial learning rate used. It controls the step-size in updating the weights. Only used when solver=’sgd’ or ‘adam’.
power_t – double, optional, default 0.5 The exponent for inverse scaling learning rate. It is used in updating effective learning rate when the learning_rate is set to ‘invscaling’. Only used when solver=’sgd’.
max_iter – int, optional, default 200 Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.
shuffle – bool, optional, default True Whether to shuffle samples in each iteration. Only used when solver=’sgd’ or ‘adam’.
random_state – int, RandomState instance or None, optional, default None If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
tol – float, optional, default 1e-4 Tolerance for the optimization. When the loss or score is not improving by at least tol for n_iter_no_change consecutive iterations, unless learning_rate is set to ‘adaptive’, convergence is considered to be reached and training stops.
verbose – bool, optional, default False Whether to print progress messages to stdout.
warm_start – bool, optional, default False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
momentum – float, default 0.9 Momentum for gradient descent update. Should be between 0 and 1. Only used when solver=’sgd’.
nesterovs_momentum – boolean, default True Whether to use Nesterov’s momentum. Only used when solver=’sgd’ and momentum > 0.
early_stopping – bool, default False Whether to use early stopping to terminate training when validation score is not improving. If set to true, it will automatically set aside 10% of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs. Only effective when solver=’sgd’ or ‘adam’.
validation_fraction – float, optional, default 0.1 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.
beta_1 – float, optional, default 0.9 Exponential decay rate for estimates of first moment vector in adam, should be in [0, 1). Only used when solver=’adam’
beta_2 – float, optional, default 0.999 Exponential decay rate for estimates of second moment vector in adam, should be in [0, 1). Only used when solver=’adam’
epsilon – float, optional, default 1e-8 Value for numerical stability in adam. Only used when solver=’adam’
n_iter_no_change – int, optional, default 10 Maximum number of epochs to not meet tol improvement. Only effective when solver=’sgd’ or ‘adam’.
kwargs – additional parameters sent to the constructor of the parent
Fitted attributes:
loss_: float The current loss computed with the loss function.
coefs_: list, length n_layers - 1 The ith element in the list represents the weight matrix corresponding to layer i.
intercepts_: list, length n_layers - 1 The ith element in the list represents the bias vector corresponding to layer i + 1.
n_iter_: int The number of iterations the solver has run.
n_layers_: int Number of layers.
n_outputs_: int Number of outputs.
out_activation_: string Name of the output activation function.
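The 'invscaling' schedule quoted in the learning_rate parameter, effective_learning_rate = learning_rate_init / pow(t, power_t), can be checked by hand. The helper below is a hypothetical illustration of that formula only; it is not part of the class.

```python
def invscaling_learning_rate(learning_rate_init, t, power_t=0.5):
    """Effective learning rate at time step t for the 'invscaling' schedule."""
    return learning_rate_init / pow(t, power_t)


# With the default power_t=0.5 the rate decays like 1 / sqrt(t).
rates = [invscaling_learning_rate(0.001, t) for t in (1, 4, 100)]
```

With the default values, the rate is halved after 4 steps and divided by 10 after 100 steps.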
- predict(X)[source]#
Predicts using the multi-layer perceptron model.
- Parameters:
X – {array-like, sparse matrix}, shape (n_samples, n_features) The input data.
- Returns:
y : array-like, shape (n_samples, n_outputs) The predicted values.
- score(X, y, sample_weight=None)[source]#
Returns mean absolute error regression loss.
- Parameters:
X – array-like, shape = (n_samples, n_features) Test samples.
y – array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X.
sample_weight – array-like, shape = [n_samples], optional Sample weights.
- Returns:
score, float mean absolute error regression loss
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') QuantileMLPRegressor #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns#
- self : object
The updated object.
QuantileLinearRegression#
- class mlinsights.mlmodel.quantile_regression.QuantileLinearRegression(fit_intercept=True, copy_X=True, n_jobs=1, delta=0.0001, max_iter=10, quantile=0.5, positive=False, verbose=False)[source]#
Quantile Linear Regression, or linear regression trained with the L1 norm. This class inherits from sklearn.linear_model.LinearRegression. See example Quantile Regression.
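The L1 norm is chosen if quantile=0.5. Otherwise, for quantile=$\rho$, the following error is optimized:
$$\sum_i \rho \, |f(X_i) - Y_i|^- + (1 - \rho) \, |f(X_i) - Y_i|^+$$
where $|f(X_i) - Y_i|^- = \max(Y_i - f(X_i), 0)$ and $|f(X_i) - Y_i|^+ = \max(f(X_i) - Y_i, 0)$. $f(X_i)$ is the prediction, $Y_i$ the expected value.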
- Parameters:
fit_intercept – boolean, optional, default True whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered).
copy_X – boolean, optional, default True If True, X will be copied; else, it may be overwritten.
n_jobs – int, optional, default 1 The number of jobs to use for the computation. If -1 all CPUs are used. This will only provide speedup for n_targets > 1 and sufficiently large problems.
max_iter – int, optional, default 10 The number of iterations to do at training time. This parameter is specific to the quantile regression.
delta – float, optional, default 0.0001 Used to ensure matrices have an inverse (M + delta*I).
quantile – float, by default 0.5, determines which quantile to use to estimate the regression.
positive – when set to True, forces the coefficients to be positive.
verbose – bool, optional, default False Prints error at each iteration of the optimisation.
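As an illustration of the error optimized for a given quantile, the sketch below assumes the usual quantile (pinball) loss: under-predictions are weighted by the quantile and over-predictions by one minus the quantile. The function name `quantile_loss` is hypothetical and the code is not taken from the library.

```python
def quantile_loss(y_true, y_pred, quantile=0.5):
    """Sum of asymmetric absolute errors for a given quantile."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        under = max(y - f, 0.0)  # prediction below the expected value
        over = max(f - y, 0.0)   # prediction above the expected value
        total += quantile * under + (1.0 - quantile) * over
    return total


y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 2.0, 1.0]
median_loss = quantile_loss(y_true, y_pred, quantile=0.5)
high_loss = quantile_loss(y_true, y_pred, quantile=0.9)
```

With quantile=0.5 both sides get the same weight, so the loss is half the L1 norm of the residuals. With quantile=0.9 an under-prediction costs nine times more than an over-prediction of the same size, which pushes the fitted model upward.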
- fit(X, y, sample_weight=None)[source]#
Fits a linear model with the L1 norm, which is equivalent to a quantile regression. The implementation is not the most efficient as it calls the fit method of sklearn.linear_model.LinearRegression multiple times. Data gets checked and rescaled each time. The optimization follows the iteratively reweighted least squares algorithm. It is described in French at Régression quantile.
- Parameters:
X – numpy array or sparse matrix of shape [n_samples,n_features] Training data
y – numpy array of shape [n_samples, n_targets] Target values. Will be cast to X’s dtype if necessary
sample_weight – numpy array of shape [n_samples] Individual weights for each sample
- Returns:
self, returns an instance of self.
Fitted attributes:
- coef_: array, shape (n_features, ) or (n_targets, n_features)
Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.
- intercept_: array
Independent term in the linear model.
- n_iter_: int
Number of iterations at training time.
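Iteratively reweighted least squares can be sketched for the median case (quantile=0.5) with a single feature. This is a minimal illustration under stated assumptions, not the library's code: each pass solves a weighted least squares problem, then reweights every sample by the inverse of its absolute residual. The names `weighted_line` and `l1_line`, and the floor `eps` that keeps the weights finite, are hypothetical.

```python
def weighted_line(xs, ys, ws):
    """Weighted least squares for y = a * x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = cov / var
    return a, my - a * mx


def l1_line(xs, ys, max_iter=10, eps=1e-4):
    """Approximates the L1 (median) regression line by reweighting."""
    ws = [1.0] * len(xs)  # the first pass is an ordinary least squares fit
    for _ in range(max_iter):
        a, b = weighted_line(xs, ys, ws)
        # Minimizing sum(w * r**2) with w = 1 / |r| mimics minimizing sum(|r|).
        ws = [1.0 / max(abs(y - (a * x + b)), eps) for x, y in zip(xs, ys)]
    return a, b


# Nine points on y = 2x + 1 and one large outlier.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[5] += 50.0
ols = weighted_line(xs, ys, [1.0] * len(xs))
l1 = l1_line(xs, ys, max_iter=50)
```

The least squares slope is pulled away from 2 by the outlier, while the reweighted fit settles on the line followed by the other nine points. For another quantile, the weights would additionally be scaled by the quantile or by one minus the quantile, depending on the sign of the residual.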
- score(X, y, sample_weight=None)[source]#
Returns Mean absolute error regression loss.
- Parameters:
X – array-like, shape = (n_samples, n_features) Test samples.
y – array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X.
sample_weight – array-like, shape = [n_samples], optional Sample weights.
- Returns:
score : float mean absolute error regression loss
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') QuantileLinearRegression #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
Returns#
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') QuantileLinearRegression #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns#
- self : object
The updated object.
TransformedTargetClassifier2#
- class mlinsights.mlmodel.target_predictors.TransformedTargetClassifier2(classifier=None, transformer=None)[source]#
Meta-estimator to classify on a transformed target. Useful for applying permutation transformation in classification problems.
- Parameters:
classifier – object, default=LogisticRegression() Classifier object such as derived from ClassifierMixin. This classifier will automatically be cloned each time prior to fitting.
transformer – str or object of type BaseReciprocalTransformer Transforms the features.
<<<
import numpy
from sklearn.linear_model import LogisticRegression
from mlinsights.mlmodel import TransformedTargetClassifier2

tt = TransformedTargetClassifier2(
    classifier=LogisticRegression(), transformer="permute"
)
X = numpy.arange(4).reshape(-1, 1)
y = numpy.array([0, 1, 0, 1])
print(tt.fit(X, y))
print(tt.score(X, y))
print(tt.classifier_.coef_)
>>>
TransformedTargetClassifier2(classifier=LogisticRegression(), transformer='permute')
0.5
[[-0.453]]
See example Transformed Target for a more complete example.
The class holds two attributes: classifier_, the fitted classifier, and transformer_, the transformer used in fit, predict, decision_function, predict_proba.
- property classes_#
Returns the classes.
- decision_function(X)[source]#
Predicts using the base classifier, applying inverse.
- Parameters:
X – {array-like, sparse matrix}, shape = (n_samples, n_features) Samples.
- Returns:
raw score : array, shape = (n_samples, ?)
- fit(X, y, sample_weight=None)[source]#
Fits the model according to the given training data.
- Parameters:
X – {array-like, sparse matrix}, shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.
y – array-like, shape (n_samples,) Target values.
sample_weight – array-like, shape (n_samples,) optional Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- Returns:
self, object
- predict(X)[source]#
Predicts using the base classifier, applying inverse.
- Parameters:
X – {array-like, sparse matrix}, shape = (n_samples, n_features) Samples.
- Returns:
y_hat, array, shape = (n_samples,) Predicted values.
- predict_proba(X)[source]#
Predicts using the base classifier, applying inverse.
- Parameters:
X – {array-like, sparse matrix}, shape = (n_samples, n_features) Samples.
- Returns:
predict probabilities, array, shape = (n_samples, n_classes) Predicted values.
- score(X, y, sample_weight=None)[source]#
Scores the model with sklearn.metrics.accuracy_score.
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TransformedTargetClassifier2 #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
Returns#
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TransformedTargetClassifier2 #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns#
- self : object
The updated object.
TransformedTargetRegressor2#
- class mlinsights.mlmodel.target_predictors.TransformedTargetRegressor2(regressor=None, transformer=None)[source]#
Meta-estimator to regress on a transformed target. Useful for applying a non-linear transformation in regression problems.
- Parameters:
regressor – object, default=LinearRegression() Regressor object such as derived from RegressorMixin. This regressor will automatically be cloned each time prior to fitting.
transformer – str or object of type BaseReciprocalTransformer
<<<
import numpy
from sklearn.linear_model import LinearRegression
from mlinsights.mlmodel import TransformedTargetRegressor2

tt = TransformedTargetRegressor2(regressor=LinearRegression(), transformer="log")
X = numpy.arange(4).reshape(-1, 1)
y = numpy.exp(2 * X).ravel()
print(tt.fit(X, y))
print(tt.score(X, y))
print(tt.regressor_.coef_)
>>>
TransformedTargetRegressor2(regressor=LinearRegression(), transformer='log')
1.0
[2.]
See example Transformed Target for a more complete example.
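What the meta-estimator does with a 'log' transformer can be reproduced by hand on the same data as the example: regress on log(y), then apply exp to the predictions. The sketch below uses only the standard library and a hypothetical `fit_line` helper; it illustrates the principle and is not the class's implementation.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(2.0 * x) for x in xs]

# Fit on the transformed target log(y), which is exactly linear in x.
a, b = fit_line(xs, [math.log(y) for y in ys])

# Predict in the transformed space, then map back with the reciprocal exp.
predictions = [math.exp(a * x + b) for x in xs]
```

The slope recovered in the transformed space is 2, which matches the coefficient printed by the example, and the back-transformed predictions reproduce the original target.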
The class holds two attributes: regressor_, the fitted regressor, and transformer_, the transformer used in fit and predict.
- fit(X, y, sample_weight=None)[source]#
Fits the model according to the given training data.
- Parameters:
X – {array-like, sparse matrix}, shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.
y – array-like, shape (n_samples,) Target values.
sample_weight – array-like, shape (n_samples,) optional Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- Returns:
self, object
- predict(X)[source]#
Predicts using the base regressor, applying inverse.
- Parameters:
X – {array-like, sparse matrix}, shape = (n_samples, n_features) Samples.
- Returns:
y_hat : array, shape = (n_samples,) Predicted values.
- score(X, y, sample_weight=None)[source]#
Scores the model with sklearn.metrics.r2_score.
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TransformedTargetRegressor2 #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
Returns#
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TransformedTargetRegressor2 #
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns#
- self : object
The updated object.
Transforms#
NGramsMixin#
- class mlinsights.mlmodel.sklearn_text.NGramsMixin[source]#
Overloads method _word_ngrams to get tuples instead of strings in member vocabulary_ of TfidfVectorizer or CountVectorizer. It contains the list of n-grams used to process documents. See TraceableCountVectorizer and TraceableTfidfVectorizer for examples.
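The difference between string n-grams and tuple n-grams can be shown with a small standalone function. It is a sketch only: `word_ngrams` is a hypothetical stand-in for the overloaded method, and the vocabulary is built by hand rather than by a vectorizer.

```python
def word_ngrams(tokens, ngram_range=(1, 2)):
    """Returns every n-gram of the token list as a tuple of words."""
    min_n, max_n = ngram_range
    return [
        tuple(tokens[i:i + n])
        for n in range(min_n, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


tokens = "the cat sat".split()
ngrams = word_ngrams(tokens)

# Keys are tuples, so each n-gram can be traced back to its words
# without splitting a joined string such as "the cat".
vocabulary = {ng: idx for idx, ng in enumerate(sorted(set(ngrams)))}
```

Keeping tuples avoids any ambiguity when a token itself contains the separator that would otherwise join the words of an n-gram.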
BaseReciprocalTransformer#
CategoriesToIntegers#
- class mlinsights.mlmodel.categories_to_integers.CategoriesToIntegers(columns=None, remove=None, skip_errors=False, single=False)[source]#
Does something similar to what DictVectorizer does but in a transformer. The method fit retains all categories, the method transform converts categories into integers. Categories are sorted by columns. If the method transform meets a category which was not seen by method fit, it can raise an exception or ignore it and replace it by zero.
- Parameters:
columns – specifies a selection of columns
remove – modalities to remove
skip_errors – skips a category which was not seen by method fit (no 1)
single – use a single column per category, do not multiply them for each value
The logging function displays a message when a new dense and big matrix is created when it should be sparse. A sparse matrix should be allocated instead.
DictVectorizer or CategoriesToIntegers
Example which transforms text into integers:
<<<
import pandas
from mlinsights.mlmodel import CategoriesToIntegers

df = pandas.DataFrame([{"cat": "a"}, {"cat": "b"}])
trans = CategoriesToIntegers()
trans.fit(df)
newdf = trans.transform(df)
print(newdf)
>>>
   cat=a  cat=b
0    1.0    NaN
1    NaN    1.0
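The fit and transform steps can be mimicked on plain dictionaries to show where the `column=value` names come from. This is a simplified sketch, not the library's code: rows are dictionaries instead of a dataframe, a missing key plays the role of NaN, and `fit_categories` and `transform_categories` are hypothetical names.

```python
def fit_categories(rows):
    """Collects the sorted list of categories seen in every column."""
    seen = {}
    for row in rows:
        for column, value in row.items():
            seen.setdefault(column, set()).add(value)
    return {column: sorted(values) for column, values in sorted(seen.items())}


def transform_categories(rows, categories, skip_errors=False):
    """Maps each (column, value) pair to a 'column=value' feature set to 1."""
    output = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if value not in categories.get(column, []):
                if skip_errors:
                    continue  # unseen category: no feature is set
                raise ValueError(
                    f"unknown category {value!r} in column {column!r}")
            new_row[f"{column}={value}"] = 1.0
        output.append(new_row)
    return output


rows = [{"cat": "a"}, {"cat": "b"}]
categories = fit_categories(rows)
encoded = transform_categories(rows, categories)
skipped = transform_categories([{"cat": "z"}], categories, skip_errors=True)
```

A category that was not seen during fit either raises an exception or is skipped, depending on `skip_errors`.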
- fit(X, y=None, **fit_params)[source]#
Makes the list of all categories in input X. X must be a dataframe.
- Parameters:
X – iterable Training data
y – iterable, default=None Training targets.
fit_params – additional fit params
- Returns:
self
- fit_transform(X, y=None, **fit_params)[source]#
Fits and transforms categories in numerical features based on the list of categories found by method fit. X must be a dataframe. The function does not preserve the order of the columns.
- Parameters:
X – iterable Training data
y – iterable, default=None Training targets.
fit_params – additional fitting parameters
- Returns:
DataFrame, X with categories.
- transform(X, y=None)[source]#
Transforms categories in numerical features based on the list of categories found by method fit. X must be a dataframe. The function does not preserve the order of the columns.
- Parameters:
X – iterable Training data
y – iterable, default=None Training targets.
- Returns:
DataFrame, X with categories.
ExtendedFeatures#
- class mlinsights.mlmodel.extended_features.ExtendedFeatures(kind='poly', poly_degree=2, poly_interaction_only=False, poly_include_bias=True)[source]#
Generates extended features such as polynomial features.
- Parameters:
kind – string, 'poly' for polynomial features, 'poly-slow' for polynomial features in scikit-learn 0.20.2
poly_degree – integer The degree of the polynomial features. Default = 2.
poly_interaction_only – boolean If true, only interaction features are produced: features that are products of at most degree distinct input features (so not x[1] ** 2, x[0] * x[2] ** 3, etc.).
poly_include_bias – boolean If True (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones - acts as an intercept term in a linear model).
Fitted attributes:
- n_input_features_: int
The total number of input features.
- n_output_features_: int
The total number of polynomial output features. The number of output features is computed by iterating over all suitably sized combinations of input features.
- fit(X, y=None)[source]#
Compute number of output features.
- Parameters:
X – array-like, shape (n_samples, n_features) The data.
y – targets
- Returns:
self : instance
- get_feature_names_out(input_features=None)[source]#
Returns feature names for output features.
- Parameters:
input_features – list of string, length n_features, optional String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns:
output_feature_names : list of string, length n_output_features
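The effect of the three polynomial options can be seen by enumerating the combinations of input features directly. The sketch below is an assumption-laden illustration, not the class's algorithm: it relies on `itertools`, and the naming scheme (`x0^2`, `x0 x1`) follows the convention of scikit-learn's PolynomialFeatures, which may differ from the names this class returns.

```python
from itertools import combinations, combinations_with_replacement


def poly_combinations(n_features, degree, interaction_only, include_bias):
    """Yields the index tuples whose product makes each output feature."""
    comb = combinations if interaction_only else combinations_with_replacement
    for d in range(0 if include_bias else 1, degree + 1):
        yield from comb(range(n_features), d)


def poly_names(names, degree=2, interaction_only=False, include_bias=True):
    """Builds a readable name for every output feature."""
    result = []
    for combo in poly_combinations(
            len(names), degree, interaction_only, include_bias):
        powers = {}
        for index in combo:
            powers[index] = powers.get(index, 0) + 1
        label = " ".join(
            names[i] if p == 1 else f"{names[i]}^{p}"
            for i, p in powers.items())
        result.append(label or "1")  # the empty product is the bias column
    return result


def poly_values(row, degree=2, interaction_only=False, include_bias=True):
    """Computes the output features for one row of inputs."""
    values = []
    for combo in poly_combinations(
            len(row), degree, interaction_only, include_bias):
        product = 1.0
        for index in combo:
            product *= row[index]
        values.append(product)
    return values


full = poly_names(["x0", "x1"])
interactions = poly_names(["x0", "x1"], interaction_only=True)
values = poly_values([2.0, 3.0])
```

With two inputs and degree 2, the full expansion has six features, while interaction-only drops the squares and keeps four.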
FunctionReciprocalTransformer#
- class mlinsights.mlmodel.sklearn_transform_inv_fct.FunctionReciprocalTransformer(fct, fct_inv=None)[source]#
The transform is used to apply a function on the target, predict, then transform the target back before scoring. The transform implements a series of predefined functions, listed by available_fcts (see the example after the parameters).
- Parameters:
fct – function name of numerical function
fct_inv – optional if fct is a function name, reciprocal function otherwise
<<<
import pprint
from mlinsights.mlmodel.sklearn_transform_inv_fct import FunctionReciprocalTransformer

pprint.pprint(FunctionReciprocalTransformer.available_fcts())
>>>
{'exp': (<ufunc 'exp'>, 'log'),
 'exp(x)-1': (<function FunctionReciprocalTransformer.available_fcts.<locals>.<lambda> at 0x7fc72559d6c0>, 'log'),
 'expm1': (<ufunc 'expm1'>, 'log1p'),
 'log': (<ufunc 'log'>, 'exp'),
 'log(1+x)': (<function FunctionReciprocalTransformer.available_fcts.<locals>.<lambda> at 0x7fc72559cf70>, 'exp(x)-1'),
 'log1p': (<ufunc 'log1p'>, 'expm1')}
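The listing pairs every function name with the name of its reciprocal. A minimal sketch of how such a table can drive a transform and its inverse is given below; it uses the `math` module instead of numpy, and the names `FUNCTIONS`, `transform` and `inverse_transform` are hypothetical, not the class's API.

```python
import math

# Each entry maps a name to (function, name of the reciprocal function).
FUNCTIONS = {
    "exp": (math.exp, "log"),
    "log": (math.log, "exp"),
    "expm1": (math.expm1, "log1p"),
    "log1p": (math.log1p, "expm1"),
}


def transform(name, values):
    """Applies the named function to every target value."""
    fct = FUNCTIONS[name][0]
    return [fct(v) for v in values]


def inverse_transform(name, values):
    """Looks up the reciprocal of the named function and applies it."""
    reciprocal = FUNCTIONS[name][1]
    return transform(reciprocal, values)


targets = [1.0, 10.0, 100.0]
transformed = transform("log", targets)
restored = inverse_transform("log", transformed)
```

Storing the reciprocal by name keeps the table symmetric: looking up 'log' leads to 'exp', and looking up 'exp' leads back to 'log'.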
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') FunctionReciprocalTransformer #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
Returns#
- self : object
The updated object.
PermutationReciprocalTransformer#
- class mlinsights.mlmodel.sklearn_transform_inv_fct.PermutationReciprocalTransformer(random_state=None, closest=False)[source]#
The transform is used to permute targets, predict, then permute the target back before scoring. nan values remain nan values. Once fitted, the transform has attribute permutation_ which keeps track of the permutation to apply.
- Parameters:
random_state – random state
closest – if True, finds the closest permuted element
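The permute-then-permute-back principle can be sketched with numpy. This is a hypothetical illustration, not the class's code: the dictionaries `forward` and `backward` are names invented here, and the handling of nan values and of the `closest` option is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0, 1, 2, 1, 0, 2, 2])

# Build a random permutation of the distinct target values.
values = np.unique(y)
shuffled = rng.permutation(values)
forward = dict(zip(values.tolist(), shuffled.tolist()))
backward = {v: k for k, v in forward.items()}

# Permute the targets before training ...
y_perm = np.array([forward[v] for v in y.tolist()])
# ... and map predictions back to the original labels before scoring.
y_back = np.array([backward[v] for v in y_perm.tolist()])
print(y_perm, y_back)
```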
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PermutationReciprocalTransformer #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
Returns#
- self: object
The updated object.
PredictableTSNE#
- class mlinsights.mlmodel.predictable_tsne.PredictableTSNE(normalizer=None, transformer=None, estimator=None, normalize=True, keep_tsne_outputs=False)[source]#
t-SNE is an interesting transform which can only be used to study data, as there is no way to apply it to new data once it has been fitted. That’s why the class TSNE does not have any method transform, only fit_transform. This class proposes a way to train a machine learned model which approximates the outputs of a TSNE transformer. The example Predictable t-SNE shows how to use this class.
- Parameters:
normalizer – None by default
transformer – sklearn.manifold.TSNE by default
estimator – sklearn.neural_network.MLPRegressor by default
normalize – normalizes the outputs: centers and scales the output of the t-SNE and applies the same normalization to the prediction of the estimator
keep_tsne_outputs – if True, the raw outputs of TSNE are stored in member tsne_outputs_
- fit(X, y, sample_weight=None)[source]#
Trains a TSNE then trains an estimator to approximate its outputs.
- Parameters:
X – numpy array or sparse matrix of shape [n_samples,n_features] Training data
y – numpy array of shape [n_samples, n_targets] Target values. Will be cast to X’s dtype if necessary
sample_weight – numpy array of shape [n_samples] Individual weights for each sample
- Returns:
self, returns an instance of self.
Fitted attributes:
normalizer_: trained normalizer
transformer_: trained transformer
estimator_: trained regressor
tsne_outputs_: t-SNE outputs if keep_tsne_outputs is True
mean_: average of the t-SNE output on each dimension
inv_std_: inverse of the standard deviation of the t-SNE output on each dimension
loss_: loss (sklearn.metrics.mean_squared_error) between the predictions and the outputs of t-SNE
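The two training steps described above (fit a t-SNE, then fit a regressor on its outputs) can be reproduced with plain scikit-learn. The sketch below is an approximation under assumptions and not the class's implementation: it swaps the default `MLPRegressor` for a `KNeighborsRegressor` to keep it fast, uses synthetic blobs, and the local variables `mean_`, `inv_std_` and `loss_` only mimic the fitted attributes listed above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsRegressor

X, _ = make_blobs(n_samples=90, n_features=5, centers=3, random_state=0)
X_train, X_new = X[:60], X[60:]

# Step 1: t-SNE only offers fit_transform, so it is run on the training data.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
embedding = tsne.fit_transform(X_train)

# Step 2: center and scale the embedding on each dimension.
mean_ = embedding.mean(axis=0)
inv_std_ = 1.0 / embedding.std(axis=0)
target = (embedding - mean_) * inv_std_

# Step 3: train a regressor to approximate the normalized embedding.
estimator = KNeighborsRegressor(n_neighbors=3).fit(X_train, target)
loss_ = float(np.mean((estimator.predict(X_train) - target) ** 2))

# New observations can now be projected, which TSNE alone cannot do.
projected = estimator.predict(X_new)
print(projected.shape, loss_)
```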
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PredictableTSNE #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
Returns#
- self: object
The updated object.
TransferTransformer#
- class mlinsights.mlmodel.transfer_transformer.TransferTransformer(estimator, method=None, copy_estimator=True, trainable=False)[source]#
Wraps a predictor or a transformer in a transformer. This model is frozen: it cannot be trained and only computes the predictions.
- Parameters:
estimator – estimator to wrap in a transformer, it is cloned with the training data (deep copy) when fitted
method – if None, guesses which method should be called: transform for a transformer, predict_proba for a classifier, decision_function if found, predict otherwise
copy_estimator – copy the model instead of taking a reference
trainable – the transferred model must be trained
- fit(X=None, y=None, sample_weight=None)[source]#
The function does nothing.
- Parameters:
X – unused
y – unused
sample_weight – unused
- Returns:
self: returns an instance of self.
Fitted attributes:
estimator_: already trained estimator
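Using the predictions of an already trained model as features can be imitated with scikit-learn's `FunctionTransformer`. This is a rough stand-in chosen for illustration, not how `TransferTransformer` is implemented; the dataset and the choice of `predict_proba` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import FunctionTransformer

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# A model trained beforehand; it stays frozen afterwards.
frozen = LogisticRegression().fit(X, y)

# Expose its probabilities through the transformer interface.
# Fitting the wrapper does not retrain the underlying model.
as_transformer = FunctionTransformer(frozen.predict_proba)
features = as_transformer.fit_transform(X)
print(features.shape)
```

The resulting transformer can be placed in a pipeline so that a second model is trained on the outputs of the first one.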
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') TransferTransformer #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
Returns#
- self: object
The updated object.
TraceableCountVectorizer#
- class mlinsights.mlmodel.sklearn_text.TraceableCountVectorizer(*, input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, stop_words=None, token_pattern='(?u)\\b\\w\\w+\\b', ngram_range=(1, 1), analyzer='word', max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.int64'>)[source]#
Inherits from NGramsMixin which overloads method _word_ngrams to keep more information about n-grams but still produces the same outputs as CountVectorizer.
<<<
import numpy
from sklearn.feature_extraction.text import CountVectorizer
from mlinsights.mlmodel.sklearn_text import TraceableCountVectorizer
from pprint import pformat

corpus = numpy.array(
    [
        "This is the first document.",
        "This document is the second document.",
        "Is this the first document?",
        "",
    ]
).reshape((4,))

print("CountVectorizer from scikit-learn")
mod1 = CountVectorizer(ngram_range=(1, 2))
mod1.fit(corpus)
print(mod1.transform(corpus).todense()[:2])
print(pformat(mod1.vocabulary_)[:100])

print("TraceableCountVectorizer from scikit-learn")
mod2 = TraceableCountVectorizer(ngram_range=(1, 2))
mod2.fit(corpus)
print(mod2.transform(corpus).todense()[:2])
print(pformat(mod2.vocabulary_)[:100])
>>>
CountVectorizer from scikit-learn
[[1 0 1 1 1 1 0 0 0 1 1 0 1 0 1 0]
 [2 1 0 0 1 1 0 1 1 1 0 1 1 1 0 0]]
{'document': 0, 'document is': 1, 'first': 2, 'first document': 3, 'is': 4, 'is the': 5, 'is t
TraceableCountVectorizer from scikit-learn
[[1 0 1 1 1 1 0 0 0 1 1 0 1 0 1 0]
 [2 1 0 0 1 1 0 1 1 1 0 1 1 1 0 0]]
{('document',): 0, ('document', 'is'): 1, ('first',): 2, ('first', 'document'): 3, ('is',): 4,
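The difference between the two vocabularies above is only the key type: joined strings versus tuples of tokens. A minimal pure-Python sketch of that idea follows; `word_ngrams` is a hypothetical helper written for this illustration and is not the library's `_word_ngrams` method.

```python
import re


def word_ngrams(tokens, ngram_range):
    """Return every n-gram of the tokens as a tuple, shortest n first."""
    lo, hi = ngram_range
    grams = []
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


text = "This document is the second document."
# Same default token pattern as CountVectorizer, applied to lowercased text.
tokens = re.findall(r"(?u)\b\w\w+\b", text.lower())

as_tuples = word_ngrams(tokens, (1, 2))          # keeps token boundaries
as_strings = [" ".join(g) for g in as_tuples]    # loses token boundaries
print(as_tuples[:3])
print(as_strings[:3])
```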
A weirder example with TraceableTfidfVectorizer shows more differences.
- set_fit_request(*, raw_documents: bool | None | str = '$UNCHANGED$') TraceableCountVectorizer #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- raw_documents: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for raw_documents parameter in fit.
Returns#
- self: object
The updated object.
- set_transform_request(*, raw_documents: bool | None | str = '$UNCHANGED$') TraceableCountVectorizer #
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- raw_documents: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for raw_documents parameter in transform.
Returns#
- self: object
The updated object.
TraceableTfidfVectorizer#
- class mlinsights.mlmodel.sklearn_text.TraceableTfidfVectorizer(*, input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, analyzer='word', stop_words=None, token_pattern='(?u)\\b\\w\\w+\\b', ngram_range=(1, 1), max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.float64'>, norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False)[source]#
Inherits from NGramsMixin which overloads method _word_ngrams to keep more information about n-grams but still produces the same outputs as TfidfVectorizer.
<<<
import numpy
from sklearn.feature_extraction.text import TfidfVectorizer
from mlinsights.mlmodel.sklearn_text import TraceableTfidfVectorizer
from pprint import pformat

corpus = numpy.array(
    [
        "This is the first document.",
        "This document is the second document.",
        "Is this the first document?",
        "",
    ]
).reshape((4,))

print("TfidfVectorizer from scikit-learn")
mod1 = TfidfVectorizer(ngram_range=(1, 2), token_pattern="[a-zA-Z ]{1,4}")
mod1.fit(corpus)
print(mod1.transform(corpus).todense()[:2])
print(pformat(mod1.vocabulary_)[:100])

print("TraceableTfidfVectorizer from scikit-learn")
mod2 = TraceableTfidfVectorizer(ngram_range=(1, 2), token_pattern="[a-zA-Z ]{1,4}")
mod2.fit(corpus)
print(mod2.transform(corpus).todense()[:2])
print(pformat(mod2.vocabulary_)[:100])
>>>
TfidfVectorizer from scikit-learn
[[0. 0. 0.329 0.329 0. 0. 0. 0. 0.26 0.26 0. 0. 0.26 0.26 0. 0. 0. 0. 0. 0.26 0. 0. 0.26 0.26 0. 0. 0.26 0.26 0.26 0. 0.329 0. 0. ]
 [0.245 0.245 0. 0. 0.245 0.245 0.245 0.245 0. 0. 0.245 0.245 0. 0. 0. 0. 0. 0. 0.245 0. 0.245 0.245 0. 0. 0.245 0.245 0. 0. 0.193 0.245 0. 0.245 0.245]]
{' doc': 0, ' doc umen': 1, ' is ': 2, ' is the ': 3, ' sec': 4, ' sec ond ': 5, ' the': 6,
TraceableTfidfVectorizer from scikit-learn
[[0. 0. 0.329 0.329 0. 0. 0. 0. 0.26 0.26 0. 0. 0.26 0.26 0. 0. 0. 0. 0. 0.26 0. 0. 0.26 0.26 0. 0. 0.26 0.26 0.26 0. 0.329 0. 0. ]
 [0.245 0.245 0. 0. 0.245 0.245 0.245 0.245 0. 0. 0.245 0.245 0. 0. 0. 0. 0. 0. 0.245 0. 0.245 0.245 0. 0. 0.245 0.245 0. 0. 0.193 0.245 0. 0.245 0.245]]
{(' doc',): 0, (' doc', 'umen'): 1, (' is ',): 2, (' is ', 'the '): 3, (' sec',): 4, (' sec', '
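With a token pattern that allows spaces inside tokens, as in the example above, a joined n-gram string no longer reveals where one token ends and the next begins. The sketch below illustrates this with the standard `re` module only. It is not library code, and the lowercasing step is done by hand as an assumption about the default preprocessing.

```python
import re

# Same token pattern as in the example above: 1 to 4 letters or spaces.
pattern = re.compile(r"[a-zA-Z ]{1,4}")
tokens = pattern.findall("This is the first document.".lower())

# Bigrams kept as tuples preserve the token boundaries ...
bigrams = list(zip(tokens, tokens[1:]))
# ... whereas joining with a space mixes separators and token content.
joined = [" ".join(b) for b in bigrams]

print(tokens)
print(bigrams[1], repr(joined[1]))
# Splitting the joined string on spaces does not give back the two tokens.
print(joined[1].split(" "))
```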
- set_fit_request(*, raw_documents: bool | None | str = '$UNCHANGED$') TraceableTfidfVectorizer #
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- raw_documents: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for raw_documents parameter in fit.
Returns#
- self: object
The updated object.
- set_transform_request(*, raw_documents: bool | None | str = '$UNCHANGED$') TraceableTfidfVectorizer #
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters#
- raw_documents: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for raw_documents parameter in transform.
Returns#
- self: object
The updated object.
Exploration in C#
Losses#
- mlinsights.mlmodel.quantile_mlpregressor.absolute_loss(y_true, y_pred)[source]#
Computes the absolute loss for regression.
- Parameters:
y_true – array-like or label indicator matrix Ground truth (correct) values.
y_pred – array-like or label indicator matrix Predicted values, as returned by a regression estimator.
- Returns:
loss, float The degree to which the samples are correctly predicted.
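An absolute loss for regression is commonly computed as the mean of the absolute deviations. The sketch below assumes that definition (averaging over samples); the function name `mean_absolute_loss` is hypothetical and the exact normalization used by the library function is not confirmed here.

```python
import numpy as np


def mean_absolute_loss(y_true, y_pred):
    """Average of |y_true - y_pred| over all values (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Deviations are 0.5, 0.0 and 1.0, so the mean is 0.5.
loss = mean_absolute_loss([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(loss)
```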