Short Examples
DictVectorizer or CategoriesToIntegers
Example that encodes a text column as numeric columns, one per category:
<<<
import pandas
from mlinsights.mlmodel import CategoriesToIntegers
df = pandas.DataFrame([{"cat": "a"}, {"cat": "b"}])
trans = CategoriesToIntegers()
trans.fit(df)
newdf = trans.transform(df)
print(newdf)
>>>
   cat=a  cat=b
0    1.0    NaN
1    NaN    1.0
(original entry : categories_to_integers.py:docstring of mlinsights.mlmodel.categories_to_integers.CategoriesToIntegers, line 18)
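For comparison, the same two rows can be passed to scikit-learn's DictVectorizer, the other class named in the title. This is only a minimal sketch: the variable names are illustrative, and it makes no claim about how CategoriesToIntegers is implemented. The column names follow the same `column=value` pattern, but a category absent from a row is encoded as 0 here, whereas the output above shows NaN.

```python
from sklearn.feature_extraction import DictVectorizer

# Same two records as in the example above, as plain dictionaries.
rows = [{"cat": "a"}, {"cat": "b"}]

# sparse=False returns a dense numpy array instead of a scipy sparse matrix.
vec = DictVectorizer(sparse=False)
matrix = vec.fit_transform(rows)

# feature_names_ lists the generated columns in matrix order.
print(vec.feature_names_)
print(matrix)
```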
Stacking several learners in a scikit-learn pipeline.
This transform assembles the results of several learners. These features then serve as input to a stacking model.
<<<
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from mlinsights.sklapi import SkBaseTransformStacking
data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
trans = SkBaseTransformStacking([LogisticRegression(), DecisionTreeClassifier()])
trans.fit(X_train, y_train)
pred = trans.transform(X_test)
print(pred[3:])
>>>
[[0 0]
[0 0]
[1 1]
[2 2]
[0 0]
[2 2]
[2 2]
[0 0]
[2 2]
[1 1]
[0 0]
[0 0]
[0 0]
[2 2]
[1 1]
[1 1]
[0 0]
[0 0]
[1 1]
[0 0]
[0 0]
[0 0]
[1 1]
[1 1]
[1 1]
[0 0]
[2 2]
[2 2]
[2 2]
[1 1]
[1 1]
[2 2]
[2 2]
[1 1]
[0 0]]
(original entry : sklearn_base_transform_stacking.py:docstring of mlinsights.sklapi.sklearn_base_transform_stacking.SkBaseTransformStacking, line 4)
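The idea can also be spelled out by hand with plain scikit-learn and numpy, which shows how the stacked predictions become the input of a second model. This is a rough sketch under stated assumptions, not the implementation of SkBaseTransformStacking: `stack_predictions`, `base_learners` and `final_model` are hypothetical names, and the final model is a decision tree chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# First level: each base learner is fitted on the same training data.
base_learners = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
]
for model in base_learners:
    model.fit(X_train, y_train)


def stack_predictions(models, X):
    """Return one column of predicted labels per base learner (hypothetical helper)."""
    return np.column_stack([model.predict(X) for model in models])


# Second level: the stacked predictions are the features of the final model.
final_model = DecisionTreeClassifier(random_state=0)
final_model.fit(stack_predictions(base_learners, X_train), y_train)

stacked_test = stack_predictions(base_learners, X_test)
score = final_model.score(stacked_test, y_test)
print(stacked_test.shape)
print(score)
```

In this sketch the final model is trained on predictions the base learners made on their own training data, which tends to be optimistic. Stacking is usually done with out-of-fold predictions, as in scikit-learn's StackingClassifier.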
Use two learners in the same pipeline
It is impossible to use two learners in a pipeline
unless we use a class such as SkBaseTransformLearner,
which disguises a learner as a transform.
<<<
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from mlinsights.sklapi import SkBaseTransformLearner
data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
try:
    pipe = make_pipeline(LogisticRegression(), DecisionTreeClassifier())
except Exception as e:
    print("ERROR:")
    print(e)
    print(".")
pipe = make_pipeline(
    SkBaseTransformLearner(LogisticRegression()), DecisionTreeClassifier()
)
pipe.fit(X_train, y_train)
pred = pipe.predict(X_test)
score = accuracy_score(y_test, pred)
print("pipeline with two learners:", score)
>>>
pipeline with two learners: 0.9736842105263158
(original entry : sklearn_base_transform_learner.py:docstring of mlinsights.sklapi.sklearn_base_transform_learner.SkBaseTransformLearner, line 8)
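To make the "learner disguised as a transform" idea concrete, here is a minimal sketch of such a wrapper written against scikit-learn's base classes. `LearnerAsTransform` is a hypothetical name, not part of mlinsights, and returning `predict_proba` as the transformed features is an assumption made for this illustration. SkBaseTransformLearner may behave differently.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


class LearnerAsTransform(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper exposing a learner's predictions through transform."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None):
        # Fit a clone so the estimator passed to __init__ stays untouched.
        self.model_ = clone(self.model).fit(X, y)
        return self

    def transform(self, X):
        # Assumption: class probabilities become the features of the next step.
        return self.model_.predict_proba(X)


X, y = load_iris(return_X_y=True)
pipe = make_pipeline(
    LearnerAsTransform(LogisticRegression(max_iter=1000)),
    DecisionTreeClassifier(random_state=0),
)
pipe.fit(X, y)

# The first step now yields one probability column per class.
features = pipe[0].transform(X)
print(features.shape)
print(pipe.predict(X[:3]))
```

Because the wrapper implements `fit` and `transform`, the pipeline accepts it as an intermediate step, and only the last step needs `predict`.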