.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_visualize_pipeline.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end ` to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_visualize_pipeline.py:

.. _l-visualize-pipeline-example:

Visualize a scikit-learn pipeline
=================================

Pipelines can get large with *scikit-learn*; let's dig into a visual way to look at them.

Simple model
------------

Let's visualize a simple pipeline: a single model, not even trained.

.. GENERATED FROM PYTHON SOURCE LINES 15-45

.. code-block:: Python


    from numpy.random import randn
    import pandas
    from PIL import Image
    from sphinx_runpython.runpython import run_cmd
    from sklearn import datasets
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from sklearn.pipeline import Pipeline, FeatureUnion
    from sklearn.preprocessing import (
        OneHotEncoder,
        StandardScaler,
        MinMaxScaler,
        PolynomialFeatures,
    )
    from mlinsights.helpers.pipeline import (
        alter_pipeline_for_debugging,
        enumerate_pipeline_models,
    )
    from mlinsights.plotting import pipeline2dot, pipeline2str


    iris = datasets.load_iris()
    X = iris.data[:, :4]
    df = pandas.DataFrame(X)
    df.columns = ["X1", "X2", "X3", "X4"]
    clf = LogisticRegression()
    clf


.. raw:: html

LogisticRegression()


.. GENERATED FROM PYTHON SOURCE LINES 46-49

The trick consists in converting the pipeline into a graph through the
`DOT `_ language.

.. GENERATED FROM PYTHON SOURCE LINES 49-55

.. code-block:: Python


    dot = pipeline2dot(clf, df)
    print(dot)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    digraph{
      orientation=portrait;
      nodesep=0.05;
      ranksep=0.25;
      sch0[label="<f0> X1|<f1> X2|<f2> X3|<f3> X4",shape=record,fontsize=8];

      node1[label="union",shape=box,style="filled,rounded",color=cyan,fontsize=12];
      sch0:f0 -> node1;
      sch0:f1 -> node1;
      sch0:f2 -> node1;
      sch0:f3 -> node1;
      sch1[label="<f0> -v-0",shape=record,fontsize=8];
      node1 -> sch1:f0;

      node2[label="LogisticRegression",shape=box,style="filled,rounded",color=yellow,fontsize=12];
      sch1:f0 -> node2;
      sch2[label="<f0> PredictedLabel|<f1> Probabilities",shape=record,fontsize=8];
      node2 -> sch2:f0;
      node2 -> sch2:f1;
    }


.. GENERATED FROM PYTHON SOURCE LINES 56-57

It is a lot better with an image.

.. GENERATED FROM PYTHON SOURCE LINES 57-64

.. code-block:: Python


    dot_file = "graph.dot"
    with open(dot_file, "w", encoding="utf-8") as f:
        f.write(dot)


.. GENERATED FROM PYTHON SOURCE LINES 66-76

.. code-block:: Python


    cmd = "dot -G=300 -Tpng {0} -o{0}.png".format(dot_file)
    run_cmd(cmd, wait=True)

    img = Image.open("graph.dot.png")
    img


.. GENERATED FROM PYTHON SOURCE LINES 77-84

Complex pipeline
----------------

*scikit-learn* introduced a couple of transformers to play with features
in a single pipeline. The following example is taken from
`Column Transformer with Mixed Types `_.

.. GENERATED FROM PYTHON SOURCE LINES 84-131

.. code-block:: Python


    columns = [
        "pclass",
        "name",
        "sex",
        "age",
        "sibsp",
        "parch",
        "ticket",
        "fare",
        "cabin",
        "embarked",
        "boat",
        "body",
        "home.dest",
    ]

    numeric_features = ["age", "fare"]
    numeric_transformer = Pipeline(
        steps=[("imputer", SimpleImputer(strategy="median")), ("scaler", StandardScaler())]
    )

    categorical_features = ["embarked", "sex", "pclass"]
    categorical_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]
    )

    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, numeric_features),
            ("cat", categorical_transformer, categorical_features),
        ]
    )

    clf = Pipeline(
        steps=[
            ("preprocessor", preprocessor),
            ("classifier", LogisticRegression(solver="lbfgs")),
        ]
    )
    clf


.. raw:: html

Pipeline(steps=[('preprocessor',
                     ColumnTransformer(transformers=[('num',
                                                      Pipeline(steps=[('imputer',
                                                                       SimpleImputer(strategy='median')),
                                                                      ('scaler',
                                                                       StandardScaler())]),
                                                      ['age', 'fare']),
                                                     ('cat',
                                                      Pipeline(steps=[('imputer',
                                                                       SimpleImputer(fill_value='missing',
                                                                                     strategy='constant')),
                                                                      ('onehot',
                                                                       OneHotEncoder(handle_unknown='ignore'))]),
                                                      ['embarked', 'sex',
                                                       'pclass'])])),
                    ('classifier', LogisticRegression())])

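The nested representation above can also be walked programmatically. The
following is a minimal sketch, not the implementation behind ``pipeline2str``:
the helper names ``iter_nested_estimators`` and ``outline`` are hypothetical,
and the traversal only relies on the ``steps``, ``transformer_list`` and
``transformers`` attributes that ``Pipeline``, ``FeatureUnion`` and
``ColumnTransformer`` expose.

```python
def iter_nested_estimators(model, depth=0):
    """Yield (depth, estimator) for ``model`` and every estimator nested in it.

    Hypothetical helper, duck-typed on scikit-learn container attributes:
    ``steps`` (Pipeline), ``transformer_list`` (FeatureUnion) and
    ``transformers`` (ColumnTransformer).
    """
    yield depth, model
    children = []
    if hasattr(model, "steps"):
        # Pipeline: list of (name, estimator)
        children = [item[1] for item in model.steps]
    elif hasattr(model, "transformer_list"):
        # FeatureUnion: list of (name, transformer)
        children = [item[1] for item in model.transformer_list]
    elif hasattr(model, "transformers"):
        # ColumnTransformer: list of (name, transformer, columns)
        children = [item[1] for item in model.transformers]
    for child in children:
        # "drop" and "passthrough" are plain strings, not estimators.
        if child is None or isinstance(child, str):
            continue
        yield from iter_nested_estimators(child, depth + 1)


def outline(model, indent="   "):
    """Return one line per estimator, indented by nesting depth."""
    return "\n".join(
        indent * depth + type(est).__name__
        for depth, est in iter_nested_estimators(model)
    )
```

Calling ``print(outline(clf))`` on the pipeline defined above should list the
same estimators as the nested representation, without the selected columns.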

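Every section of this example repeats the same three steps: write the DOT text
to a file, call the Graphviz ``dot`` executable, open the resulting image. The
sketch below wraps the first two steps with the standard library only. The
names ``build_dot_command`` and ``render_dot`` are hypothetical, and the sketch
assumes that the resolution is meant to be set through the Graphviz ``dpi``
graph attribute (``-Gdpi=300``), whereas the commands on this page pass
``-G=300``.

```python
import shutil
import subprocess
from pathlib import Path


def build_dot_command(dot_file, fmt="png", dpi=300):
    """Return (command, output_path) for rendering ``dot_file`` with Graphviz.

    Hypothetical helper. The output keeps the naming used on this page:
    "graph.dot" becomes "graph.dot.png".
    """
    dot_file = Path(dot_file)
    output = dot_file.with_name(f"{dot_file.name}.{fmt}")
    # -Gdpi sets the graph attribute "dpi"; this is an assumption about
    # the intent of the "-G=300" flag used in the original commands.
    cmd = ["dot", f"-Gdpi={dpi}", f"-T{fmt}", str(dot_file), "-o", str(output)]
    return cmd, output


def render_dot(dot_text, dot_file="graph.dot", fmt="png", dpi=300):
    """Write ``dot_text`` to ``dot_file``, render it, return the image path."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    Path(dot_file).write_text(dot_text, encoding="utf-8")
    cmd, output = build_dot_command(dot_file, fmt=fmt, dpi=dpi)
    # check=True raises CalledProcessError if Graphviz reports an error.
    subprocess.run(cmd, check=True)
    return output
```

The returned path can then be passed to ``Image.open`` as in the rest of this
example. Passing the command as a list avoids shell quoting issues when the
file name contains spaces.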
.. GENERATED FROM PYTHON SOURCE LINES 132-133

Let's see it first as a simplified text.

.. GENERATED FROM PYTHON SOURCE LINES 133-137

.. code-block:: Python


    print(pipeline2str(clf))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Pipeline
       ColumnTransformer
          Pipeline(age,fare)
             SimpleImputer
             StandardScaler
          Pipeline(embarked,sex,pclass)
             SimpleImputer
             OneHotEncoder
       LogisticRegression


.. GENERATED FROM PYTHON SOURCE LINES 139-154

.. code-block:: Python


    dot = pipeline2dot(clf, columns)

    dot_file = "graph2.dot"
    with open(dot_file, "w", encoding="utf-8") as f:
        f.write(dot)

    cmd = "dot -G=300 -Tpng {0} -o{0}.png".format(dot_file)
    run_cmd(cmd, wait=True)

    img = Image.open("graph2.dot.png")
    img


.. GENERATED FROM PYTHON SOURCE LINES 155-157

Example with FeatureUnion
-------------------------

.. GENERATED FROM PYTHON SOURCE LINES 157-181

.. code-block:: Python


    model = Pipeline(
        [
            ("poly", PolynomialFeatures()),
            (
                "union",
                FeatureUnion([("scaler2", MinMaxScaler()), ("scaler3", StandardScaler())]),
            ),
        ]
    )
    dot = pipeline2dot(model, columns)

    dot_file = "graph3.dot"
    with open(dot_file, "w", encoding="utf-8") as f:
        f.write(dot)

    cmd = "dot -G=300 -Tpng {0} -o{0}.png".format(dot_file)
    run_cmd(cmd, wait=True)

    img = Image.open("graph3.dot.png")
    img


.. GENERATED FROM PYTHON SOURCE LINES 182-184

Compute intermediate outputs
----------------------------

.. GENERATED FROM PYTHON SOURCE LINES 184-205

.. code-block:: Python


    # It is difficult to access intermediate outputs with *scikit-learn* but
    # it may be interesting to do so. The function
    # `alter_pipeline_for_debugging `_
    # modifies the pipeline to intercept intermediate outputs.


    model = Pipeline(
        [
            ("scaler1", StandardScaler()),
            (
                "union",
                FeatureUnion([("scaler2", StandardScaler()), ("scaler3", MinMaxScaler())]),
            ),
            ("lr", LinearRegression()),
        ]
    )

    X = randn(4, 5)
    y = randn(4)
    model.fit(X, y)


.. raw:: html

Pipeline(steps=[('scaler1', StandardScaler()),
                    ('union',
                     FeatureUnion(transformer_list=[('scaler2', StandardScaler()),
                                                    ('scaler3', MinMaxScaler())])),
                    ('lr', LinearRegression())])


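Intercepting intermediate outputs boils down to wrapping the ``transform`` and
``predict`` methods of each estimator so that every call stores its input and
its output. The sketch below illustrates that idea on a single object. It is
not the implementation used by ``alter_pipeline_for_debugging``: the names
``CallRecord``, ``record_calls`` and the attribute ``_record`` are hypothetical
and chosen only for this illustration.

```python
import functools


class CallRecord:
    """Stores the inputs and outputs of every intercepted call (hypothetical)."""

    def __init__(self, owner_name):
        self.owner_name = owner_name
        self.calls = []  # list of (method name, input, output)

    def __repr__(self):
        return f"CallRecord({self.owner_name}, {len(self.calls)} call(s))"


def _make_wrapper(name, original, record):
    """Return a function calling ``original`` and logging input and output."""

    @functools.wraps(original)
    def wrapper(X, *args, **kwargs):
        out = original(X, *args, **kwargs)
        record.calls.append((name, X, out))
        return out

    return wrapper


def record_calls(obj, method_names=("transform", "predict")):
    """Patch ``obj`` in place so the listed methods log their calls."""
    record = CallRecord(type(obj).__name__)
    for name in method_names:
        original = getattr(obj, name, None)
        if original is None:
            # The object does not implement this method, nothing to wrap.
            continue
        # The wrapper is set on the instance, the class stays untouched.
        setattr(obj, name, _make_wrapper(name, original, record))
    obj._record = record
    return record
```

Because the patch lives on the instance, it would have to be applied to every
estimator of a pipeline separately, and it is presumably lost whenever the
estimator is cloned.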
.. GENERATED FROM PYTHON SOURCE LINES 207-211

.. code-block:: Python


    print(pipeline2str(model))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Pipeline
       StandardScaler
       FeatureUnion
          StandardScaler
          MinMaxScaler
       LinearRegression


.. GENERATED FROM PYTHON SOURCE LINES 212-213

Let's now modify the pipeline to get the intermediate outputs.

.. GENERATED FROM PYTHON SOURCE LINES 213-218

.. code-block:: Python



    alter_pipeline_for_debugging(model)


.. GENERATED FROM PYTHON SOURCE LINES 219-221

The function adds a member ``_debug`` which stores inputs and outputs
in every piece of the pipeline.

.. GENERATED FROM PYTHON SOURCE LINES 221-224

.. code-block:: Python


    model.steps[0][1]._debug


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    BaseEstimatorDebugInformation(StandardScaler)


.. GENERATED FROM PYTHON SOURCE LINES 226-230

.. code-block:: Python



    model.predict(X)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([ 0.57886408,  0.02901042, -1.80801004, -0.13899759])


.. GENERATED FROM PYTHON SOURCE LINES 231-232

The member was populated with inputs and outputs.

.. GENERATED FROM PYTHON SOURCE LINES 232-237

.. code-block:: Python



    model.steps[0][1]._debug


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    BaseEstimatorDebugInformation(StandardScaler)
      transform(
       shape=(4, 5) type=
       [[-0.11278923  0.49633964  0.89431531 -0.58705583 -0.89982345]
        [ 0.30571764  0.50186418 -0.57305771  0.77656605  0.60803094]
        [-0.70304868  0.32967023 -0.34664322 -0.13516745 -0.76585543]
        [-1.3830715  -2.00956839 -0.79207642 -0.25670281  0.36958478]]
      ) -> (
       shape=(4, 5) type=
       [[ 0.56695655  0.62660822  1.68101311 -1.06151875 -1.08975908]
        [ 1.22512431  0.63180005 -0.56410962  1.63671501  1.16797756]
        [-0.3613191   0.46997627 -0.21768897 -0.16735556 -0.88916644]
        [-1.43076176 -1.72838454 -0.89921453 -0.40784071  0.81094796]]
      )


.. GENERATED FROM PYTHON SOURCE LINES 238-239

Every piece behaves the same way.

.. GENERATED FROM PYTHON SOURCE LINES 239-244

.. code-block:: Python



    for coor, m, _vars in enumerate_pipeline_models(model):
        print(coor)
        print(m._debug)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (0,)
    BaseEstimatorDebugInformation(Pipeline)
      predict(
       shape=(4, 5) type=
       [[-0.11278923  0.49633964  0.89431531 -0.58705583 -0.89982345]
        [ 0.30571764  0.50186418 -0.57305771  0.77656605  0.60803094]
        [-0.70304868  0.32967023 -0.34664322 -0.13516745 -0.76585543]
        [-1.3830715  -2.00956839 -0.79207642 -0.25670281  0.36958478]]
      ) -> (
       shape=(4,) type=
       [ 0.57886408  0.02901042 -1.80801004 -0.13899759]
      )
    (0, 0)
    BaseEstimatorDebugInformation(StandardScaler)
      transform(
       shape=(4, 5) type=
       [[-0.11278923  0.49633964  0.89431531 -0.58705583 -0.89982345]
        [ 0.30571764  0.50186418 -0.57305771  0.77656605  0.60803094]
        [-0.70304868  0.32967023 -0.34664322 -0.13516745 -0.76585543]
        [-1.3830715  -2.00956839 -0.79207642 -0.25670281  0.36958478]]
      ) -> (
       shape=(4, 5) type=
       [[ 0.56695655  0.62660822  1.68101311 -1.06151875 -1.08975908]
        [ 1.22512431  0.63180005 -0.56410962  1.63671501  1.16797756]
        [-0.3613191   0.46997627 -0.21768897 -0.16735556 -0.88916644]
        [-1.43076176 -1.72838454 -0.89921453 -0.40784071  0.81094796]]
      )
    (0, 1)
    BaseEstimatorDebugInformation(FeatureUnion)
      transform(
       shape=(4, 5) type=
       [[ 0.56695655  0.62660822  1.68101311 -1.06151875 -1.08975908]
        [ 1.22512431  0.63180005 -0.56410962  1.63671501  1.16797756]
        [-0.3613191   0.46997627 -0.21768897 -0.16735556 -0.88916644]
        [-1.43076176 -1.72838454 -0.89921453 -0.40784071  0.81094796]]
      ) -> (
       shape=(4, 10) type=
       [[ 0.56695655  0.62660822  1.68101311 -1.06151875 -1.08975908  0.75218524
          0.99780024  1.          0.          0.        ]
        [ 1.22512431  0.63180005 -0.56410962  1.63671501  1.16797756  1.
          1.          0.12987416  1.          1.        ]
        [-0.3613191   0.46997627 -0.21768897 -0.16735556 -0.88916644  0.40266888
       ...
      )
    (0, 1, 0)
    BaseEstimatorDebugInformation(StandardScaler)
      transform(
       shape=(4, 5) type=
       [[ 0.56695655  0.62660822  1.68101311 -1.06151875 -1.08975908]
        [ 1.22512431  0.63180005 -0.56410962  1.63671501  1.16797756]
        [-0.3613191   0.46997627 -0.21768897 -0.16735556 -0.88916644]
        [-1.43076176 -1.72838454 -0.89921453 -0.40784071  0.81094796]]
      ) -> (
       shape=(4, 5) type=
       [[ 0.56695655  0.62660822  1.68101311 -1.06151875 -1.08975908]
        [ 1.22512431  0.63180005 -0.56410962  1.63671501  1.16797756]
        [-0.3613191   0.46997627 -0.21768897 -0.16735556 -0.88916644]
        [-1.43076176 -1.72838454 -0.89921453 -0.40784071  0.81094796]]
      )
    (0, 1, 1)
    BaseEstimatorDebugInformation(MinMaxScaler)
      transform(
       shape=(4, 5) type=
       [[ 0.56695655  0.62660822  1.68101311 -1.06151875 -1.08975908]
        [ 1.22512431  0.63180005 -0.56410962  1.63671501  1.16797756]
        [-0.3613191   0.46997627 -0.21768897 -0.16735556 -0.88916644]
        [-1.43076176 -1.72838454 -0.89921453 -0.40784071  0.81094796]]
      ) -> (
       shape=(4, 5) type=
       [[0.75218524 0.99780024 1.         0.         0.        ]
        [1.         1.         0.12987416 1.         1.        ]
        [0.40266888 0.93143596 0.26413389 0.33138833 0.08884678]
        [0.         0.         0.         0.24226146 0.84186393]]
      )
    (0, 2)
    BaseEstimatorDebugInformation(LinearRegression)
      predict(
       shape=(4, 10) type=
       [[ 0.56695655  0.62660822  1.68101311 -1.06151875 -1.08975908  0.75218524
          0.99780024  1.          0.          0.        ]
        [ 1.22512431  0.63180005 -0.56410962  1.63671501  1.16797756  1.
          1.          0.12987416  1.          1.        ]
        [-0.3613191   0.46997627 -0.21768897 -0.16735556 -0.88916644  0.40266888
       ...
      ) -> (
       shape=(4,) type=
       [ 0.57886408  0.02901042 -1.80801004 -0.13899759]
      )


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.170 seconds)


.. _sphx_glr_download_auto_examples_plot_visualize_pipeline.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_visualize_pipeline.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_visualize_pipeline.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_visualize_pipeline.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_