.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_regression_confidence_interval.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_regression_confidence_interval.py:


Regression with confidence interval
===================================

The notebook computes confidence intervals with `bootstrapping `_
and `quantile regression `_ on a simple problem.

Some data
---------

The data follows the formula
:math:`y = \frac{X}{2} + 2 + \epsilon_1 + \eta \epsilon_2`.
The noise terms follow the laws
:math:`\epsilon_1 \sim \mathcal{N}(0, 0.2)`,
:math:`\epsilon_2 \sim \mathcal{N}(1, 1)`,
:math:`\eta \sim \mathcal{B}(2, 0.05)`.
The second noise term adds a larger perturbation, but only occasionally:
:math:`\eta` is nonzero for roughly one point in ten.

.. GENERATED FROM PYTHON SOURCE LINES 21-45

.. code-block:: Python

    import numpy
    from numpy.random import binomial, rand, randn
    import pandas
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import (
        RBF,
        ConstantKernel as C,
        WhiteKernel,
    )
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from mlinsights.mlmodel import IntervalRegressor, QuantileLinearRegression

    N = 200
    X = rand(N, 1) * 2
    eps = randn(N, 1) * 0.2
    eps2 = randn(N, 1) + 1
    bin = binomial(2, 0.05, size=(N, 1))
    y = (0.5 * X + eps + 2 + eps2 * bin).ravel()

.. GENERATED FROM PYTHON SOURCE LINES 47-51

.. code-block:: Python

    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    ax.plot(X, y, ".")

.. image-sg:: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_001.png
   :alt: plot regression confidence interval
   :srcset: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    []

.. GENERATED FROM PYTHON SOURCE LINES 53-58

.. code-block:: Python

    X_train, X_test, y_train, y_test = train_test_split(X, y)

.. GENERATED FROM PYTHON SOURCE LINES 59-61

Confidence interval with a linear regression
--------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 61-68

.. code-block:: Python

    # The object fits the same learner many times; each fit uses a
    # resampling of the training dataset.

    lin = IntervalRegressor(LinearRegression())
    lin.fit(X_train, y_train)

.. raw:: html

IntervalRegressor(estimator=LinearRegression())


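The resampling idea behind the fit above can be sketched without
*mlinsights*. The helper below is hypothetical (``bootstrap_interval`` and its
parameters are not part of any library). It assumes each fit draws rows with
replacement, uses ``numpy.polyfit`` as a stand-in learner, and sorts the
predictions per point so that the first and last columns form a min/max band.
The real ``IntervalRegressor`` may resample differently.

```python
import numpy as np


def bootstrap_interval(X, y, X_new, n_estimators=10, alpha=1.0, seed=0):
    """Fit one least-squares line per resample and return sorted predictions.

    Hypothetical helper, not the mlinsights implementation. ``alpha`` is
    assumed to be the fraction of the training set drawn for each fit.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    size = max(2, int(alpha * n))
    preds = np.empty((len(X_new), n_estimators))
    for k in range(n_estimators):
        idx = rng.integers(0, n, size=size)  # draw rows with replacement
        slope, intercept = np.polyfit(X[idx], y[idx], deg=1)
        preds[:, k] = slope * X_new + intercept
    preds.sort(axis=1)  # column 0 holds the lowest prediction per point
    return preds


# Synthetic data close to the first noise term of this example.
rng = np.random.default_rng(42)
X = rng.uniform(0, 2, size=200)
y = 0.5 * X + 2 + rng.normal(0, 0.2, size=200)

grid = np.linspace(0, 2, 5)
sorted_preds = bootstrap_interval(X, y, grid)
lower, upper = sorted_preds[:, 0], sorted_preds[:, -1]
```

The first and last columns play the role of the min and max curves plotted in
this example.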
.. GENERATED FROM PYTHON SOURCE LINES 70-77

.. code-block:: Python

    sorted_X = numpy.array(list(sorted(X_test)))
    pred = lin.predict(sorted_X)
    bootstrapped_pred = lin.predict_sorted(sorted_X)
    min_pred = bootstrapped_pred[:, 0]
    max_pred = bootstrapped_pred[:, bootstrapped_pred.shape[1] - 1]

.. GENERATED FROM PYTHON SOURCE LINES 79-89

.. code-block:: Python

    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    ax.plot(X_test, y_test, ".", label="raw")
    ax.plot(sorted_X, pred, label="prediction")
    ax.plot(sorted_X, min_pred, "--", label="min")
    ax.plot(sorted_X, max_pred, "--", label="max")
    ax.legend()

.. image-sg:: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_002.png
   :alt: plot regression confidence interval
   :srcset: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 90-92

Wider confidence interval
-------------------------

.. GENERATED FROM PYTHON SOURCE LINES 92-99

.. code-block:: Python

    # We can either train on smaller resamples of the training dataset
    # (alpha < 1) or increase the number of resamplings (n_estimators).

    lin2 = IntervalRegressor(LinearRegression(), alpha=0.3)
    lin2.fit(X_train, y_train)

.. raw:: html

IntervalRegressor(alpha=0.3, estimator=LinearRegression())

.. GENERATED FROM PYTHON SOURCE LINES 101-105

.. code-block:: Python

    lin3 = IntervalRegressor(LinearRegression(), n_estimators=50)
    lin3.fit(X_train, y_train)

.. raw:: html

IntervalRegressor(estimator=LinearRegression(), n_estimators=50)
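One reason the band changes with ``n_estimators``: assuming the band is the
minimum and maximum over the fitted models, adding models can only keep it as
is or widen it. The sketch below illustrates this with plain NumPy and
``numpy.polyfit`` as a stand-in learner. It is not the *mlinsights* code, and
the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=150)
y = 0.5 * X + 2 + rng.normal(0, 0.2, size=150)
grid = np.linspace(0, 2, 4)

# 50 bootstrap fits of a straight line, one column of predictions per fit.
preds = np.empty((len(grid), 50))
for k in range(50):
    idx = rng.integers(0, len(y), size=len(y))  # rows drawn with replacement
    slope, intercept = np.polyfit(X[idx], y[idx], deg=1)
    preds[:, k] = slope * grid + intercept

# Band width from the first 10 fits versus from all 50 fits.
width_10 = preds[:, :10].max(axis=1) - preds[:, :10].min(axis=1)
width_50 = preds.max(axis=1) - preds.min(axis=1)
```

Because the first 10 fits are a subset of the 50, ``width_50`` is never
smaller than ``width_10`` at any point of the grid.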

.. GENERATED FROM PYTHON SOURCE LINES 107-113

.. code-block:: Python

    pred2 = lin2.predict(sorted_X)
    bootstrapped_pred2 = lin2.predict_sorted(sorted_X)
    min_pred2 = bootstrapped_pred2[:, 0]
    max_pred2 = bootstrapped_pred2[:, bootstrapped_pred2.shape[1] - 1]

.. GENERATED FROM PYTHON SOURCE LINES 115-121

.. code-block:: Python

    pred3 = lin3.predict(sorted_X)
    bootstrapped_pred3 = lin3.predict_sorted(sorted_X)
    min_pred3 = bootstrapped_pred3[:, 0]
    max_pred3 = bootstrapped_pred3[:, bootstrapped_pred3.shape[1] - 1]

.. GENERATED FROM PYTHON SOURCE LINES 123-146

.. code-block:: Python

    fig, ax = plt.subplots(1, 3, figsize=(12, 4))
    ax[0].plot(X_test, y_test, ".", label="raw")
    ax[0].plot(sorted_X, pred, label="prediction")
    ax[0].plot(sorted_X, min_pred, "--", label="min")
    ax[0].plot(sorted_X, max_pred, "--", label="max")
    ax[0].legend()
    ax[0].set_title("alpha=%f" % lin.alpha)
    ax[1].plot(X_test, y_test, ".", label="raw")
    ax[1].plot(sorted_X, pred2, label="prediction")
    ax[1].plot(sorted_X, min_pred2, "--", label="min")
    ax[1].plot(sorted_X, max_pred2, "--", label="max")
    ax[1].set_title("alpha=%f" % lin2.alpha)
    ax[1].legend()
    ax[2].plot(X_test, y_test, ".", label="raw")
    ax[2].plot(sorted_X, pred3, label="prediction")
    ax[2].plot(sorted_X, min_pred3, "--", label="min")
    ax[2].plot(sorted_X, max_pred3, "--", label="max")
    ax[2].set_title("n_estimators=%d" % lin3.n_estimators)
    ax[2].legend()

.. image-sg:: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_003.png
   :alt: alpha=1.000000, alpha=0.300000, n_estimators=50
   :srcset: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 147-149

With decision trees
-------------------

.. GENERATED FROM PYTHON SOURCE LINES 149-153

.. code-block:: Python

    tree = IntervalRegressor(DecisionTreeRegressor(min_samples_leaf=10))
    tree.fit(X_train, y_train)

.. raw:: html

IntervalRegressor(estimator=DecisionTreeRegressor(min_samples_leaf=10))

.. GENERATED FROM PYTHON SOURCE LINES 155-161

.. code-block:: Python

    pred_tree = tree.predict(sorted_X)
    b_pred_tree = tree.predict_sorted(sorted_X)
    min_pred_tree = b_pred_tree[:, 0]
    max_pred_tree = b_pred_tree[:, b_pred_tree.shape[1] - 1]

.. GENERATED FROM PYTHON SOURCE LINES 163-174

.. code-block:: Python

    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    ax.plot(X_test, y_test, ".", label="raw")
    ax.plot(sorted_X, pred_tree, label="prediction")
    ax.plot(sorted_X, min_pred_tree, "--", label="min")
    ax.plot(sorted_X, max_pred_tree, "--", label="max")
    ax.set_title("Interval with trees")
    ax.legend()

.. image-sg:: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_004.png
   :alt: Interval with trees
   :srcset: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 175-182

In this case, the prediction is very similar to what a random forest would
produce, since it is the average of the predictions made by 10 trees.

Regression quantile
-------------------

Another approach fits two regressions, one for quantile 0.05 and one for
quantile 0.95.

.. GENERATED FROM PYTHON SOURCE LINES 182-190

.. code-block:: Python

    m = QuantileLinearRegression()
    q1 = QuantileLinearRegression(quantile=0.05)
    q2 = QuantileLinearRegression(quantile=0.95)
    for model in [m, q1, q2]:
        model.fit(X_train, y_train)

.. GENERATED FROM PYTHON SOURCE LINES 192-196

.. code-block:: Python

    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    ax.plot(X_test, y_test, ".", label="raw")

.. image-sg:: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_005.png
   :alt: plot regression confidence interval
   :srcset: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_005.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    []

.. GENERATED FROM PYTHON SOURCE LINES 198-207

.. code-block:: Python

    for label, model in [("med", m), ("q0.05", q1), ("q0.95", q2)]:
        p = model.predict(sorted_X)
        ax.plot(sorted_X, p, label=label)
    ax.set_title("Quantile Regression")
    ax.legend()

.. GENERATED FROM PYTHON SOURCE LINES 208-216

A nonlinear model could be used the same way, but the model
*QuantileMLPRegressor* only implements the regression for quantile 0.5.

With seaborn
------------

It relies on a theoretical way to compute the confidence interval, which
first computes a confidence interval on the parameters.

.. GENERATED FROM PYTHON SOURCE LINES 216-223

.. code-block:: Python

    df_train = pandas.DataFrame(dict(X=X_train.ravel(), y=y_train))
    g = sns.jointplot(x="X", y="y", data=df_train, kind="reg", color="m", height=7)
    g.ax_joint.plot(X_test, y_test, "ro")

.. image-sg:: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_006.png
   :alt: plot regression confidence interval
   :srcset: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_006.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    []

.. GENERATED FROM PYTHON SOURCE LINES 224-232

GaussianProcessRegressor
------------------------

The last option follows the example
`Gaussian Processes regression: basic introductory example `_,
which computes the standard deviation of every prediction.
That standard deviation can then be used to draw a confidence interval.

.. GENERATED FROM PYTHON SOURCE LINES 232-237

.. code-block:: Python

    kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2)) + WhiteKernel()
    gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
    gp.fit(X_train, y_train)

.. raw:: html

GaussianProcessRegressor(kernel=1**2 * RBF(length_scale=10) + WhiteKernel(noise_level=1),
                             n_restarts_optimizer=9)
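
A fitted Gaussian process can return a standard deviation alongside each
prediction. Under a Gaussian assumption, a two-sided 95% band is the mean plus
or minus about 1.96 standard deviations. The sketch below shows where that
factor comes from, using only the standard library. ``normal_interval`` is a
hypothetical helper introduced for illustration, and the mean and standard
deviation passed to it are made-up numbers.

```python
from statistics import NormalDist


def normal_interval(mean, std, level=0.95):
    """Return the two-sided interval mean +/- z * std for a Gaussian."""
    # For level=0.95 the quantile is taken at 0.975, which gives z close to 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std, mean + z * std


low, high = normal_interval(mean=2.5, std=0.2)
```

With ``level=0.95`` the two bounds correspond to the 2.5% and 97.5% quantiles
of the predictive distribution.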

.. GENERATED FROM PYTHON SOURCE LINES 239-242

.. code-block:: Python

    y_pred, sigma = gp.predict(sorted_X, return_std=True)

.. GENERATED FROM PYTHON SOURCE LINES 244-253

.. code-block:: Python

    fig, ax = plt.subplots(1, 1, figsize=(12, 4))
    ax.plot(X_test, y_test, ".", label="raw")
    ax.plot(sorted_X, y_pred, label="prediction")
    # y_pred +/- 1.96 * sigma bounds a two-sided 95% interval.
    ax.plot(sorted_X, y_pred + sigma * 1.96, "b--", label="upper (95%)")
    ax.plot(sorted_X, y_pred - sigma * 1.96, "b--", label="lower (95%)")
    ax.set_title("Confidence interval with GaussianProcessRegressor")
    ax.legend()

.. image-sg:: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_007.png
   :alt: Confidence interval with GaussianProcessRegressor
   :srcset: /auto_examples/images/sphx_glr_plot_regression_confidence_interval_007.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 6.623 seconds)

.. _sphx_glr_download_auto_examples_plot_regression_confidence_interval.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_regression_confidence_interval.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_regression_confidence_interval.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_regression_confidence_interval.zip `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_