Quantile Regression
scikit-learn does not implement quantile regression; mlinsights provides a version of it, QuantileLinearRegression.
Simple example
We first generate some dummy data.
import numpy
import matplotlib.pyplot as plt
from pandas import DataFrame
from sklearn.linear_model import LinearRegression
from mlinsights.mlmodel import QuantileLinearRegression
X = numpy.random.random(1000)
eps1 = (numpy.random.random(900) - 0.5) * 0.1  # small centered noise for 900 points
eps2 = numpy.random.random(100) * 10  # large positive outliers for the last 100 points
eps = numpy.hstack([eps1, eps2])
X = X.reshape((1000, 1))
Y = X.ravel() * 3.4 + 5.6 + eps
clr = LinearRegression()
clr.fit(X, Y)

clq = QuantileLinearRegression()  # median (L1) regression, used in the plot below
clq.fit(X, Y)
fig, ax = plt.subplots(1, 1, figsize=(10, 4))
choice = numpy.random.choice(X.shape[0], size=100)  # sample 100 points for display
xx = X.ravel()[choice]
yy = Y[choice]
ax.plot(xx, yy, ".", label="data")
xx = numpy.array([[0], [1]])
y1 = clr.predict(xx)
y2 = clq.predict(xx)
ax.plot(xx, y1, "--", label="L2")
ax.plot(xx, y2, "--", label="L1")
ax.set_title("Quantile (L1) vs Square (L2)")
ax.legend()

The L1 fit is clearly less sensitive to the outliers. The optimization algorithm is based on Iteratively Reweighted Least Squares (IRLS): it fits a linear regression with an L2 loss, then reweights every observation with the inverse of its L1 error and fits again, repeating until convergence.
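To make the reweighting concrete, here is a minimal IRLS sketch for the median (quantile 0.5), written with the numpy already imported above; the function name, the eps guard and the fixed iteration count are illustrative choices, not mlinsights internals.

# Minimal IRLS sketch for median (L1) regression -- illustrative only.
def irls_median(X, y, n_iter=20, eps=1e-6):
    Xb = numpy.hstack([X, numpy.ones((X.shape[0], 1))])  # append an intercept column
    w = numpy.ones(X.shape[0])  # uniform weights: the first pass is plain least squares
    for _ in range(n_iter):
        WX = Xb * w[:, None]  # weighted design matrix
        beta = numpy.linalg.solve(Xb.T @ WX, WX.T @ y)  # weighted least squares step
        w = 1.0 / numpy.maximum(numpy.abs(y - Xb @ beta), eps)  # inverse L1 error
    return beta  # (slope, intercept)

On the data above, its coefficients should land close to those of the fit below, which prints the L1 error at each iteration: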
clq = QuantileLinearRegression(verbose=True, max_iter=20)
clq.fit(X, Y)
[QuantileLinearRegression.fit] iter=1 error=853.9008145418201
[QuantileLinearRegression.fit] iter=2 error=546.3665130625367
[QuantileLinearRegression.fit] iter=3 error=496.098333075342
[QuantileLinearRegression.fit] iter=4 error=495.7617127189459
[QuantileLinearRegression.fit] iter=5 error=495.4216843950596
[QuantileLinearRegression.fit] iter=6 error=495.20197898886244
[QuantileLinearRegression.fit] iter=7 error=495.0334627319531
[QuantileLinearRegression.fit] iter=8 error=494.92642428728277
[QuantileLinearRegression.fit] iter=9 error=494.82861984158745
[QuantileLinearRegression.fit] iter=10 error=494.76065181513496
[QuantileLinearRegression.fit] iter=11 error=494.71516121613246
[QuantileLinearRegression.fit] iter=12 error=494.6744424392276
[QuantileLinearRegression.fit] iter=13 error=494.6463271066102
[QuantileLinearRegression.fit] iter=14 error=494.6180033247077
[QuantileLinearRegression.fit] iter=15 error=494.59973985026437
[QuantileLinearRegression.fit] iter=16 error=494.58365013787017
[QuantileLinearRegression.fit] iter=17 error=494.56462829562474
[QuantileLinearRegression.fit] iter=18 error=494.54836863328563
[QuantileLinearRegression.fit] iter=19 error=494.53616162309544
[QuantileLinearRegression.fit] iter=20 error=494.5273731709936
The final value, 0.4945273731709936, is the last iteration's L1 error divided by the 1,000 observations, i.e. the mean absolute error of the final fit.
Regression with various quantiles
X = numpy.random.random(1200)
eps1 = (numpy.random.random(900) - 0.5) * 0.5  # moderate centered noise
eps2 = numpy.random.random(300) * 2  # skewed positive noise for the last 300 points
eps = numpy.hstack([eps1, eps2])
X = X.reshape((1200, 1))
Y = X.ravel() * 3.4 + 5.6 + eps + X.ravel() * X.ravel() * 8  # mild quadratic term

(Figure: scatter plot of the new data, titled "Almost linear dataset".)
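The plotting code below reads fitted models from a dictionary clqs keyed by quantile. A minimal way to build it, assuming the quantile parameter of QuantileLinearRegression (the grid of quantiles is illustrative); for a quantile tau, the IRLS weights become tau / |error| on positive residuals and (1 - tau) / |error| on negative ones.

# Fit one quantile regression per level; clqs maps quantile -> fitted model.
clqs = {}
for qu in [0.1, 0.25, 0.5, 0.75, 0.9]:  # illustrative grid of quantiles
    model = QuantileLinearRegression(quantile=qu)
    model.fit(X, Y)
    clqs[qu] = model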
fig, ax = plt.subplots(1, 1, figsize=(10, 4))
choice = numpy.random.choice(X.shape[0], size=100)  # sample 100 points for display
xx = X.ravel()[choice]
yy = Y[choice]
ax.plot(xx, yy, ".", label="data")
xx = numpy.array([[0], [1]])
for qu in sorted(clqs):
    y = clqs[qu].predict(xx)
    ax.plot(xx, y, "--", label=qu)
ax.set_title("Various quantiles")
ax.legend()

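As a usage note, the per-quantile models can bracket predictions; a sketch assuming the illustrative clqs grid built above.

# An empirical 80% prediction interval from the 0.1 and 0.9 quantile models.
x_new = numpy.array([[0.5]])
low = clqs[0.1].predict(x_new)[0]   # 10% quantile at x = 0.5
high = clqs[0.9].predict(x_new)[0]  # 90% quantile at x = 0.5
print(f"80% interval at x=0.5: [{low:.2f}, {high:.2f}]")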
Total running time of the script: (0 minutes 0.331 seconds)