Polynomial regression and pipeline

This notebook compares several polynomial regression models.

[2]:
%matplotlib inline
[3]:
from teachpyx.datasets import load_wines_dataset

data = load_wines_dataset()
X = data.drop(["quality", "color"], axis=1)
y = data["quality"]
[4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)

We normalize the data. In this particular case it matters all the more because the polynomial features would otherwise take very large values, and numerical libraries cope poorly with columns whose orders of magnitude differ widely. Note that Normalizer rescales each row (each sample) to unit norm; it does not standardize the columns.

[5]:
from sklearn.preprocessing import Normalizer

norm = Normalizer()
X_train_norm = norm.fit_transform(X_train)
X_test_norm = norm.transform(X_test)
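
To see why scaling matters before expanding to polynomial features, here is a minimal sketch in plain Python. The sample values are made up for illustration (they are not taken from the wine dataset), and `l2_normalize` is a hypothetical helper that mimics what `Normalizer` does with its default L2 norm: each row is divided by its own Euclidean norm.

```python
import math


def l2_normalize(row):
    """Divide a row by its Euclidean norm (hypothetical helper for illustration)."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm > 0 else list(row)


# Made-up sample with features on very different scales.
raw = [7.4, 0.7, 170.0, 0.998]
scaled = l2_normalize(raw)

# Largest degree 4 monomial built from a single feature, before and after scaling.
raw_max_pow4 = max(abs(v) ** 4 for v in raw)
scaled_max_pow4 = max(abs(v) ** 4 for v in scaled)

print(f"raw:    max |x|^4 = {raw_max_pow4:.3e}")
print(f"scaled: max |x|^4 = {scaled_max_pow4:.3e}")
```

After scaling, every entry lies in [-1, 1], so every monomial built from them stays in that range as well.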

The PolynomialFeatures transformation creates new features by multiplying the variables with one another. For degree 2 and three features a, b, c, it produces the features 1, a, b, c, a^2, ab, ac, b^2, bc, c^2.
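
The enumeration can be reproduced in plain Python. The sketch below is not scikit-learn's implementation; `monomials` is a hypothetical helper that lists the products of at most `degree` variables, in the same order as the list given in the text above.

```python
from itertools import combinations_with_replacement


def monomials(names, degree):
    """List all monomials of total degree <= degree (hypothetical helper)."""
    terms = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(names, d):
            if not combo:
                terms.append("1")  # the constant (bias) term
                continue
            # Group repeated variables into powers, e.g. ("a", "a") -> "a^2".
            parts = []
            for name in dict.fromkeys(combo):
                power = combo.count(name)
                parts.append(name if power == 1 else f"{name}^{power}")
            terms.append("".join(parts))
    return terms


features = monomials(["a", "b", "c"], degree=2)
print(features)
```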

[6]:
from time import perf_counter
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

r2ts = []
r2es = []
degs = []
tts = []
models = []

for d in range(1, 5):
    begin = perf_counter()
    pipe = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    pipe.fit(X_train_norm, y_train)
    duree = perf_counter() - begin
    r2t = r2_score(y_train, pipe.predict(X_train_norm))
    r2e = r2_score(y_test, pipe.predict(X_test_norm))
    degs.append(d)
    r2ts.append(r2t)
    r2es.append(r2e)
    tts.append(duree)
    models.append(pipe)
    print(d, r2t, r2e, duree)
1 0.1909065078664849 0.16570749381482386 0.02195639999990817
2 0.31686272332465504 0.2634484656108902 0.16658860000006825
3 0.4117084105383497 -1.446755311176299 1.0382120000001578
4 0.5940872457783092 -3926.677572477097 2.8583189999999377
[7]:
import pandas

df = pandas.DataFrame(dict(temps=tts, r2_train=r2ts, r2_test=r2es, degré=degs))
df.set_index("degré")
[7]:
temps r2_train r2_test
degré
1 0.021956 0.190907 0.165707
2 0.166589 0.316863 0.263448
3 1.038212 0.411708 -1.446755
4 2.858319 0.594087 -3926.677572

The degree 2 polynomial appears to be the best model. The fit time grows by a factor of roughly 3 to 8 from one degree to the next, which broadly tracks the growth in the number of features. Adding crossed features does help on this dataset. From degree 3 onward, however, the regression performs very poorly on the test set while the training score keeps increasing. Let us look at this in more detail.
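
The growth in the number of features can be checked with a short computation. The sketch assumes the 11 input columns left after dropping quality and color, and uses the standard count of monomials of total degree at most d in n variables, C(n + d, d), constant term included. For d = 2 it gives 78, which matches the width of the degree 2 feature table shown further down. The timings are the ones printed by the run above.

```python
from math import comb

n_inputs = 11  # assumption: columns left after dropping "quality" and "color"
timings = [0.02196, 0.16659, 1.03821, 2.85832]  # seconds, copied from the run above

# Number of monomials of total degree <= d in n variables: C(n + d, d).
counts = [comb(n_inputs + d, d) for d in range(1, 5)]

# counts[0] is degree 1, so counts[d] / counts[d - 1] compares degree d + 1 to degree d.
for d in range(1, 4):
    feature_ratio = counts[d] / counts[d - 1]
    time_ratio = timings[d] / timings[d - 1]
    print(f"degree {d} -> {d + 1}: features x{feature_ratio:.1f}, time x{time_ratio:.1f}")
```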

[7]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2, figsize=(12, 4))

n = 15
# drop=True keeps only the target values; otherwise the old index is plotted as well
ax[0].plot(y_train[:n].reset_index(drop=True), "o")
ax[1].plot(y_test[:n].reset_index(drop=True), "o")
ax[0].set_title("Predictions on a few values\ntraining")
ax[1].set_title("Predictions on a few values\ntest")
for x in ax:
    x.set_ylim([3, 9])
    x.get_xaxis().set_visible(False)

for model in models:
    d = model.get_params()["polynomialfeatures__degree"]
    tr = model.predict(X_train_norm[:n])
    te = model.predict(X_test_norm[:n])
    ax[0].plot(tr, label="d=%d" % d)
    ax[1].plot(te, label="d=%d" % d)
ax[0].legend()
ax[1].legend();
[Figure: observed values (dots) and predictions of the degree 1 to 4 models on the first 15 training samples (left) and test samples (right)]

The degree 4 model looks good on the training set but goes completely astray on the test set, as if it were surprised by the values it encounters there. The model is said to overfit. The degree 2 polynomial works better than plain linear regression. One may wonder which crossed variables affect performance. We use the OLS model from statsmodels to find out.
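
Overfitting shows up as a widening gap between the training and test scores. As a small illustration, the sketch below reuses the rounded R² values from the results table above and picks the degree with the best test score. Selecting on the test set is done here only for illustration; a separate validation set or cross-validation would normally be used.

```python
# R^2 values copied (rounded) from the results table above.
r2_train = {1: 0.1909, 2: 0.3169, 3: 0.4117, 4: 0.5941}
r2_test = {1: 0.1657, 2: 0.2634, 3: -1.4468, 4: -3926.6776}

# Gap between training and test score: a large gap signals overfitting.
gaps = {d: r2_train[d] - r2_test[d] for d in r2_train}
best_degree = max(r2_test, key=r2_test.get)

for d, gap in gaps.items():
    print(f"degree {d}: train - test = {gap:.4f}")
print("best degree on the test set:", best_degree)
```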

[8]:
poly = PolynomialFeatures(degree=2)
poly_feat_train = poly.fit_transform(X_train_norm)
poly_feat_test = poly.transform(X_test_norm)
[9]:
from statsmodels.regression.linear_model import OLS

model = OLS(y_train, poly_feat_train)
results = model.fit()
results.summary2()
[9]:
Model: OLS Adj. R-squared: 0.306
Dependent Variable: quality AIC: 10768.5223
Date: 2024-01-23 00:08 BIC: 11268.3493
No. Observations: 4872 Log-Likelihood: -5307.3
Df Model: 76 F-statistic: 29.30
Df Residuals: 4795 Prob (F-statistic): 0.00
R-squared: 0.317 Scale: 0.52557
Coef. Std.Err. t P>|t| [0.025 0.975]
const 874.2126 1866.6100 0.4683 0.6396 -2785.1996 4533.6248
x1 17.2438 25.1175 0.6865 0.4924 -31.9980 66.4856
x2 -735.9147 164.2593 -4.4802 0.0000 -1057.9383 -413.8911
x3 -375.2205 200.9788 -1.8670 0.0620 -769.2311 18.7900
x4 2.1457 13.7859 0.1556 0.8763 -24.8809 29.1723
x5 -1219.9140 760.0849 -1.6050 0.1086 -2710.0291 270.2011
x6 33.0684 8.6300 3.8318 0.0001 16.1496 49.9873
x7 45.6122 23.6785 1.9263 0.0541 -0.8085 92.0328
x8 -1621.7821 721.4602 -2.2479 0.0246 -3036.1752 -207.3890
x9 -123.6719 196.5043 -0.6294 0.5291 -508.9104 261.5667
x10 -213.6188 172.6441 -1.2373 0.2160 -552.0806 124.8429
x11 274.6811 25.3731 10.8257 0.0000 224.9381 324.4241
x12 -888.0924 1860.0506 -0.4775 0.6331 -4534.6449 2758.4602
x13 213.0448 149.3410 1.4266 0.1538 -79.7320 505.8216
x14 -169.2454 191.2389 -0.8850 0.3762 -544.1614 205.6706
x15 -2.3959 21.0911 -0.1136 0.9096 -43.7441 38.9523
x16 151.4367 661.2643 0.2290 0.8189 -1144.9447 1447.8180
x17 -13.3943 9.8122 -1.3651 0.1723 -32.6306 5.8421
x18 -12.3144 22.1599 -0.5557 0.5784 -55.7580 31.1291
x19 -228.1023 1055.2972 -0.2161 0.8289 -2296.9691 1840.7644
x20 263.5729 260.8085 1.0106 0.3123 -247.7314 774.8773
x21 210.1261 147.0152 1.4293 0.1530 -78.0912 498.3434
x22 -102.4573 26.1357 -3.9202 0.0001 -153.6952 -51.2193
x23 -1256.2263 1979.4050 -0.6346 0.5257 -5136.7684 2624.3158
x24 2503.6940 1629.0043 1.5369 0.1244 -689.9019 5697.2899
x25 -304.5840 139.4970 -2.1834 0.0291 -578.0621 -31.1060
x26 6503.3193 4276.6479 1.5207 0.1284 -1880.8729 14887.5116
x27 176.1465 65.4348 2.6919 0.0071 47.8643 304.4288
x28 541.9165 145.6814 3.7199 0.0002 256.3141 827.5190
x29 -5408.0462 5170.0682 -1.0460 0.2956 -15543.7521 4727.6598
x30 1591.6613 1576.5785 1.0096 0.3128 -1499.1559 4682.4786
x31 -3066.2318 1347.2069 -2.2760 0.0229 -5707.3755 -425.0882
x32 611.6934 183.2814 3.3375 0.0009 252.3778 971.0089
x33 861.9664 2070.4337 0.4163 0.6772 -3197.0336 4920.9664
x34 -307.3877 171.8498 -1.7887 0.0737 -644.2921 29.5168
x35 -8483.4913 6547.3948 -1.2957 0.1951 -21319.3893 4352.4067
x36 150.5489 83.1030 1.8116 0.0701 -12.3711 313.4689
x37 300.7497 178.4947 1.6849 0.0921 -49.1819 650.6813
x38 14067.7740 7800.2113 1.8035 0.0714 -1224.2191 29359.7672
x39 -5133.5558 2077.6861 -2.4708 0.0135 -9206.7738 -1060.3378
x40 -2372.2746 1576.8448 -1.5044 0.1325 -5463.6139 719.0647
x41 708.8006 236.3385 2.9991 0.0027 245.4687 1172.1325
x42 -910.1293 1867.0943 -0.4875 0.6260 -4570.4908 2750.2323
x43 1971.4865 757.4887 2.6027 0.0093 486.4611 3456.5118
x44 -7.6328 5.0273 -1.5183 0.1290 -17.4886 2.2230
x45 2.8665 12.6000 0.2275 0.8200 -21.8354 27.5684
x46 1429.2194 705.7754 2.0250 0.0429 45.5757 2812.8631
x47 -287.7160 203.0709 -1.4168 0.1566 -685.8281 110.3961
x48 -189.7045 168.1916 -1.1279 0.2594 -519.4371 140.0282
x49 -18.0129 19.5540 -0.9212 0.3570 -56.3477 20.3219
x50 10142.2002 7074.9790 1.4335 0.1518 -3728.0049 24012.4054
x51 -201.9536 331.1123 -0.6099 0.5419 -851.0857 447.1785
x52 1197.2260 682.4747 1.7542 0.0795 -140.7375 2535.1896
x53 20265.9335 23064.5613 0.8787 0.3796 -24951.1898 65483.0568
x54 -12226.8108 6517.5063 -1.8760 0.0607 -25004.1137 550.4920
x55 -3613.4768 4978.7999 -0.7258 0.4680 -13374.2092 6147.2555
x56 2196.0436 843.4185 2.6037 0.0092 542.5563 3849.5310
x57 -909.6884 1867.1743 -0.4872 0.6261 -4570.2067 2750.8299
x58 -24.9437 7.6926 -3.2426 0.0012 -40.0247 -9.8627
x59 549.2329 298.2374 1.8416 0.0656 -35.4492 1133.9150
x60 -13.8185 79.9507 -0.1728 0.8628 -170.5585 142.9216
x61 59.9612 69.6414 0.8610 0.3893 -76.5679 196.4903
x62 -60.9958 9.6344 -6.3310 0.0000 -79.8838 -42.1079
x63 -915.2771 1867.4927 -0.4901 0.6241 -4576.4197 2745.8655
x64 856.3795 643.6237 1.3306 0.1834 -405.4183 2118.1773
x65 172.8750 176.1471 0.9814 0.3264 -172.4541 518.2041
x66 233.8040 154.5694 1.5126 0.1304 -69.2230 536.8310
x67 -208.7613 22.8714 -9.1276 0.0000 -253.5998 -163.9229
x68 2377.5655 18554.7387 0.1281 0.8980 -33998.2360 38753.3671
x69 11008.3563 10291.6307 1.0696 0.2848 -9167.9622 31184.6748
x70 -6367.0944 6065.2219 -1.0498 0.2939 -18257.7123 5523.5234
x71 -2160.1700 944.6336 -2.2868 0.0223 -4012.0853 -308.2546
x72 -4031.2392 1137.0986 -3.5452 0.0004 -6260.4741 -1802.0043
x73 1604.8538 1639.6859 0.9788 0.3277 -1609.6830 4819.3905
x74 766.9861 249.3365 3.0761 0.0021 278.1721 1255.8001
x75 -2563.1016 2045.6050 -1.2530 0.2103 -6573.4261 1447.2228
x76 596.8511 210.7614 2.8319 0.0046 183.6620 1010.0402
x77 -1033.7653 1865.5885 -0.5541 0.5795 -4691.1748 2623.6442
Omnibus: 74.655 Durbin-Watson: 2.012
Prob(Omnibus): 0.000 Jarque-Bera (JB): 119.025
Skew: 0.141 Prob(JB): 0.000
Kurtosis: 3.712 Condition No.: 4390541685726112

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 7.09e-28. This might indicate that there are strong multicollinearity problems or that the design matrix is singular.

This is not very readable. We need to attach the name of each variable and start again.

[12]:
names = poly.get_feature_names_out(input_features=data.columns[:-2])
names = [n.replace(" ", " * ") for n in names]
pft = pandas.DataFrame(poly_feat_train, columns=names)
pft.head()
[12]:
1 fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulfur_dioxide total_sulfur_dioxide density pH ... density^2 density * pH density * sulphates density * alcohol pH^2 pH * sulphates pH * alcohol sulphates^2 sulphates * alcohol alcohol^2
0 1.0 0.061316 0.003511 0.003461 0.019779 0.000455 0.306582 0.939526 0.009773 0.030263 ... 0.000096 0.000296 0.000044 0.001315 0.000916 0.000138 0.004070 0.000021 0.000612 0.018090
1 1.0 0.035236 0.001373 0.001922 0.065438 0.000206 0.205924 0.974706 0.004572 0.014552 ... 0.000021 0.000067 0.000013 0.000192 0.000212 0.000042 0.000613 0.000008 0.000121 0.001772
2 1.0 0.042579 0.001319 0.001919 0.101349 0.000336 0.293852 0.947522 0.005996 0.020210 ... 0.000036 0.000121 0.000014 0.000345 0.000408 0.000046 0.001163 0.000005 0.000131 0.003314
3 1.0 0.053638 0.000920 0.001456 0.037547 0.000421 0.206890 0.973147 0.007627 0.025210 ... 0.000058 0.000192 0.000024 0.000549 0.000636 0.000079 0.001816 0.000010 0.000226 0.005188
4 1.0 0.071498 0.002413 0.002949 0.010725 0.000447 0.366428 0.920540 0.008848 0.026812 ... 0.000078 0.000237 0.000036 0.000981 0.000719 0.000108 0.002971 0.000016 0.000446 0.012282

5 rows × 78 columns

[13]:
results.summary2(xname=pft.columns)
[13]:
Model: OLS Adj. R-squared: 0.306
Dependent Variable: quality AIC: 10768.5223
Date: 2024-01-23 00:09 BIC: 11268.3493
No. Observations: 4872 Log-Likelihood: -5307.3
Df Model: 76 F-statistic: 29.30
Df Residuals: 4795 Prob (F-statistic): 0.00
R-squared: 0.317 Scale: 0.52557
Coef. Std.Err. t P>|t| [0.025 0.975]
1 874.2126 1866.6100 0.4683 0.6396 -2785.1996 4533.6248
fixed_acidity 17.2438 25.1175 0.6865 0.4924 -31.9980 66.4856
volatile_acidity -735.9147 164.2593 -4.4802 0.0000 -1057.9383 -413.8911
citric_acid -375.2205 200.9788 -1.8670 0.0620 -769.2311 18.7900
residual_sugar 2.1457 13.7859 0.1556 0.8763 -24.8809 29.1723
chlorides -1219.9140 760.0849 -1.6050 0.1086 -2710.0291 270.2011
free_sulfur_dioxide 33.0684 8.6300 3.8318 0.0001 16.1496 49.9873
total_sulfur_dioxide 45.6122 23.6785 1.9263 0.0541 -0.8085 92.0328
density -1621.7821 721.4602 -2.2479 0.0246 -3036.1752 -207.3890
pH -123.6719 196.5043 -0.6294 0.5291 -508.9104 261.5667
sulphates -213.6188 172.6441 -1.2373 0.2160 -552.0806 124.8429
alcohol 274.6811 25.3731 10.8257 0.0000 224.9381 324.4241
fixed_acidity^2 -888.0924 1860.0506 -0.4775 0.6331 -4534.6449 2758.4602
fixed_acidity * volatile_acidity 213.0448 149.3410 1.4266 0.1538 -79.7320 505.8216
fixed_acidity * citric_acid -169.2454 191.2389 -0.8850 0.3762 -544.1614 205.6706
fixed_acidity * residual_sugar -2.3959 21.0911 -0.1136 0.9096 -43.7441 38.9523
fixed_acidity * chlorides 151.4367 661.2643 0.2290 0.8189 -1144.9447 1447.8180
fixed_acidity * free_sulfur_dioxide -13.3943 9.8122 -1.3651 0.1723 -32.6306 5.8421
fixed_acidity * total_sulfur_dioxide -12.3144 22.1599 -0.5557 0.5784 -55.7580 31.1291
fixed_acidity * density -228.1023 1055.2972 -0.2161 0.8289 -2296.9691 1840.7644
fixed_acidity * pH 263.5729 260.8085 1.0106 0.3123 -247.7314 774.8773
fixed_acidity * sulphates 210.1261 147.0152 1.4293 0.1530 -78.0912 498.3434
fixed_acidity * alcohol -102.4573 26.1357 -3.9202 0.0001 -153.6952 -51.2193
volatile_acidity^2 -1256.2263 1979.4050 -0.6346 0.5257 -5136.7684 2624.3158
volatile_acidity * citric_acid 2503.6940 1629.0043 1.5369 0.1244 -689.9019 5697.2899
volatile_acidity * residual_sugar -304.5840 139.4970 -2.1834 0.0291 -578.0621 -31.1060
volatile_acidity * chlorides 6503.3193 4276.6479 1.5207 0.1284 -1880.8729 14887.5116
volatile_acidity * free_sulfur_dioxide 176.1465 65.4348 2.6919 0.0071 47.8643 304.4288
volatile_acidity * total_sulfur_dioxide 541.9165 145.6814 3.7199 0.0002 256.3141 827.5190
volatile_acidity * density -5408.0462 5170.0682 -1.0460 0.2956 -15543.7521 4727.6598
volatile_acidity * pH 1591.6613 1576.5785 1.0096 0.3128 -1499.1559 4682.4786
volatile_acidity * sulphates -3066.2318 1347.2069 -2.2760 0.0229 -5707.3755 -425.0882
volatile_acidity * alcohol 611.6934 183.2814 3.3375 0.0009 252.3778 971.0089
citric_acid^2 861.9664 2070.4337 0.4163 0.6772 -3197.0336 4920.9664
citric_acid * residual_sugar -307.3877 171.8498 -1.7887 0.0737 -644.2921 29.5168
citric_acid * chlorides -8483.4913 6547.3948 -1.2957 0.1951 -21319.3893 4352.4067
citric_acid * free_sulfur_dioxide 150.5489 83.1030 1.8116 0.0701 -12.3711 313.4689
citric_acid * total_sulfur_dioxide 300.7497 178.4947 1.6849 0.0921 -49.1819 650.6813
citric_acid * density 14067.7740 7800.2113 1.8035 0.0714 -1224.2191 29359.7672
citric_acid * pH -5133.5558 2077.6861 -2.4708 0.0135 -9206.7738 -1060.3378
citric_acid * sulphates -2372.2746 1576.8448 -1.5044 0.1325 -5463.6139 719.0647
citric_acid * alcohol 708.8006 236.3385 2.9991 0.0027 245.4687 1172.1325
residual_sugar^2 -910.1293 1867.0943 -0.4875 0.6260 -4570.4908 2750.2323
residual_sugar * chlorides 1971.4865 757.4887 2.6027 0.0093 486.4611 3456.5118
residual_sugar * free_sulfur_dioxide -7.6328 5.0273 -1.5183 0.1290 -17.4886 2.2230
residual_sugar * total_sulfur_dioxide 2.8665 12.6000 0.2275 0.8200 -21.8354 27.5684
residual_sugar * density 1429.2194 705.7754 2.0250 0.0429 45.5757 2812.8631
residual_sugar * pH -287.7160 203.0709 -1.4168 0.1566 -685.8281 110.3961
residual_sugar * sulphates -189.7045 168.1916 -1.1279 0.2594 -519.4371 140.0282
residual_sugar * alcohol -18.0129 19.5540 -0.9212 0.3570 -56.3477 20.3219
chlorides^2 10142.2002 7074.9790 1.4335 0.1518 -3728.0049 24012.4054
chlorides * free_sulfur_dioxide -201.9536 331.1123 -0.6099 0.5419 -851.0857 447.1785
chlorides * total_sulfur_dioxide 1197.2260 682.4747 1.7542 0.0795 -140.7375 2535.1896
chlorides * density 20265.9335 23064.5613 0.8787 0.3796 -24951.1898 65483.0568
chlorides * pH -12226.8108 6517.5063 -1.8760 0.0607 -25004.1137 550.4920
chlorides * sulphates -3613.4768 4978.7999 -0.7258 0.4680 -13374.2092 6147.2555
chlorides * alcohol 2196.0436 843.4185 2.6037 0.0092 542.5563 3849.5310
free_sulfur_dioxide^2 -909.6884 1867.1743 -0.4872 0.6261 -4570.2067 2750.8299
free_sulfur_dioxide * total_sulfur_dioxide -24.9437 7.6926 -3.2426 0.0012 -40.0247 -9.8627
free_sulfur_dioxide * density 549.2329 298.2374 1.8416 0.0656 -35.4492 1133.9150
free_sulfur_dioxide * pH -13.8185 79.9507 -0.1728 0.8628 -170.5585 142.9216
free_sulfur_dioxide * sulphates 59.9612 69.6414 0.8610 0.3893 -76.5679 196.4903
free_sulfur_dioxide * alcohol -60.9958 9.6344 -6.3310 0.0000 -79.8838 -42.1079
total_sulfur_dioxide^2 -915.2771 1867.4927 -0.4901 0.6241 -4576.4197 2745.8655
total_sulfur_dioxide * density 856.3795 643.6237 1.3306 0.1834 -405.4183 2118.1773
total_sulfur_dioxide * pH 172.8750 176.1471 0.9814 0.3264 -172.4541 518.2041
total_sulfur_dioxide * sulphates 233.8040 154.5694 1.5126 0.1304 -69.2230 536.8310
total_sulfur_dioxide * alcohol -208.7613 22.8714 -9.1276 0.0000 -253.5998 -163.9229
density^2 2377.5655 18554.7387 0.1281 0.8980 -33998.2360 38753.3671
density * pH 11008.3563 10291.6307 1.0696 0.2848 -9167.9622 31184.6748
density * sulphates -6367.0944 6065.2219 -1.0498 0.2939 -18257.7123 5523.5234
density * alcohol -2160.1700 944.6336 -2.2868 0.0223 -4012.0853 -308.2546
pH^2 -4031.2392 1137.0986 -3.5452 0.0004 -6260.4741 -1802.0043
pH * sulphates 1604.8538 1639.6859 0.9788 0.3277 -1609.6830 4819.3905
pH * alcohol 766.9861 249.3365 3.0761 0.0021 278.1721 1255.8001
sulphates^2 -2563.1016 2045.6050 -1.2530 0.2103 -6573.4261 1447.2228
sulphates * alcohol 596.8511 210.7614 2.8319 0.0046 183.6620 1010.0402
alcohol^2 -1033.7653 1865.5885 -0.5541 0.5795 -4691.1748 2623.6442
Omnibus: 74.655 Durbin-Watson: 2.012
Prob(Omnibus): 0.000 Jarque-Bera (JB): 119.025
Skew: 0.141 Prob(JB): 0.000
Kurtosis: 3.712 Condition No.: 4390541685726112

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 7.09e-28. This might indicate that there are strong multicollinearity problems or that the design matrix is singular.

We keep only the variables whose p-value is at most 0.05.

[14]:
pval = results.pvalues.copy()
pval[pval <= 0.05]
[14]:
x2     7.630773e-06
x6     1.288486e-04
x8     2.462679e-02
x11    5.327338e-27
x22    8.970958e-05
x25    2.905133e-02
x27    7.128466e-03
x28    2.016020e-04
x31    2.289041e-02
x32    8.519337e-04
x39    1.351547e-02
x41    2.721775e-03
x43    9.278805e-03
x46    4.291916e-02
x56    9.249665e-03
x58    1.192719e-03
x62    2.657965e-10
x67    1.010018e-19
x71    2.225201e-02
x72    3.960632e-04
x74    2.109039e-03
x76    4.646809e-03
dtype: float64
[15]:
pval.index = pft.columns
pval[pval <= 0.05]
[15]:
volatile_acidity                              7.630773e-06
free_sulfur_dioxide                           1.288486e-04
density                                       2.462679e-02
alcohol                                       5.327338e-27
fixed_acidity * alcohol                       8.970958e-05
volatile_acidity * residual_sugar             2.905133e-02
volatile_acidity * free_sulfur_dioxide        7.128466e-03
volatile_acidity * total_sulfur_dioxide       2.016020e-04
volatile_acidity * sulphates                  2.289041e-02
volatile_acidity * alcohol                    8.519337e-04
citric_acid * pH                              1.351547e-02
citric_acid * alcohol                         2.721775e-03
residual_sugar * chlorides                    9.278805e-03
residual_sugar * density                      4.291916e-02
chlorides * alcohol                           9.249665e-03
free_sulfur_dioxide * total_sulfur_dioxide    1.192719e-03
free_sulfur_dioxide * alcohol                 2.657965e-10
total_sulfur_dioxide * alcohol                1.010018e-19
density * alcohol                             2.225201e-02
pH^2                                          3.960632e-04
pH * alcohol                                  2.109039e-03
sulphates * alcohol                           4.646809e-03
dtype: float64

The model works better, but it becomes harder to tell whether alcohol contributes positively to quality, because alcohol appears in more than one variable.
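
One way to reason about this: in a degree 2 model, the effect of a small change in alcohol is the partial derivative of the prediction with respect to alcohol, and that derivative depends on the other variables through the crossed terms. The sketch below uses made-up coefficients and inputs (not the fitted values above) and a hypothetical helper `marginal_effect` to show that the sign can change from one sample to another. It also treats the inputs as free to vary independently, which is a simplification since the notebook rescales each row.

```python
def marginal_effect(var, linear, crossed, x):
    """Partial derivative of a degree 2 polynomial with respect to `var`.

    linear:  {name: coefficient} for the degree 1 terms
    crossed: {(name1, name2): coefficient} for squares and products
    x:       {name: value} point where the derivative is evaluated
    """
    effect = linear.get(var, 0.0)
    for (u, v), coef in crossed.items():
        if u == var and v == var:
            effect += 2 * coef * x[var]  # derivative of coef * var^2
        elif u == var:
            effect += coef * x[v]  # derivative of coef * var * v
        elif v == var:
            effect += coef * x[u]  # derivative of coef * u * var
    return effect


# Made-up coefficients, for illustration only.
linear = {"alcohol": 2.0, "pH": -1.0}
crossed = {("alcohol", "alcohol"): -0.5, ("pH", "alcohol"): 1.0}

low = marginal_effect("alcohol", linear, crossed, {"alcohol": 1.0, "pH": 0.5})
high = marginal_effect("alcohol", linear, crossed, {"alcohol": 4.0, "pH": 0.5})
print(low, high)
```

With these toy numbers the effect of alcohol is positive at the first point and negative at the second, so a single coefficient sign no longer summarizes its contribution.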

