{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Polynomial regression and pipeline\n", "\n", "This notebook compares several polynomial regression models." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from teachpyx.datasets import load_wines_dataset\n", "\n", "data = load_wines_dataset()\n", "X = data.drop([\"quality\", \"color\"], axis=1)\n", "y = data[\"quality\"]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data are normalized first. This matters even more than usual here: without it, the polynomial features would take very large values, and numerical libraries cope badly with quantities whose orders of magnitude differ widely. Note that [Normalizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html) rescales each *row* (each wine) to unit $L_2$ norm, so every value lies between -1 and 1. It does not standardize each column the way `StandardScaler` does." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import Normalizer\n", "\n", "norm = Normalizer()\n", "X_train_norm = norm.fit_transform(X_train)\n", "X_test_norm = norm.transform(X_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [PolynomialFeatures](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html) transformation creates new features by multiplying the variables with one another. For degree two and three features $a, b, c$, the new features are: $1, a, b, c, a^2, ab, ac, b^2, bc, c^2$."
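, "\n", "\n", "A quick sketch to check this expansion. The values $a = 2$, $b = 3$, $c = 5$ are arbitrary and chosen only for illustration. The cell also counts the generated columns with $\\binom{n + d}{d}$, the number of monomials of degree at most $d$ in $n$ variables, for the $n = 11$ variables of the wine dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from math import comb\n", "from sklearn.preprocessing import PolynomialFeatures\n", "\n", "# one sample with three features a=2, b=3, c=5 (arbitrary values for the example)\n", "X_abc = np.array([[2.0, 3.0, 5.0]])\n", "poly_abc = PolynomialFeatures(degree=2)\n", "# columns 1, a, b, c, a^2, ab, ac, b^2, bc, c^2 hold 1, 2, 3, 5, 4, 6, 10, 9, 15, 25\n", "print(poly_abc.fit_transform(X_abc))\n", "# names are available once the transformer is fitted\n", "print(poly_abc.get_feature_names_out([\"a\", \"b\", \"c\"]))\n", "\n", "# number of monomials of degree <= d in n = 11 variables: C(11 + d, d)\n", "for d in range(1, 5):\n", "    print(d, comb(11 + d, d))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the 11 wine variables, this gives 12, 78, 364 and 1365 columns for degrees 1 to 4. The degree-2 count matches the 78 columns listed further down, and this fast growth is what drives up the fitting time in the comparison below."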
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 0.1909065078664849 0.16570749381482386 0.02195639999990817\n", "2 0.31686272332465504 0.2634484656108902 0.16658860000006825\n", "3 0.4117084105383497 -1.446755311176299 1.0382120000001578\n", "4 0.5940872457783092 -3926.677572477097 2.8583189999999377\n" ] } ], "source": [ "from time import perf_counter\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.preprocessing import PolynomialFeatures\n", "from sklearn.pipeline import make_pipeline\n", "from sklearn.metrics import r2_score\n", "\n", "r2ts = []\n", "r2es = []\n", "degs = []\n", "tts = []\n", "models = []\n", "\n", "# for each degree d: fit the pipeline, measure the fit time (duree, in seconds)\n", "# and compute the R^2 score on the training and test sets\n", "for d in range(1, 5):\n", " begin = perf_counter()\n", " pipe = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())\n", " pipe.fit(X_train_norm, y_train)\n", " duree = perf_counter() - begin\n", " r2t = r2_score(y_train, pipe.predict(X_train_norm))\n", " r2e = r2_score(y_test, pipe.predict(X_test_norm))\n", " degs.append(d)\n", " r2ts.append(r2t)\n", " r2es.append(r2e)\n", " tts.append(duree)\n", " models.append(pipe)\n", " print(d, r2t, r2e, duree)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " temps r2_train r2_test\n", "degré \n", "1 0.021956 0.190907 0.165707\n", "2 0.166589 0.316863 0.263448\n", "3 1.038212 0.411708 -1.446755\n", "4 2.858319 0.594087 -3926.677572" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas\n", "\n", "# column names are in French: temps = fit time in seconds, degré = polynomial degree\n", "df = pandas.DataFrame(dict(temps=tts, r2_train=r2ts, r2_test=r2es, degré=degs))\n", "df.set_index(\"degré\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The degree 2 polynomial looks like the best model, with the highest $R^2$ on the test set. The computation time grows quickly with the degree (from about 0.02 s to about 2.9 s in this run), as does the number of generated features: 12, 78, 364 and then 1365 columns for the 11 input variables. Adding crossed features does help on this dataset. But from degree 3 onwards, the regression gives very poor results on the test set (its $R^2$ becomes negative) while the $R^2$ keeps increasing on the training set. Since `train_test_split` is called without `random_state`, the exact figures change from one run to the next. Let's look at this in more detail." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "fig, ax = plt.subplots(1, 2, figsize=(12, 4))\n", "\n", "n = 15\n", "# true values of the first n samples, drawn as dots\n", "ax[0].plot(y_train.to_numpy()[:n], \"o\")\n", "ax[1].plot(y_test.to_numpy()[:n], \"o\")\n", "ax[0].set_title(\"Predictions on a few values\\ntraining set\")\n", "ax[1].set_title(\"Predictions on a few values\\ntest set\")\n", "for x in ax:\n", " x.set_ylim([3, 9])\n", " x.get_xaxis().set_visible(False)\n", "\n", "# one line of predictions per polynomial degree\n", "for model in models:\n", " d = model.get_params()[\"polynomialfeatures__degree\"]\n", " tr = model.predict(X_train_norm[:n])\n", " te = model.predict(X_test_norm[:n])\n", " ax[0].plot(tr, label=\"d=%d\" % d)\n", " ax[1].plot(te, label=\"d=%d\" % d)\n", "ax[0].legend()\n", "ax[1].legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The degree 4 model seems to perform well on the training set but goes completely astray on the test set, as if it were surprised by the values it meets there. The model is said to be [overfitting](https://en.wikipedia.org/wiki/Overfitting) (*[sur-apprentissage](https://fr.wikipedia.org/wiki/Surapprentissage)* in French). The degree 2 polynomial works better than plain linear regression. One may then wonder which crossed variables actually have an impact on the performance. To find out, the same degree-2 features are fitted with the `OLS` model from [statsmodels](http://www.statsmodels.org/stable/index.html), whose summary gives a standard error, a *t* statistic and a p-value for each coefficient." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "poly = PolynomialFeatures(degree=2)\n", "poly_feat_train = poly.fit_transform(X_train_norm)\n", "# the test set is only transformed, never fitted\n", "poly_feat_test = poly.transform(X_test_norm)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/latex": [ "\\begin{table}\n", "\\caption{Results: Ordinary least squares}\n", "\\label{}\n", "\\begin{center}\n", "\\begin{tabular}{llll}\n", "\\hline\n", "Model: & OLS & Adj. R-squared: & 0.306 \\\\\n", "Dependent Variable: & quality & AIC: & 10768.5223 \\\\\n", "Date: & 2024-01-23 00:08 & BIC: & 11268.3493 \\\\\n", "No. Observations: & 4872 & Log-Likelihood: & -5307.3 \\\\\n", "Df Model: & 76 & F-statistic: & 29.30 \\\\\n", "Df Residuals: & 4795 & Prob (F-statistic): & 0.00 \\\\\n", "R-squared: & 0.317 & Scale: & 0.52557 \\\\\n", "\\hline\n", "\\end{tabular}\n", "\\end{center}\n", "\n", "\\begin{center}\n", "\\begin{tabular}{lrrrrrr}\n", "\\hline\n", " & Coef. & Std.Err. & t & P$> |$t$|$ & [0.025 & 0.975] \\\\\n", "\\hline\n", "const & 874.2126 & 1866.6100 & 0.4683 & 0.6396 & -2785.1996 & 4533.6248 \\\\\n", "x1 & 17.2438 & 25.1175 & 0.6865 & 0.4924 & -31.9980 & 66.4856 \\\\\n", "x2 & -735.9147 & 164.2593 & -4.4802 & 0.0000 & -1057.9383 & -413.8911 \\\\\n", "x3 & -375.2205 & 200.9788 & -1.8670 & 0.0620 & -769.2311 & 18.7900 \\\\\n", "x4 & 2.1457 & 13.7859 & 0.1556 & 0.8763 & -24.8809 & 29.1723 \\\\\n", "x5 & -1219.9140 & 760.0849 & -1.6050 & 0.1086 & -2710.0291 & 270.2011 \\\\\n", "x6 & 33.0684 & 8.6300 & 3.8318 & 0.0001 & 16.1496 & 49.9873 \\\\\n", "x7 & 45.6122 & 23.6785 & 1.9263 & 0.0541 & -0.8085 & 92.0328 \\\\\n", "x8 & -1621.7821 & 721.4602 & -2.2479 & 0.0246 & -3036.1752 & -207.3890 \\\\\n", "x9 & -123.6719 & 196.5043 & -0.6294 & 0.5291 & -508.9104 & 261.5667 \\\\\n", "x10 & -213.6188 & 172.6441 & -1.2373 & 0.2160 & -552.0806 & 124.8429 \\\\\n", "x11 & 274.6811 & 25.3731 & 10.8257 & 0.0000 & 224.9381 & 324.4241 \\\\\n", "x12 & -888.0924 & 1860.0506 & -0.4775 & 0.6331 & -4534.6449 & 2758.4602 \\\\\n", "x13 & 213.0448 & 149.3410 & 1.4266 & 0.1538 & -79.7320 & 505.8216 \\\\\n", "x14 & -169.2454 & 
191.2389 & -0.8850 & 0.3762 & -544.1614 & 205.6706 \\\\\n", "x15 & -2.3959 & 21.0911 & -0.1136 & 0.9096 & -43.7441 & 38.9523 \\\\\n", "x16 & 151.4367 & 661.2643 & 0.2290 & 0.8189 & -1144.9447 & 1447.8180 \\\\\n", "x17 & -13.3943 & 9.8122 & -1.3651 & 0.1723 & -32.6306 & 5.8421 \\\\\n", "x18 & -12.3144 & 22.1599 & -0.5557 & 0.5784 & -55.7580 & 31.1291 \\\\\n", "x19 & -228.1023 & 1055.2972 & -0.2161 & 0.8289 & -2296.9691 & 1840.7644 \\\\\n", "x20 & 263.5729 & 260.8085 & 1.0106 & 0.3123 & -247.7314 & 774.8773 \\\\\n", "x21 & 210.1261 & 147.0152 & 1.4293 & 0.1530 & -78.0912 & 498.3434 \\\\\n", "x22 & -102.4573 & 26.1357 & -3.9202 & 0.0001 & -153.6952 & -51.2193 \\\\\n", "x23 & -1256.2263 & 1979.4050 & -0.6346 & 0.5257 & -5136.7684 & 2624.3158 \\\\\n", "x24 & 2503.6940 & 1629.0043 & 1.5369 & 0.1244 & -689.9019 & 5697.2899 \\\\\n", "x25 & -304.5840 & 139.4970 & -2.1834 & 0.0291 & -578.0621 & -31.1060 \\\\\n", "x26 & 6503.3193 & 4276.6479 & 1.5207 & 0.1284 & -1880.8729 & 14887.5116 \\\\\n", "x27 & 176.1465 & 65.4348 & 2.6919 & 0.0071 & 47.8643 & 304.4288 \\\\\n", "x28 & 541.9165 & 145.6814 & 3.7199 & 0.0002 & 256.3141 & 827.5190 \\\\\n", "x29 & -5408.0462 & 5170.0682 & -1.0460 & 0.2956 & -15543.7521 & 4727.6598 \\\\\n", "x30 & 1591.6613 & 1576.5785 & 1.0096 & 0.3128 & -1499.1559 & 4682.4786 \\\\\n", "x31 & -3066.2318 & 1347.2069 & -2.2760 & 0.0229 & -5707.3755 & -425.0882 \\\\\n", "x32 & 611.6934 & 183.2814 & 3.3375 & 0.0009 & 252.3778 & 971.0089 \\\\\n", "x33 & 861.9664 & 2070.4337 & 0.4163 & 0.6772 & -3197.0336 & 4920.9664 \\\\\n", "x34 & -307.3877 & 171.8498 & -1.7887 & 0.0737 & -644.2921 & 29.5168 \\\\\n", "x35 & -8483.4913 & 6547.3948 & -1.2957 & 0.1951 & -21319.3893 & 4352.4067 \\\\\n", "x36 & 150.5489 & 83.1030 & 1.8116 & 0.0701 & -12.3711 & 313.4689 \\\\\n", "x37 & 300.7497 & 178.4947 & 1.6849 & 0.0921 & -49.1819 & 650.6813 \\\\\n", "x38 & 14067.7740 & 7800.2113 & 1.8035 & 0.0714 & -1224.2191 & 29359.7672 \\\\\n", "x39 & -5133.5558 & 2077.6861 & -2.4708 & 0.0135 & 
-9206.7738 & -1060.3378 \\\\\n", "x40 & -2372.2746 & 1576.8448 & -1.5044 & 0.1325 & -5463.6139 & 719.0647 \\\\\n", "x41 & 708.8006 & 236.3385 & 2.9991 & 0.0027 & 245.4687 & 1172.1325 \\\\\n", "x42 & -910.1293 & 1867.0943 & -0.4875 & 0.6260 & -4570.4908 & 2750.2323 \\\\\n", "x43 & 1971.4865 & 757.4887 & 2.6027 & 0.0093 & 486.4611 & 3456.5118 \\\\\n", "x44 & -7.6328 & 5.0273 & -1.5183 & 0.1290 & -17.4886 & 2.2230 \\\\\n", "x45 & 2.8665 & 12.6000 & 0.2275 & 0.8200 & -21.8354 & 27.5684 \\\\\n", "x46 & 1429.2194 & 705.7754 & 2.0250 & 0.0429 & 45.5757 & 2812.8631 \\\\\n", "x47 & -287.7160 & 203.0709 & -1.4168 & 0.1566 & -685.8281 & 110.3961 \\\\\n", "x48 & -189.7045 & 168.1916 & -1.1279 & 0.2594 & -519.4371 & 140.0282 \\\\\n", "x49 & -18.0129 & 19.5540 & -0.9212 & 0.3570 & -56.3477 & 20.3219 \\\\\n", "x50 & 10142.2002 & 7074.9790 & 1.4335 & 0.1518 & -3728.0049 & 24012.4054 \\\\\n", "x51 & -201.9536 & 331.1123 & -0.6099 & 0.5419 & -851.0857 & 447.1785 \\\\\n", "x52 & 1197.2260 & 682.4747 & 1.7542 & 0.0795 & -140.7375 & 2535.1896 \\\\\n", "x53 & 20265.9335 & 23064.5613 & 0.8787 & 0.3796 & -24951.1898 & 65483.0568 \\\\\n", "x54 & -12226.8108 & 6517.5063 & -1.8760 & 0.0607 & -25004.1137 & 550.4920 \\\\\n", "x55 & -3613.4768 & 4978.7999 & -0.7258 & 0.4680 & -13374.2092 & 6147.2555 \\\\\n", "x56 & 2196.0436 & 843.4185 & 2.6037 & 0.0092 & 542.5563 & 3849.5310 \\\\\n", "x57 & -909.6884 & 1867.1743 & -0.4872 & 0.6261 & -4570.2067 & 2750.8299 \\\\\n", "x58 & -24.9437 & 7.6926 & -3.2426 & 0.0012 & -40.0247 & -9.8627 \\\\\n", "x59 & 549.2329 & 298.2374 & 1.8416 & 0.0656 & -35.4492 & 1133.9150 \\\\\n", "x60 & -13.8185 & 79.9507 & -0.1728 & 0.8628 & -170.5585 & 142.9216 \\\\\n", "x61 & 59.9612 & 69.6414 & 0.8610 & 0.3893 & -76.5679 & 196.4903 \\\\\n", "x62 & -60.9958 & 9.6344 & -6.3310 & 0.0000 & -79.8838 & -42.1079 \\\\\n", "x63 & -915.2771 & 1867.4927 & -0.4901 & 0.6241 & -4576.4197 & 2745.8655 \\\\\n", "x64 & 856.3795 & 643.6237 & 1.3306 & 0.1834 & -405.4183 & 2118.1773 \\\\\n", 
"x65 & 172.8750 & 176.1471 & 0.9814 & 0.3264 & -172.4541 & 518.2041 \\\\\n", "x66 & 233.8040 & 154.5694 & 1.5126 & 0.1304 & -69.2230 & 536.8310 \\\\\n", "x67 & -208.7613 & 22.8714 & -9.1276 & 0.0000 & -253.5998 & -163.9229 \\\\\n", "x68 & 2377.5655 & 18554.7387 & 0.1281 & 0.8980 & -33998.2360 & 38753.3671 \\\\\n", "x69 & 11008.3563 & 10291.6307 & 1.0696 & 0.2848 & -9167.9622 & 31184.6748 \\\\\n", "x70 & -6367.0944 & 6065.2219 & -1.0498 & 0.2939 & -18257.7123 & 5523.5234 \\\\\n", "x71 & -2160.1700 & 944.6336 & -2.2868 & 0.0223 & -4012.0853 & -308.2546 \\\\\n", "x72 & -4031.2392 & 1137.0986 & -3.5452 & 0.0004 & -6260.4741 & -1802.0043 \\\\\n", "x73 & 1604.8538 & 1639.6859 & 0.9788 & 0.3277 & -1609.6830 & 4819.3905 \\\\\n", "x74 & 766.9861 & 249.3365 & 3.0761 & 0.0021 & 278.1721 & 1255.8001 \\\\\n", "x75 & -2563.1016 & 2045.6050 & -1.2530 & 0.2103 & -6573.4261 & 1447.2228 \\\\\n", "x76 & 596.8511 & 210.7614 & 2.8319 & 0.0046 & 183.6620 & 1010.0402 \\\\\n", "x77 & -1033.7653 & 1865.5885 & -0.5541 & 0.5795 & -4691.1748 & 2623.6442 \\\\\n", "\\hline\n", "\\end{tabular}\n", "\\end{center}\n", "\n", "\\begin{center}\n", "\\begin{tabular}{llll}\n", "\\hline\n", "Omnibus: & 74.655 & Durbin-Watson: & 2.012 \\\\\n", "Prob(Omnibus): & 0.000 & Jarque-Bera (JB): & 119.025 \\\\\n", "Skew: & 0.141 & Prob(JB): & 0.000 \\\\\n", "Kurtosis: & 3.712 & Condition No.: & 4390541685726112 \\\\\n", "\\hline\n", "\\end{tabular}\n", "\\end{center}\n", "\\end{table}\n", "\\bigskip\n", "Notes: \\newline \n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified. \\newline \n", "[2] The smallest eigenvalue is 7.09e-28. This might indicate that there are strong multicollinearity problems or that the design matrix is singular." ], "text/plain": [ "\n", "\"\"\"\n", " Results: Ordinary least squares\n", "===================================================================\n", "Model: OLS Adj. 
R-squared: 0.306 \n", "Dependent Variable: quality AIC: 10768.5223\n", "Date: 2024-01-23 00:08 BIC: 11268.3493\n", "No. Observations: 4872 Log-Likelihood: -5307.3 \n", "Df Model: 76 F-statistic: 29.30 \n", "Df Residuals: 4795 Prob (F-statistic): 0.00 \n", "R-squared: 0.317 Scale: 0.52557 \n", "-------------------------------------------------------------------\n", " Coef. Std.Err. t P>|t| [0.025 0.975] \n", "-------------------------------------------------------------------\n", "const 874.2126 1866.6100 0.4683 0.6396 -2785.1996 4533.6248\n", "x1 17.2438 25.1175 0.6865 0.4924 -31.9980 66.4856\n", "x2 -735.9147 164.2593 -4.4802 0.0000 -1057.9383 -413.8911\n", "x3 -375.2205 200.9788 -1.8670 0.0620 -769.2311 18.7900\n", "x4 2.1457 13.7859 0.1556 0.8763 -24.8809 29.1723\n", "x5 -1219.9140 760.0849 -1.6050 0.1086 -2710.0291 270.2011\n", "x6 33.0684 8.6300 3.8318 0.0001 16.1496 49.9873\n", "x7 45.6122 23.6785 1.9263 0.0541 -0.8085 92.0328\n", "x8 -1621.7821 721.4602 -2.2479 0.0246 -3036.1752 -207.3890\n", "x9 -123.6719 196.5043 -0.6294 0.5291 -508.9104 261.5667\n", "x10 -213.6188 172.6441 -1.2373 0.2160 -552.0806 124.8429\n", "x11 274.6811 25.3731 10.8257 0.0000 224.9381 324.4241\n", "x12 -888.0924 1860.0506 -0.4775 0.6331 -4534.6449 2758.4602\n", "x13 213.0448 149.3410 1.4266 0.1538 -79.7320 505.8216\n", "x14 -169.2454 191.2389 -0.8850 0.3762 -544.1614 205.6706\n", "x15 -2.3959 21.0911 -0.1136 0.9096 -43.7441 38.9523\n", "x16 151.4367 661.2643 0.2290 0.8189 -1144.9447 1447.8180\n", "x17 -13.3943 9.8122 -1.3651 0.1723 -32.6306 5.8421\n", "x18 -12.3144 22.1599 -0.5557 0.5784 -55.7580 31.1291\n", "x19 -228.1023 1055.2972 -0.2161 0.8289 -2296.9691 1840.7644\n", "x20 263.5729 260.8085 1.0106 0.3123 -247.7314 774.8773\n", "x21 210.1261 147.0152 1.4293 0.1530 -78.0912 498.3434\n", "x22 -102.4573 26.1357 -3.9202 0.0001 -153.6952 -51.2193\n", "x23 -1256.2263 1979.4050 -0.6346 0.5257 -5136.7684 2624.3158\n", "x24 2503.6940 1629.0043 1.5369 0.1244 -689.9019 5697.2899\n", "x25 
-304.5840 139.4970 -2.1834 0.0291 -578.0621 -31.1060\n", "x26 6503.3193 4276.6479 1.5207 0.1284 -1880.8729 14887.5116\n", "x27 176.1465 65.4348 2.6919 0.0071 47.8643 304.4288\n", "x28 541.9165 145.6814 3.7199 0.0002 256.3141 827.5190\n", "x29 -5408.0462 5170.0682 -1.0460 0.2956 -15543.7521 4727.6598\n", "x30 1591.6613 1576.5785 1.0096 0.3128 -1499.1559 4682.4786\n", "x31 -3066.2318 1347.2069 -2.2760 0.0229 -5707.3755 -425.0882\n", "x32 611.6934 183.2814 3.3375 0.0009 252.3778 971.0089\n", "x33 861.9664 2070.4337 0.4163 0.6772 -3197.0336 4920.9664\n", "x34 -307.3877 171.8498 -1.7887 0.0737 -644.2921 29.5168\n", "x35 -8483.4913 6547.3948 -1.2957 0.1951 -21319.3893 4352.4067\n", "x36 150.5489 83.1030 1.8116 0.0701 -12.3711 313.4689\n", "x37 300.7497 178.4947 1.6849 0.0921 -49.1819 650.6813\n", "x38 14067.7740 7800.2113 1.8035 0.0714 -1224.2191 29359.7672\n", "x39 -5133.5558 2077.6861 -2.4708 0.0135 -9206.7738 -1060.3378\n", "x40 -2372.2746 1576.8448 -1.5044 0.1325 -5463.6139 719.0647\n", "x41 708.8006 236.3385 2.9991 0.0027 245.4687 1172.1325\n", "x42 -910.1293 1867.0943 -0.4875 0.6260 -4570.4908 2750.2323\n", "x43 1971.4865 757.4887 2.6027 0.0093 486.4611 3456.5118\n", "x44 -7.6328 5.0273 -1.5183 0.1290 -17.4886 2.2230\n", "x45 2.8665 12.6000 0.2275 0.8200 -21.8354 27.5684\n", "x46 1429.2194 705.7754 2.0250 0.0429 45.5757 2812.8631\n", "x47 -287.7160 203.0709 -1.4168 0.1566 -685.8281 110.3961\n", "x48 -189.7045 168.1916 -1.1279 0.2594 -519.4371 140.0282\n", "x49 -18.0129 19.5540 -0.9212 0.3570 -56.3477 20.3219\n", "x50 10142.2002 7074.9790 1.4335 0.1518 -3728.0049 24012.4054\n", "x51 -201.9536 331.1123 -0.6099 0.5419 -851.0857 447.1785\n", "x52 1197.2260 682.4747 1.7542 0.0795 -140.7375 2535.1896\n", "x53 20265.9335 23064.5613 0.8787 0.3796 -24951.1898 65483.0568\n", "x54 -12226.8108 6517.5063 -1.8760 0.0607 -25004.1137 550.4920\n", "x55 -3613.4768 4978.7999 -0.7258 0.4680 -13374.2092 6147.2555\n", "x56 2196.0436 843.4185 2.6037 0.0092 542.5563 3849.5310\n", "x57 
-909.6884 1867.1743 -0.4872 0.6261 -4570.2067 2750.8299\n", "x58 -24.9437 7.6926 -3.2426 0.0012 -40.0247 -9.8627\n", "x59 549.2329 298.2374 1.8416 0.0656 -35.4492 1133.9150\n", "x60 -13.8185 79.9507 -0.1728 0.8628 -170.5585 142.9216\n", "x61 59.9612 69.6414 0.8610 0.3893 -76.5679 196.4903\n", "x62 -60.9958 9.6344 -6.3310 0.0000 -79.8838 -42.1079\n", "x63 -915.2771 1867.4927 -0.4901 0.6241 -4576.4197 2745.8655\n", "x64 856.3795 643.6237 1.3306 0.1834 -405.4183 2118.1773\n", "x65 172.8750 176.1471 0.9814 0.3264 -172.4541 518.2041\n", "x66 233.8040 154.5694 1.5126 0.1304 -69.2230 536.8310\n", "x67 -208.7613 22.8714 -9.1276 0.0000 -253.5998 -163.9229\n", "x68 2377.5655 18554.7387 0.1281 0.8980 -33998.2360 38753.3671\n", "x69 11008.3563 10291.6307 1.0696 0.2848 -9167.9622 31184.6748\n", "x70 -6367.0944 6065.2219 -1.0498 0.2939 -18257.7123 5523.5234\n", "x71 -2160.1700 944.6336 -2.2868 0.0223 -4012.0853 -308.2546\n", "x72 -4031.2392 1137.0986 -3.5452 0.0004 -6260.4741 -1802.0043\n", "x73 1604.8538 1639.6859 0.9788 0.3277 -1609.6830 4819.3905\n", "x74 766.9861 249.3365 3.0761 0.0021 278.1721 1255.8001\n", "x75 -2563.1016 2045.6050 -1.2530 0.2103 -6573.4261 1447.2228\n", "x76 596.8511 210.7614 2.8319 0.0046 183.6620 1010.0402\n", "x77 -1033.7653 1865.5885 -0.5541 0.5795 -4691.1748 2623.6442\n", "-------------------------------------------------------------------\n", "Omnibus: 74.655 Durbin-Watson: 2.012 \n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 119.025 \n", "Skew: 0.141 Prob(JB): 0.000 \n", "Kurtosis: 3.712 Condition No.: 4390541685726112\n", "===================================================================\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors\n", "is correctly specified.\n", "[2] The smallest eigenvalue is 7.09e-28. 
This might indicate that\n", "there are strong multicollinearity problems or that the design\n", "matrix is singular.\n", "\"\"\"" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from statsmodels.regression.linear_model import OLS\n", "\n", "model = OLS(y_train, poly_feat_train)\n", "results = model.fit()\n", "results.summary2()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is hard to read: the coefficients are only labeled `const`, `x1`, ..., `x77`. We need to attach the name of each feature and fit again.\n", "\n", "Note [2] also deserves a comment: the smallest eigenvalue is practically zero, so the design matrix is singular. This is expected here. `Normalizer` rescales every row to unit $L_2$ norm, so $\\sum_j x_j^2 = 1$ for every wine, and the squared features generated by `PolynomialFeatures` therefore add up (up to rounding) to the constant column." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " 1 fixed_acidity volatile_acidity citric_acid residual_sugar \\\n", "0 1.0 0.061316 0.003511 0.003461 0.019779 \n", "1 1.0 0.035236 0.001373 0.001922 0.065438 \n", "2 1.0 0.042579 0.001319 0.001919 0.101349 \n", "3 1.0 0.053638 0.000920 0.001456 0.037547 \n", "4 1.0 0.071498 0.002413 0.002949 0.010725 \n", "\n", " chlorides free_sulfur_dioxide total_sulfur_dioxide density pH \\\n", "0 0.000455 0.306582 0.939526 0.009773 0.030263 \n", "1 0.000206 0.205924 0.974706 0.004572 0.014552 \n", "2 0.000336 0.293852 0.947522 0.005996 0.020210 \n", "3 0.000421 0.206890 0.973147 0.007627 0.025210 \n", "4 0.000447 0.366428 0.920540 0.008848 0.026812 \n", "\n", " ... density^2 density * pH density * sulphates density * alcohol \\\n", "0 ... 0.000096 0.000296 0.000044 0.001315 \n", "1 ... 0.000021 0.000067 0.000013 0.000192 \n", "2 ... 0.000036 0.000121 0.000014 0.000345 \n", "3 ... 0.000058 0.000192 0.000024 0.000549 \n", "4 ... 0.000078 0.000237 0.000036 0.000981 \n", "\n", " pH^2 pH * sulphates pH * alcohol sulphates^2 sulphates * alcohol \\\n", "0 0.000916 0.000138 0.004070 0.000021 0.000612 \n", "1 0.000212 0.000042 0.000613 0.000008 0.000121 \n", "2 0.000408 0.000046 0.001163 0.000005 0.000131 \n", "3 0.000636 0.000079 0.001816 0.000010 0.000226 \n", "4 0.000719 0.000108 0.002971 0.000016 0.000446 \n", "\n", " alcohol^2 \n", "0 0.018090 \n", "1 0.001772 \n", "2 0.003314 \n", "3 0.005188 \n", "4 0.012282 \n", "\n", "[5 rows x 78 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# names of the 11 input variables, in the same order as the columns of X\n", "names = poly.get_feature_names_out(input_features=X.columns)\n", "# PolynomialFeatures joins factors with a space: make the products explicit\n", "names = [n.replace(\" \", \" * \") for n in names]\n", "pft = pandas.DataFrame(poly_feat_train, columns=names)\n", "pft.head()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", 
"\n", " \n", "\n", "\n", " \n", "\n", "
Model: OLS Adj. R-squared: 0.306
Dependent Variable: quality AIC: 10768.5223
Date: 2024-01-23 00:09 BIC: 11268.3493
No. Observations: 4872 Log-Likelihood: -5307.3
Df Model: 76 F-statistic: 29.30
Df Residuals: 4795 Prob (F-statistic): 0.00
R-squared: 0.317 Scale: 0.52557
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Coef. Std.Err. t P>|t| [0.025 0.975]
1 874.2126 1866.6100 0.4683 0.6396 -2785.1996 4533.6248
fixed_acidity 17.2438 25.1175 0.6865 0.4924 -31.9980 66.4856
volatile_acidity -735.9147 164.2593 -4.4802 0.0000 -1057.9383 -413.8911
citric_acid -375.2205 200.9788 -1.8670 0.0620 -769.2311 18.7900
residual_sugar 2.1457 13.7859 0.1556 0.8763 -24.8809 29.1723
chlorides -1219.9140 760.0849 -1.6050 0.1086 -2710.0291 270.2011
free_sulfur_dioxide 33.0684 8.6300 3.8318 0.0001 16.1496 49.9873
total_sulfur_dioxide 45.6122 23.6785 1.9263 0.0541 -0.8085 92.0328
density -1621.7821 721.4602 -2.2479 0.0246 -3036.1752 -207.3890
pH -123.6719 196.5043 -0.6294 0.5291 -508.9104 261.5667
sulphates -213.6188 172.6441 -1.2373 0.2160 -552.0806 124.8429
alcohol 274.6811 25.3731 10.8257 0.0000 224.9381 324.4241
fixed_acidity^2 -888.0924 1860.0506 -0.4775 0.6331 -4534.6449 2758.4602
fixed_acidity * volatile_acidity 213.0448 149.3410 1.4266 0.1538 -79.7320 505.8216
fixed_acidity * citric_acid -169.2454 191.2389 -0.8850 0.3762 -544.1614 205.6706
fixed_acidity * residual_sugar -2.3959 21.0911 -0.1136 0.9096 -43.7441 38.9523
fixed_acidity * chlorides 151.4367 661.2643 0.2290 0.8189 -1144.9447 1447.8180
fixed_acidity * free_sulfur_dioxide -13.3943 9.8122 -1.3651 0.1723 -32.6306 5.8421
fixed_acidity * total_sulfur_dioxide -12.3144 22.1599 -0.5557 0.5784 -55.7580 31.1291
fixed_acidity * density -228.1023 1055.2972 -0.2161 0.8289 -2296.9691 1840.7644
fixed_acidity * pH 263.5729 260.8085 1.0106 0.3123 -247.7314 774.8773
fixed_acidity * sulphates 210.1261 147.0152 1.4293 0.1530 -78.0912 498.3434
fixed_acidity * alcohol -102.4573 26.1357 -3.9202 0.0001 -153.6952 -51.2193
volatile_acidity^2 -1256.2263 1979.4050 -0.6346 0.5257 -5136.7684 2624.3158
volatile_acidity * citric_acid 2503.6940 1629.0043 1.5369 0.1244 -689.9019 5697.2899
volatile_acidity * residual_sugar -304.5840 139.4970 -2.1834 0.0291 -578.0621 -31.1060
volatile_acidity * chlorides 6503.3193 4276.6479 1.5207 0.1284 -1880.8729 14887.5116
volatile_acidity * free_sulfur_dioxide 176.1465 65.4348 2.6919 0.0071 47.8643 304.4288
volatile_acidity * total_sulfur_dioxide 541.9165 145.6814 3.7199 0.0002 256.3141 827.5190
volatile_acidity * density -5408.0462 5170.0682 -1.0460 0.2956 -15543.7521 4727.6598
volatile_acidity * pH 1591.6613 1576.5785 1.0096 0.3128 -1499.1559 4682.4786
volatile_acidity * sulphates -3066.2318 1347.2069 -2.2760 0.0229 -5707.3755 -425.0882
volatile_acidity * alcohol 611.6934 183.2814 3.3375 0.0009 252.3778 971.0089
citric_acid^2 861.9664 2070.4337 0.4163 0.6772 -3197.0336 4920.9664
citric_acid * residual_sugar -307.3877 171.8498 -1.7887 0.0737 -644.2921 29.5168
citric_acid * chlorides -8483.4913 6547.3948 -1.2957 0.1951 -21319.3893 4352.4067
citric_acid * free_sulfur_dioxide 150.5489 83.1030 1.8116 0.0701 -12.3711 313.4689
citric_acid * total_sulfur_dioxide 300.7497 178.4947 1.6849 0.0921 -49.1819 650.6813
citric_acid * density 14067.7740 7800.2113 1.8035 0.0714 -1224.2191 29359.7672
citric_acid * pH -5133.5558 2077.6861 -2.4708 0.0135 -9206.7738 -1060.3378
citric_acid * sulphates -2372.2746 1576.8448 -1.5044 0.1325 -5463.6139 719.0647
citric_acid * alcohol 708.8006 236.3385 2.9991 0.0027 245.4687 1172.1325
residual_sugar^2 -910.1293 1867.0943 -0.4875 0.6260 -4570.4908 2750.2323
residual_sugar * chlorides 1971.4865 757.4887 2.6027 0.0093 486.4611 3456.5118
residual_sugar * free_sulfur_dioxide -7.6328 5.0273 -1.5183 0.1290 -17.4886 2.2230
residual_sugar * total_sulfur_dioxide 2.8665 12.6000 0.2275 0.8200 -21.8354 27.5684
residual_sugar * density 1429.2194 705.7754 2.0250 0.0429 45.5757 2812.8631
residual_sugar * pH -287.7160 203.0709 -1.4168 0.1566 -685.8281 110.3961
residual_sugar * sulphates -189.7045 168.1916 -1.1279 0.2594 -519.4371 140.0282
residual_sugar * alcohol -18.0129 19.5540 -0.9212 0.3570 -56.3477 20.3219
chlorides^2 10142.2002 7074.9790 1.4335 0.1518 -3728.0049 24012.4054
chlorides * free_sulfur_dioxide -201.9536 331.1123 -0.6099 0.5419 -851.0857 447.1785
chlorides * total_sulfur_dioxide 1197.2260 682.4747 1.7542 0.0795 -140.7375 2535.1896
chlorides * density 20265.9335 23064.5613 0.8787 0.3796 -24951.1898 65483.0568
chlorides * pH -12226.8108 6517.5063 -1.8760 0.0607 -25004.1137 550.4920
chlorides * sulphates -3613.4768 4978.7999 -0.7258 0.4680 -13374.2092 6147.2555
chlorides * alcohol 2196.0436 843.4185 2.6037 0.0092 542.5563 3849.5310
free_sulfur_dioxide^2 -909.6884 1867.1743 -0.4872 0.6261 -4570.2067 2750.8299
free_sulfur_dioxide * total_sulfur_dioxide -24.9437 7.6926 -3.2426 0.0012 -40.0247 -9.8627
free_sulfur_dioxide * density 549.2329 298.2374 1.8416 0.0656 -35.4492 1133.9150
free_sulfur_dioxide * pH -13.8185 79.9507 -0.1728 0.8628 -170.5585 142.9216
free_sulfur_dioxide * sulphates 59.9612 69.6414 0.8610 0.3893 -76.5679 196.4903
free_sulfur_dioxide * alcohol -60.9958 9.6344 -6.3310 0.0000 -79.8838 -42.1079
total_sulfur_dioxide^2 -915.2771 1867.4927 -0.4901 0.6241 -4576.4197 2745.8655
total_sulfur_dioxide * density 856.3795 643.6237 1.3306 0.1834 -405.4183 2118.1773
total_sulfur_dioxide * pH 172.8750 176.1471 0.9814 0.3264 -172.4541 518.2041
total_sulfur_dioxide * sulphates 233.8040 154.5694 1.5126 0.1304 -69.2230 536.8310
total_sulfur_dioxide * alcohol -208.7613 22.8714 -9.1276 0.0000 -253.5998 -163.9229
density^2 2377.5655 18554.7387 0.1281 0.8980 -33998.2360 38753.3671
density * pH 11008.3563 10291.6307 1.0696 0.2848 -9167.9622 31184.6748
density * sulphates -6367.0944 6065.2219 -1.0498 0.2939 -18257.7123 5523.5234
density * alcohol -2160.1700 944.6336 -2.2868 0.0223 -4012.0853 -308.2546
pH^2 -4031.2392 1137.0986 -3.5452 0.0004 -6260.4741 -1802.0043
pH * sulphates 1604.8538 1639.6859 0.9788 0.3277 -1609.6830 4819.3905
pH * alcohol 766.9861 249.3365 3.0761 0.0021 278.1721 1255.8001
sulphates^2 -2563.1016 2045.6050 -1.2530 0.2103 -6573.4261 1447.2228
sulphates * alcohol 596.8511 210.7614 2.8319 0.0046 183.6620 1010.0402
alcohol^2 -1033.7653 1865.5885 -0.5541 0.5795 -4691.1748 2623.6442
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Omnibus: 74.655 Durbin-Watson: 2.012
Prob(Omnibus): 0.000 Jarque-Bera (JB): 119.025
Skew: 0.141 Prob(JB): 0.000
Kurtosis: 3.712 Condition No.: 4390541685726112

\n", "Notes:
\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
\n", "[2] The smallest eigenvalue is 7.09e-28. This might indicate that there are strong multicollinearity problems or that the design matrix is singular." ], "text/latex": [ "\\begin{table}\n", "\\caption{Results: Ordinary least squares}\n", "\\label{}\n", "\\begin{center}\n", "\\begin{tabular}{llll}\n", "\\hline\n", "Model: & OLS & Adj. R-squared: & 0.306 \\\\\n", "Dependent Variable: & quality & AIC: & 10768.5223 \\\\\n", "Date: & 2024-01-23 00:09 & BIC: & 11268.3493 \\\\\n", "No. Observations: & 4872 & Log-Likelihood: & -5307.3 \\\\\n", "Df Model: & 76 & F-statistic: & 29.30 \\\\\n", "Df Residuals: & 4795 & Prob (F-statistic): & 0.00 \\\\\n", "R-squared: & 0.317 & Scale: & 0.52557 \\\\\n", "\\hline\n", "\\end{tabular}\n", "\\end{center}\n", "\n", "\\begin{center}\n", "\\begin{tabular}{lrrrrrr}\n", "\\hline\n", " & Coef. & Std.Err. & t & P$> |$t$|$ & [0.025 & 0.975] \\\\\n", "\\hline\n", "1 & 874.2126 & 1866.6100 & 0.4683 & 0.6396 & -2785.1996 & 4533.6248 \\\\\n", "fixed\\_acidity & 17.2438 & 25.1175 & 0.6865 & 0.4924 & -31.9980 & 66.4856 \\\\\n", "volatile\\_acidity & -735.9147 & 164.2593 & -4.4802 & 0.0000 & -1057.9383 & -413.8911 \\\\\n", "citric\\_acid & -375.2205 & 200.9788 & -1.8670 & 0.0620 & -769.2311 & 18.7900 \\\\\n", "residual\\_sugar & 2.1457 & 13.7859 & 0.1556 & 0.8763 & -24.8809 & 29.1723 \\\\\n", "chlorides & -1219.9140 & 760.0849 & -1.6050 & 0.1086 & -2710.0291 & 270.2011 \\\\\n", "free\\_sulfur\\_dioxide & 33.0684 & 8.6300 & 3.8318 & 0.0001 & 16.1496 & 49.9873 \\\\\n", "total\\_sulfur\\_dioxide & 45.6122 & 23.6785 & 1.9263 & 0.0541 & -0.8085 & 92.0328 \\\\\n", "density & -1621.7821 & 721.4602 & -2.2479 & 0.0246 & -3036.1752 & -207.3890 \\\\\n", "pH & -123.6719 & 196.5043 & -0.6294 & 0.5291 & -508.9104 & 261.5667 \\\\\n", "sulphates & -213.6188 & 172.6441 & -1.2373 & 0.2160 & -552.0806 & 124.8429 \\\\\n", "alcohol & 274.6811 & 25.3731 & 10.8257 & 0.0000 & 224.9381 & 324.4241 \\\\\n", "fixed\\_acidity^2 & -888.0924 & 1860.0506 & -0.4775 & 0.6331 & 
-4534.6449 & 2758.4602 \\\\\n", "fixed\\_acidity * volatile\\_acidity & 213.0448 & 149.3410 & 1.4266 & 0.1538 & -79.7320 & 505.8216 \\\\\n", "fixed\\_acidity * citric\\_acid & -169.2454 & 191.2389 & -0.8850 & 0.3762 & -544.1614 & 205.6706 \\\\\n", "fixed\\_acidity * residual\\_sugar & -2.3959 & 21.0911 & -0.1136 & 0.9096 & -43.7441 & 38.9523 \\\\\n", "fixed\\_acidity * chlorides & 151.4367 & 661.2643 & 0.2290 & 0.8189 & -1144.9447 & 1447.8180 \\\\\n", "fixed\\_acidity * free\\_sulfur\\_dioxide & -13.3943 & 9.8122 & -1.3651 & 0.1723 & -32.6306 & 5.8421 \\\\\n", "fixed\\_acidity * total\\_sulfur\\_dioxide & -12.3144 & 22.1599 & -0.5557 & 0.5784 & -55.7580 & 31.1291 \\\\\n", "fixed\\_acidity * density & -228.1023 & 1055.2972 & -0.2161 & 0.8289 & -2296.9691 & 1840.7644 \\\\\n", "fixed\\_acidity * pH & 263.5729 & 260.8085 & 1.0106 & 0.3123 & -247.7314 & 774.8773 \\\\\n", "fixed\\_acidity * sulphates & 210.1261 & 147.0152 & 1.4293 & 0.1530 & -78.0912 & 498.3434 \\\\\n", "fixed\\_acidity * alcohol & -102.4573 & 26.1357 & -3.9202 & 0.0001 & -153.6952 & -51.2193 \\\\\n", "volatile\\_acidity^2 & -1256.2263 & 1979.4050 & -0.6346 & 0.5257 & -5136.7684 & 2624.3158 \\\\\n", "volatile\\_acidity * citric\\_acid & 2503.6940 & 1629.0043 & 1.5369 & 0.1244 & -689.9019 & 5697.2899 \\\\\n", "volatile\\_acidity * residual\\_sugar & -304.5840 & 139.4970 & -2.1834 & 0.0291 & -578.0621 & -31.1060 \\\\\n", "volatile\\_acidity * chlorides & 6503.3193 & 4276.6479 & 1.5207 & 0.1284 & -1880.8729 & 14887.5116 \\\\\n", "volatile\\_acidity * free\\_sulfur\\_dioxide & 176.1465 & 65.4348 & 2.6919 & 0.0071 & 47.8643 & 304.4288 \\\\\n", "volatile\\_acidity * total\\_sulfur\\_dioxide & 541.9165 & 145.6814 & 3.7199 & 0.0002 & 256.3141 & 827.5190 \\\\\n", "volatile\\_acidity * density & -5408.0462 & 5170.0682 & -1.0460 & 0.2956 & -15543.7521 & 4727.6598 \\\\\n", "volatile\\_acidity * pH & 1591.6613 & 1576.5785 & 1.0096 & 0.3128 & -1499.1559 & 4682.4786 \\\\\n", "volatile\\_acidity * sulphates & -3066.2318 
& 1347.2069 & -2.2760 & 0.0229 & -5707.3755 & -425.0882 \\\\\n", "volatile\\_acidity * alcohol & 611.6934 & 183.2814 & 3.3375 & 0.0009 & 252.3778 & 971.0089 \\\\\n", "citric\\_acid^2 & 861.9664 & 2070.4337 & 0.4163 & 0.6772 & -3197.0336 & 4920.9664 \\\\\n", "citric\\_acid * residual\\_sugar & -307.3877 & 171.8498 & -1.7887 & 0.0737 & -644.2921 & 29.5168 \\\\\n", "citric\\_acid * chlorides & -8483.4913 & 6547.3948 & -1.2957 & 0.1951 & -21319.3893 & 4352.4067 \\\\\n", "citric\\_acid * free\\_sulfur\\_dioxide & 150.5489 & 83.1030 & 1.8116 & 0.0701 & -12.3711 & 313.4689 \\\\\n", "citric\\_acid * total\\_sulfur\\_dioxide & 300.7497 & 178.4947 & 1.6849 & 0.0921 & -49.1819 & 650.6813 \\\\\n", "citric\\_acid * density & 14067.7740 & 7800.2113 & 1.8035 & 0.0714 & -1224.2191 & 29359.7672 \\\\\n", "citric\\_acid * pH & -5133.5558 & 2077.6861 & -2.4708 & 0.0135 & -9206.7738 & -1060.3378 \\\\\n", "citric\\_acid * sulphates & -2372.2746 & 1576.8448 & -1.5044 & 0.1325 & -5463.6139 & 719.0647 \\\\\n", "citric\\_acid * alcohol & 708.8006 & 236.3385 & 2.9991 & 0.0027 & 245.4687 & 1172.1325 \\\\\n", "residual\\_sugar^2 & -910.1293 & 1867.0943 & -0.4875 & 0.6260 & -4570.4908 & 2750.2323 \\\\\n", "residual\\_sugar * chlorides & 1971.4865 & 757.4887 & 2.6027 & 0.0093 & 486.4611 & 3456.5118 \\\\\n", "residual\\_sugar * free\\_sulfur\\_dioxide & -7.6328 & 5.0273 & -1.5183 & 0.1290 & -17.4886 & 2.2230 \\\\\n", "residual\\_sugar * total\\_sulfur\\_dioxide & 2.8665 & 12.6000 & 0.2275 & 0.8200 & -21.8354 & 27.5684 \\\\\n", "residual\\_sugar * density & 1429.2194 & 705.7754 & 2.0250 & 0.0429 & 45.5757 & 2812.8631 \\\\\n", "residual\\_sugar * pH & -287.7160 & 203.0709 & -1.4168 & 0.1566 & -685.8281 & 110.3961 \\\\\n", "residual\\_sugar * sulphates & -189.7045 & 168.1916 & -1.1279 & 0.2594 & -519.4371 & 140.0282 \\\\\n", "residual\\_sugar * alcohol & -18.0129 & 19.5540 & -0.9212 & 0.3570 & -56.3477 & 20.3219 \\\\\n", "chlorides^2 & 10142.2002 & 7074.9790 & 1.4335 & 0.1518 & -3728.0049 & 
24012.4054 \\\\\n", "chlorides * free\\_sulfur\\_dioxide & -201.9536 & 331.1123 & -0.6099 & 0.5419 & -851.0857 & 447.1785 \\\\\n", "chlorides * total\\_sulfur\\_dioxide & 1197.2260 & 682.4747 & 1.7542 & 0.0795 & -140.7375 & 2535.1896 \\\\\n", "chlorides * density & 20265.9335 & 23064.5613 & 0.8787 & 0.3796 & -24951.1898 & 65483.0568 \\\\\n", "chlorides * pH & -12226.8108 & 6517.5063 & -1.8760 & 0.0607 & -25004.1137 & 550.4920 \\\\\n", "chlorides * sulphates & -3613.4768 & 4978.7999 & -0.7258 & 0.4680 & -13374.2092 & 6147.2555 \\\\\n", "chlorides * alcohol & 2196.0436 & 843.4185 & 2.6037 & 0.0092 & 542.5563 & 3849.5310 \\\\\n", "free\\_sulfur\\_dioxide^2 & -909.6884 & 1867.1743 & -0.4872 & 0.6261 & -4570.2067 & 2750.8299 \\\\\n", "free\\_sulfur\\_dioxide * total\\_sulfur\\_dioxide & -24.9437 & 7.6926 & -3.2426 & 0.0012 & -40.0247 & -9.8627 \\\\\n", "free\\_sulfur\\_dioxide * density & 549.2329 & 298.2374 & 1.8416 & 0.0656 & -35.4492 & 1133.9150 \\\\\n", "free\\_sulfur\\_dioxide * pH & -13.8185 & 79.9507 & -0.1728 & 0.8628 & -170.5585 & 142.9216 \\\\\n", "free\\_sulfur\\_dioxide * sulphates & 59.9612 & 69.6414 & 0.8610 & 0.3893 & -76.5679 & 196.4903 \\\\\n", "free\\_sulfur\\_dioxide * alcohol & -60.9958 & 9.6344 & -6.3310 & 0.0000 & -79.8838 & -42.1079 \\\\\n", "total\\_sulfur\\_dioxide^2 & -915.2771 & 1867.4927 & -0.4901 & 0.6241 & -4576.4197 & 2745.8655 \\\\\n", "total\\_sulfur\\_dioxide * density & 856.3795 & 643.6237 & 1.3306 & 0.1834 & -405.4183 & 2118.1773 \\\\\n", "total\\_sulfur\\_dioxide * pH & 172.8750 & 176.1471 & 0.9814 & 0.3264 & -172.4541 & 518.2041 \\\\\n", "total\\_sulfur\\_dioxide * sulphates & 233.8040 & 154.5694 & 1.5126 & 0.1304 & -69.2230 & 536.8310 \\\\\n", "total\\_sulfur\\_dioxide * alcohol & -208.7613 & 22.8714 & -9.1276 & 0.0000 & -253.5998 & -163.9229 \\\\\n", "density^2 & 2377.5655 & 18554.7387 & 0.1281 & 0.8980 & -33998.2360 & 38753.3671 \\\\\n", "density * pH & 11008.3563 & 10291.6307 & 1.0696 & 0.2848 & -9167.9622 & 31184.6748 \\\\\n", 
"density * sulphates & -6367.0944 & 6065.2219 & -1.0498 & 0.2939 & -18257.7123 & 5523.5234 \\\\\n", "density * alcohol & -2160.1700 & 944.6336 & -2.2868 & 0.0223 & -4012.0853 & -308.2546 \\\\\n", "pH^2 & -4031.2392 & 1137.0986 & -3.5452 & 0.0004 & -6260.4741 & -1802.0043 \\\\\n", "pH * sulphates & 1604.8538 & 1639.6859 & 0.9788 & 0.3277 & -1609.6830 & 4819.3905 \\\\\n", "pH * alcohol & 766.9861 & 249.3365 & 3.0761 & 0.0021 & 278.1721 & 1255.8001 \\\\\n", "sulphates^2 & -2563.1016 & 2045.6050 & -1.2530 & 0.2103 & -6573.4261 & 1447.2228 \\\\\n", "sulphates * alcohol & 596.8511 & 210.7614 & 2.8319 & 0.0046 & 183.6620 & 1010.0402 \\\\\n", "alcohol^2 & -1033.7653 & 1865.5885 & -0.5541 & 0.5795 & -4691.1748 & 2623.6442 \\\\\n", "\\hline\n", "\\end{tabular}\n", "\\end{center}\n", "\n", "\\begin{center}\n", "\\begin{tabular}{llll}\n", "\\hline\n", "Omnibus: & 74.655 & Durbin-Watson: & 2.012 \\\\\n", "Prob(Omnibus): & 0.000 & Jarque-Bera (JB): & 119.025 \\\\\n", "Skew: & 0.141 & Prob(JB): & 0.000 \\\\\n", "Kurtosis: & 3.712 & Condition No.: & 4390541685726112 \\\\\n", "\\hline\n", "\\end{tabular}\n", "\\end{center}\n", "\\end{table}\n", "\\bigskip\n", "Notes: \\newline \n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified. \\newline \n", "[2] The smallest eigenvalue is 7.09e-28. This might indicate that there are strong multicollinearity problems or that the design matrix is singular." ], "text/plain": [ "\n", "\"\"\"\n", " Results: Ordinary least squares\n", "=======================================================================================================\n", "Model: OLS Adj. R-squared: 0.306 \n", "Dependent Variable: quality AIC: 10768.5223\n", "Date: 2024-01-23 00:09 BIC: 11268.3493\n", "No. 
Observations: 4872 Log-Likelihood: -5307.3 \n", "Df Model: 76 F-statistic: 29.30 \n", "Df Residuals: 4795 Prob (F-statistic): 0.00 \n", "R-squared: 0.317 Scale: 0.52557 \n", "-------------------------------------------------------------------------------------------------------\n", " Coef. Std.Err. t P>|t| [0.025 0.975] \n", "-------------------------------------------------------------------------------------------------------\n", "1 874.2126 1866.6100 0.4683 0.6396 -2785.1996 4533.6248\n", "fixed_acidity 17.2438 25.1175 0.6865 0.4924 -31.9980 66.4856\n", "volatile_acidity -735.9147 164.2593 -4.4802 0.0000 -1057.9383 -413.8911\n", "citric_acid -375.2205 200.9788 -1.8670 0.0620 -769.2311 18.7900\n", "residual_sugar 2.1457 13.7859 0.1556 0.8763 -24.8809 29.1723\n", "chlorides -1219.9140 760.0849 -1.6050 0.1086 -2710.0291 270.2011\n", "free_sulfur_dioxide 33.0684 8.6300 3.8318 0.0001 16.1496 49.9873\n", "total_sulfur_dioxide 45.6122 23.6785 1.9263 0.0541 -0.8085 92.0328\n", "density -1621.7821 721.4602 -2.2479 0.0246 -3036.1752 -207.3890\n", "pH -123.6719 196.5043 -0.6294 0.5291 -508.9104 261.5667\n", "sulphates -213.6188 172.6441 -1.2373 0.2160 -552.0806 124.8429\n", "alcohol 274.6811 25.3731 10.8257 0.0000 224.9381 324.4241\n", "fixed_acidity^2 -888.0924 1860.0506 -0.4775 0.6331 -4534.6449 2758.4602\n", "fixed_acidity * volatile_acidity 213.0448 149.3410 1.4266 0.1538 -79.7320 505.8216\n", "fixed_acidity * citric_acid -169.2454 191.2389 -0.8850 0.3762 -544.1614 205.6706\n", "fixed_acidity * residual_sugar -2.3959 21.0911 -0.1136 0.9096 -43.7441 38.9523\n", "fixed_acidity * chlorides 151.4367 661.2643 0.2290 0.8189 -1144.9447 1447.8180\n", "fixed_acidity * free_sulfur_dioxide -13.3943 9.8122 -1.3651 0.1723 -32.6306 5.8421\n", "fixed_acidity * total_sulfur_dioxide -12.3144 22.1599 -0.5557 0.5784 -55.7580 31.1291\n", "fixed_acidity * density -228.1023 1055.2972 -0.2161 0.8289 -2296.9691 1840.7644\n", "fixed_acidity * pH 263.5729 260.8085 1.0106 0.3123 -247.7314 
774.8773\n", "fixed_acidity * sulphates 210.1261 147.0152 1.4293 0.1530 -78.0912 498.3434\n", "fixed_acidity * alcohol -102.4573 26.1357 -3.9202 0.0001 -153.6952 -51.2193\n", "volatile_acidity^2 -1256.2263 1979.4050 -0.6346 0.5257 -5136.7684 2624.3158\n", "volatile_acidity * citric_acid 2503.6940 1629.0043 1.5369 0.1244 -689.9019 5697.2899\n", "volatile_acidity * residual_sugar -304.5840 139.4970 -2.1834 0.0291 -578.0621 -31.1060\n", "volatile_acidity * chlorides 6503.3193 4276.6479 1.5207 0.1284 -1880.8729 14887.5116\n", "volatile_acidity * free_sulfur_dioxide 176.1465 65.4348 2.6919 0.0071 47.8643 304.4288\n", "volatile_acidity * total_sulfur_dioxide 541.9165 145.6814 3.7199 0.0002 256.3141 827.5190\n", "volatile_acidity * density -5408.0462 5170.0682 -1.0460 0.2956 -15543.7521 4727.6598\n", "volatile_acidity * pH 1591.6613 1576.5785 1.0096 0.3128 -1499.1559 4682.4786\n", "volatile_acidity * sulphates -3066.2318 1347.2069 -2.2760 0.0229 -5707.3755 -425.0882\n", "volatile_acidity * alcohol 611.6934 183.2814 3.3375 0.0009 252.3778 971.0089\n", "citric_acid^2 861.9664 2070.4337 0.4163 0.6772 -3197.0336 4920.9664\n", "citric_acid * residual_sugar -307.3877 171.8498 -1.7887 0.0737 -644.2921 29.5168\n", "citric_acid * chlorides -8483.4913 6547.3948 -1.2957 0.1951 -21319.3893 4352.4067\n", "citric_acid * free_sulfur_dioxide 150.5489 83.1030 1.8116 0.0701 -12.3711 313.4689\n", "citric_acid * total_sulfur_dioxide 300.7497 178.4947 1.6849 0.0921 -49.1819 650.6813\n", "citric_acid * density 14067.7740 7800.2113 1.8035 0.0714 -1224.2191 29359.7672\n", "citric_acid * pH -5133.5558 2077.6861 -2.4708 0.0135 -9206.7738 -1060.3378\n", "citric_acid * sulphates -2372.2746 1576.8448 -1.5044 0.1325 -5463.6139 719.0647\n", "citric_acid * alcohol 708.8006 236.3385 2.9991 0.0027 245.4687 1172.1325\n", "residual_sugar^2 -910.1293 1867.0943 -0.4875 0.6260 -4570.4908 2750.2323\n", "residual_sugar * chlorides 1971.4865 757.4887 2.6027 0.0093 486.4611 3456.5118\n", "residual_sugar * 
free_sulfur_dioxide -7.6328 5.0273 -1.5183 0.1290 -17.4886 2.2230\n", "residual_sugar * total_sulfur_dioxide 2.8665 12.6000 0.2275 0.8200 -21.8354 27.5684\n", "residual_sugar * density 1429.2194 705.7754 2.0250 0.0429 45.5757 2812.8631\n", "residual_sugar * pH -287.7160 203.0709 -1.4168 0.1566 -685.8281 110.3961\n", "residual_sugar * sulphates -189.7045 168.1916 -1.1279 0.2594 -519.4371 140.0282\n", "residual_sugar * alcohol -18.0129 19.5540 -0.9212 0.3570 -56.3477 20.3219\n", "chlorides^2 10142.2002 7074.9790 1.4335 0.1518 -3728.0049 24012.4054\n", "chlorides * free_sulfur_dioxide -201.9536 331.1123 -0.6099 0.5419 -851.0857 447.1785\n", "chlorides * total_sulfur_dioxide 1197.2260 682.4747 1.7542 0.0795 -140.7375 2535.1896\n", "chlorides * density 20265.9335 23064.5613 0.8787 0.3796 -24951.1898 65483.0568\n", "chlorides * pH -12226.8108 6517.5063 -1.8760 0.0607 -25004.1137 550.4920\n", "chlorides * sulphates -3613.4768 4978.7999 -0.7258 0.4680 -13374.2092 6147.2555\n", "chlorides * alcohol 2196.0436 843.4185 2.6037 0.0092 542.5563 3849.5310\n", "free_sulfur_dioxide^2 -909.6884 1867.1743 -0.4872 0.6261 -4570.2067 2750.8299\n", "free_sulfur_dioxide * total_sulfur_dioxide -24.9437 7.6926 -3.2426 0.0012 -40.0247 -9.8627\n", "free_sulfur_dioxide * density 549.2329 298.2374 1.8416 0.0656 -35.4492 1133.9150\n", "free_sulfur_dioxide * pH -13.8185 79.9507 -0.1728 0.8628 -170.5585 142.9216\n", "free_sulfur_dioxide * sulphates 59.9612 69.6414 0.8610 0.3893 -76.5679 196.4903\n", "free_sulfur_dioxide * alcohol -60.9958 9.6344 -6.3310 0.0000 -79.8838 -42.1079\n", "total_sulfur_dioxide^2 -915.2771 1867.4927 -0.4901 0.6241 -4576.4197 2745.8655\n", "total_sulfur_dioxide * density 856.3795 643.6237 1.3306 0.1834 -405.4183 2118.1773\n", "total_sulfur_dioxide * pH 172.8750 176.1471 0.9814 0.3264 -172.4541 518.2041\n", "total_sulfur_dioxide * sulphates 233.8040 154.5694 1.5126 0.1304 -69.2230 536.8310\n", "total_sulfur_dioxide * alcohol -208.7613 22.8714 -9.1276 0.0000 -253.5998 
-163.9229\n", "density^2 2377.5655 18554.7387 0.1281 0.8980 -33998.2360 38753.3671\n", "density * pH 11008.3563 10291.6307 1.0696 0.2848 -9167.9622 31184.6748\n", "density * sulphates -6367.0944 6065.2219 -1.0498 0.2939 -18257.7123 5523.5234\n", "density * alcohol -2160.1700 944.6336 -2.2868 0.0223 -4012.0853 -308.2546\n", "pH^2 -4031.2392 1137.0986 -3.5452 0.0004 -6260.4741 -1802.0043\n", "pH * sulphates 1604.8538 1639.6859 0.9788 0.3277 -1609.6830 4819.3905\n", "pH * alcohol 766.9861 249.3365 3.0761 0.0021 278.1721 1255.8001\n", "sulphates^2 -2563.1016 2045.6050 -1.2530 0.2103 -6573.4261 1447.2228\n", "sulphates * alcohol 596.8511 210.7614 2.8319 0.0046 183.6620 1010.0402\n", "alcohol^2 -1033.7653 1865.5885 -0.5541 0.5795 -4691.1748 2623.6442\n", "-------------------------------------------------------------------------------------------------------\n", "Omnibus: 74.655 Durbin-Watson: 2.012 \n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 119.025 \n", "Skew: 0.141 Prob(JB): 0.000 \n", "Kurtosis: 3.712 Condition No.: 4390541685726112\n", "=======================================================================================================\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "[2] The smallest eigenvalue is 7.09e-28. This might indicate that there are strong\n", "multicollinearity problems or that the design matrix is singular.\n", "\"\"\"" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results.summary2(xname=pft.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We keep only the features whose [p-value](https://sdpython.github.io/doc/mlstatpy/dev/c_metric/pvalues.html) is less than or equal to 0.05."
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "x2 7.630773e-06\n", "x6 1.288486e-04\n", "x8 2.462679e-02\n", "x11 5.327338e-27\n", "x22 8.970958e-05\n", "x25 2.905133e-02\n", "x27 7.128466e-03\n", "x28 2.016020e-04\n", "x31 2.289041e-02\n", "x32 8.519337e-04\n", "x39 1.351547e-02\n", "x41 2.721775e-03\n", "x43 9.278805e-03\n", "x46 4.291916e-02\n", "x56 9.249665e-03\n", "x58 1.192719e-03\n", "x62 2.657965e-10\n", "x67 1.010018e-19\n", "x71 2.225201e-02\n", "x72 3.960632e-04\n", "x74 2.109039e-03\n", "x76 4.646809e-03\n", "dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pval = results.pvalues.copy()\n", "pval[pval <= 0.05]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "volatile_acidity 7.630773e-06\n", "free_sulfur_dioxide 1.288486e-04\n", "density 2.462679e-02\n", "alcohol 5.327338e-27\n", "fixed_acidity * alcohol 8.970958e-05\n", "volatile_acidity * residual_sugar 2.905133e-02\n", "volatile_acidity * free_sulfur_dioxide 7.128466e-03\n", "volatile_acidity * total_sulfur_dioxide 2.016020e-04\n", "volatile_acidity * sulphates 2.289041e-02\n", "volatile_acidity * alcohol 8.519337e-04\n", "citric_acid * pH 1.351547e-02\n", "citric_acid * alcohol 2.721775e-03\n", "residual_sugar * chlorides 9.278805e-03\n", "residual_sugar * density 4.291916e-02\n", "chlorides * alcohol 9.249665e-03\n", "free_sulfur_dioxide * total_sulfur_dioxide 1.192719e-03\n", "free_sulfur_dioxide * alcohol 2.657965e-10\n", "total_sulfur_dioxide * alcohol 1.010018e-19\n", "density * alcohol 2.225201e-02\n", "pH^2 3.960632e-04\n", "pH * alcohol 2.109039e-03\n", "sulphates * alcohol 4.646809e-03\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pval.index = pft.columns\n", "pval[pval <= 0.05]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"The degree-2 model performs better than the linear one, but it is harder to tell whether alcohol is positively correlated with quality, because alcohol appears in several features (`alcohol`, `alcohol^2` and every product involving `alcohol`)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }
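Alcohol is spread over several terms, so no single coefficient gives its effect. One way to summarize that effect is to average the model's partial derivative with respect to the feature over the data, which is known as an average marginal effect. The sketch below computes it with a central finite difference on a fitted scikit-learn pipeline. The synthetic data and the helper `average_marginal_effect` are illustrative assumptions, not part of the notebook. Applied to the notebook's pipeline, the derivative would be taken with respect to the normalized inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def average_marginal_effect(model, X, j, h=1e-4):
    # Hypothetical helper: mean over the rows of X of the derivative of
    # model.predict with respect to column j, by central finite difference.
    X_plus, X_minus = X.copy(), X.copy()
    X_plus[:, j] += h
    X_minus[:, j] -= h
    return np.mean((model.predict(X_plus) - model.predict(X_minus)) / (2 * h))


# Synthetic data (assumption): y = 2 + 3*x0 - x0*x1 + 0.5*x1^2, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 2 + 3 * X[:, 0] - X[:, 0] * X[:, 1] + 0.5 * X[:, 1] ** 2

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

# Here dy/dx0 = 3 - x1, so the average over the rows is 3 - mean(x1).
ame = average_marginal_effect(model, X, j=0)
print(ame, 3 - X[:, 1].mean())
```

For a degree-2 model, the prediction is quadratic in each input, so the central difference is exact up to rounding. A positive average suggests that, over the data, more alcohol goes with a higher predicted quality. The sign of the derivative can still change from one row to another.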