Matrix factorization and PCA#

An example showing the equivalence between PCA and a matrix factorization.
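With one component, both methods look for the best rank-1 approximation of the data matrix: a column W of size 2x1 and a row H of size 1x20 minimizing ||M - W @ H||^2. NMF adds the constraint that W and H be non-negative, while PCA centers the data by default; on a positive, uncentered cloud the two optima should coincide, which is what this notebook checks.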

[2]:
%matplotlib inline

Matrix factorization#

[3]:
def erreur_mf(M, W, H):
    # Sum of squared reconstruction errors ||M - W @ H||^2.
    d = M - W @ H
    a = d.ravel()
    return a @ a
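A quick sanity check of the helper, with hypothetical factors not taken from the notebook: an exact factorization must give a zero error.

import numpy as np

W0 = np.array([[1.0], [2.0]])  # hypothetical 2x1 factor
H0 = np.array([[3.0, 4.0]])    # hypothetical 1x2 factor
# W0 @ H0 is factorized exactly, so the error must be 0.
assert erreur_mf(W0 @ H0, W0, H0) == 0.0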

We create a point cloud with only positive coordinates in order to satisfy the assumptions of non-negative matrix factorization.

[4]:
from numpy.random import rand

M = rand(2, 20)
M[1, :] += 3 * M[0, :]
M
[4]:
array([[ 0.81960047,  0.63887134,  0.74019269,  0.96110175,  0.0685406 ,
         0.11103301,  0.06033529,  0.67913157,  0.10460611,  0.98860048,
         0.50497448,  0.26893866,  0.73143267,  0.32617974,  0.1332449 ,
         0.83328515,  0.3775355 ,  0.69163261,  0.53095348,  0.15601268],
       [ 2.48031078,  2.2279066 ,  2.85929872,  3.27833973,  0.27323095,
         0.53806662,  0.48019992,  2.09428487,  0.40521666,  3.94539474,
         2.36639105,  1.66857684,  3.14027534,  1.94032092,  1.22602705,
         3.09679803,  1.696636  ,  2.69144798,  1.84350664,  1.16862532]])
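By construction, the second row is roughly three times the first plus noise, so the points lie close to a line through the origin and a rank-1 factorization should reconstruct them well.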
[5]:
from sklearn.decomposition import NMF

mf = NMF(n_components=1)  # rank-1 non-negative factorization
W = mf.fit_transform(M)
H = mf.components_
erreur_mf(M, W, H)
[5]:
0.19729615330190822
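With one component, W is a single non-negative column direction and H holds one non-negative coefficient per point, so W @ H is a rank-1 approximation of M. A quick look at the shapes:

# W.shape == (2, 1): one direction in the plane.
# H.shape == (1, 20): one coefficient per observation.
print(W.shape, H.shape)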
[6]:
wh = W @ H
[7]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
ax.plot(M[0, :], M[1, :], "ob")
ax.plot(wh[0, :], wh[1, :], "or")
ax.set_xlim([0, 1])
ax.set_ylim([0, 4])
[7]:
(0, 4)
[figure notebooks_ml_mf_acp_8_1.png: the cloud M (blue) and its NMF reconstruction W @ H (red)]
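All the red points fall on a single line through the origin: with one component, every reconstructed point is a multiple of the unique column of W.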

PCA: principal component analysis#

[8]:
from sklearn.decomposition import PCA

pca = PCA(n_components=1)
pca.fit(M.T)
[8]:
PCA(copy=True, iterated_power='auto', n_components=1, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)
[9]:
projected_points = pca.inverse_transform(pca.transform(M.T))
pj = projected_points.T
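transform projects each point onto the principal axis and inverse_transform maps it back to the original space, so pj is the orthogonal projection of the (centered) cloud onto the principal line. The same computation done by hand from the fitted attributes:

import numpy as np

c = pca.components_[0]  # unit vector along the principal axis
X = M.T - pca.mean_     # centered points, shape (20, 2)
# Project each centered point onto the axis, then shift the mean back.
pj_manual = (X @ c)[:, None] * c + pca.mean_
assert np.allclose(pj_manual, projected_points)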
[10]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
ax.plot(M[0, :], M[1, :], "ob")
ax.plot(wh[0, :], wh[1, :], "or")
ax.plot(pj[0, :], pj[1, :], "og")
ax.set_xlim([0, 1])
ax.set_ylim([0, 4])
[10]:
(0, 4)
[figure notebooks_ml_mf_acp_12_1.png: the cloud M (blue), the NMF reconstruction (red) and the PCA projection (green)]

The results are not exactly identical because PCA centers the point cloud by default. We use the statsmodels implementation to avoid that.
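Another way to see the link: without centering, a one-component PCA amounts to a truncated SVD, and for a non-negative matrix the leading singular vectors can be chosen non-negative (Perron-Frobenius), so the best rank-1 approximation is also reachable by NMF. A sketch with numpy:

import numpy as np

# Best rank-1 approximation of M given by the truncated SVD.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
rank1 = S[0] * np.outer(U[:, 0], Vt[0])
# Expected to be close to the NMF reconstruction (up to optimization tolerance).
print(np.abs(rank1 - wh).max())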

[11]:
from statsmodels.multivariate.pca import PCA

pca = PCA(M.T, ncomp=1, standardize=False, demean=False, normalize=False)
pca
[11]:
Principal Component Analysis(nobs: 20, nvar: 2, transformation: None, normalization: False, number of components: 1, SVD, id: 0x1c01a2861d0)
[12]:
pj2 = pca.projection.T
[13]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
ax.plot(M[0, :], M[1, :], "ob")
# ax.plot(wh[0,:], wh[1,:], "or")
ax.plot(pj[0, :], pj[1, :], "og")
ax.plot(pj2[0, :], pj2[1, :], "oy")
ax.set_xlim([0, 1])
ax.set_ylim([0, 4])
[13]:
(0, 4)
[figure notebooks_ml_mf_acp_16_1.png: the cloud M (blue), the centered PCA projection (green) and the uncentered statsmodels projection (yellow)]

This time we get exactly the same results as the matrix factorization.
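A numerical check of that claim, as a sketch; NMF is iterative, so expect a small gap rather than exact equality:

import numpy as np

# Maximum absolute gap between the uncentered PCA projection
# and the NMF reconstruction.
print(np.abs(pj2 - wh).max())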
