NumPy and contingency tables
A classic exercise: compute the \(\chi^2\) statistic of a contingency table without writing an explicit loop. NumPy takes care of the looping. Work through it until you no longer need to.
Let \(M = (m_{ij})\) be the contingency table, and define:
\(N = \sum_{ij} m_{ij}\)
\(\forall i, \; m_{i \bullet} = \sum_j m_{ij}\)
\(\forall j, \; m_{\bullet j} = \sum_i m_{ij}\)
\(\forall i,j, \; n_{ij} = \frac{m_{i \bullet} m_{\bullet j}}{N}\)
With these notations:
\(\chi^2(M) = \sum_{ij} \frac{ (m_{ij} - n_{ij})^2}{n_{ij}}\)
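The formula translates almost directly into NumPy. The sketch below is one possible way to write it, not necessarily the solution the exercise has in mind: the function name `chi2_contingency_sketch` is hypothetical, and it assumes every row and column has a nonzero sum so that no expected count \(n_{ij}\) is zero.

```python
import numpy as np


def chi2_contingency_sketch(table):
    """Chi-squared statistic of a 2-D contingency table, without explicit loops.

    Hypothetical helper: assumes all row and column sums are nonzero.
    """
    table = np.asarray(table, dtype=float)
    total = table.sum()                        # N
    rows = table.sum(axis=1, keepdims=True)    # m_{i.}, shape (r, 1)
    cols = table.sum(axis=0, keepdims=True)    # m_{.j}, shape (1, c)
    expected = rows * cols / total             # n_{ij}, broadcast to (r, c)
    return ((table - expected) ** 2 / expected).sum()


A = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)
chi2 = chi2_contingency_sketch(A)
print(chi2)
```

Keeping `keepdims=True` on both sums is what lets the product `rows * cols` broadcast to the full table shape.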
[1]:
import numpy as np
A = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)
A
[1]:
array([[0., 1., 2., 3.],
       [4., 5., 6., 7.]])
[2]:
A.sum(axis=1, keepdims=1)
[2]:
array([[ 6.],
       [22.]])
[3]:
A + A.sum(axis=1, keepdims=1)
[3]:
array([[ 6.,  7.,  8.,  9.],
       [26., 27., 28., 29.]])
[4]:
A.sum(axis=0, keepdims=1)
[4]:
array([[ 4.,  6.,  8., 10.]])
[5]:
B = np.zeros(A.shape, dtype=A.dtype)
N2 = A.sum() ** 2
L = A.sum(axis=1)  # row sums
C = A.sum(axis=0)  # column sums
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        B[i, j] = A[i, j] - L[i] * C[j] / N2
B
[5]:
array([[-0.03061224,  0.95408163,  1.93877551,  2.92346939],
       [ 3.8877551 ,  4.83163265,  5.7755102 ,  6.71938776]])
[6]:
A - A.sum(axis=1, keepdims=1) * A.sum(axis=0, keepdims=1) / A.sum() ** 2
[6]:
array([[-0.03061224,  0.95408163,  1.93877551,  2.92346939],
       [ 3.8877551 ,  4.83163265,  5.7755102 ,  6.71938776]])
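Rather than comparing the two printed arrays by eye, the loop and the broadcast expression can be checked against each other programmatically. This is a minimal sketch that repeats the two computations above, with the same `N2` denominator, and compares them with `np.allclose`; the names `B_loop`, `B_vec` and `same` are illustrative only.

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)
N2 = A.sum() ** 2

# Explicit double loop over every cell of the table.
L = A.sum(axis=1)  # row sums, shape (2,)
C = A.sum(axis=0)  # column sums, shape (4,)
B_loop = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        B_loop[i, j] = A[i, j] - L[i] * C[j] / N2

# Same computation with broadcasting: (2, 1) * (1, 4) -> (2, 4).
B_vec = A - A.sum(axis=1, keepdims=True) * A.sum(axis=0, keepdims=True) / N2

# allclose tolerates tiny floating-point differences between the two paths.
same = np.allclose(B_loop, B_vec)
print(same)
```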
[7]:
L = A.sum(axis=1, keepdims=1)
C = A.sum(axis=0, keepdims=1)
L.shape, C.shape
[7]:
((2, 1), (1, 4))
[8]:
L * C
[8]:
array([[ 24.,  36.,  48.,  60.],
       [ 88., 132., 176., 220.]])
[9]:
C * L
[9]:
array([[ 24.,  36.,  48.,  60.],
       [ 88., 132., 176., 220.]])
[10]:
L @ C
[10]:
array([[ 24.,  36.,  48.,  60.],
       [ 88., 132., 176., 220.]])
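The three results coincide because a `(2, 1)` column times a `(1, 4)` row is an outer product, whether it is written as a broadcast element-wise product or as a matrix product with an inner dimension of 1. The sketch below hard-codes the row and column sums from the cells above and also compares them with `np.outer`. It assumes a NumPy version that provides `np.broadcast_shapes` (1.20 or later).

```python
import numpy as np

L = np.array([[6.0], [22.0]])            # row sums, shape (2, 1)
C = np.array([[4.0, 6.0, 8.0, 10.0]])    # column sums, shape (1, 4)

by_broadcast = L * C                       # element-wise product with broadcasting
by_matmul = L @ C                          # (2, 1) @ (1, 4) matrix product
by_outer = np.outer(L.ravel(), C.ravel())  # outer product of the flattened vectors

# Shape that broadcasting produces for these two operands.
result_shape = np.broadcast_shapes(L.shape, C.shape)
print(result_shape)
```

For general shapes `*` and `@` are different operations: the equality here relies on the shared dimension being 1.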