NumPy and contingency tables

A classic exercise: compute the \chi^2 statistic of a contingency table without writing an explicit loop; numpy does the looping. Follow along until you no longer need to.

  • N = \sum_{ij} m_{ij}, where M = (m_{ij}) is the contingency table

  • \forall i, \; m_{i \bullet} = \sum_j m_{ij}

  • \forall j, \; m_{\bullet j} = \sum_i m_{ij}

  • \forall i,j, \; n_{ij} = \frac{m_{i \bullet} m_{\bullet j}}{N} (the count expected under independence)

With this notation:

\chi^2(M) = \sum_{ij} \frac{ (m_{ij} - n_{ij})^2}{n_{ij}}
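As a baseline for checking the loop-free version, the definition can be transcribed literally with explicit loops. This is only a sketch: the function name `chi2_loops` is hypothetical, the table is assumed to be a list of equal-length rows, and every expected count is assumed to be non-zero.

```python
def chi2_loops(table):
    """Reference chi-squared statistic with explicit loops (for checking only)."""
    n_rows, n_cols = len(table), len(table[0])
    total = sum(sum(row) for row in table)        # N
    row_totals = [sum(row) for row in table]      # m_{i.}
    col_totals = [
        sum(table[i][j] for i in range(n_rows))   # m_{.j}
        for j in range(n_cols)
    ]
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / total  # n_ij
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


small = [[1, 2], [3, 4]]
print(chi2_loops(small))
```

For the 2×2 table above, N = 10, the row totals are 3 and 7, the column totals are 4 and 6, and the statistic equals 40/504, roughly 0.0794.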

[1]:
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)
A
[1]:
array([[0., 1., 2., 3.],
       [4., 5., 6., 7.]])
[2]:
# row totals, kept as a (2, 1) column so that they broadcast against A
A.sum(axis=1, keepdims=True)
[2]:
array([[ 6.],
       [22.]])
[3]:
# broadcasting: the (2, 1) column of row totals is added to every column of A
A + A.sum(axis=1, keepdims=True)
[3]:
array([[ 6.,  7.,  8.,  9.],
       [26., 27., 28., 29.]])
[4]:
# column totals, kept as a (1, 4) row
A.sum(axis=0, keepdims=True)
[4]:
array([[ 4.,  6.,  8., 10.]])
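The cells above rely on `keepdims` to keep the totals two-dimensional. A small sketch of why that matters for broadcasting, using the same array `A`; the variable names `rows_flat`, `rows_col`, `cols_row` and `broadcast_ok` are illustrative only.

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)

rows_flat = A.sum(axis=1)                 # shape (2,)
rows_col = A.sum(axis=1, keepdims=True)   # shape (2, 1)
cols_row = A.sum(axis=0, keepdims=True)   # shape (1, 4)
print(rows_flat.shape, rows_col.shape, cols_row.shape)

# (2, 4) + (2, 1): the column of row totals is repeated along axis 1.
print((A + rows_col).shape)

# (2, 4) + (2,): the trailing dimensions 4 and 2 do not match,
# so numpy raises a ValueError instead of broadcasting.
try:
    A + rows_flat
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)
```

Without `keepdims`, the row totals would have to be reshaped by hand (for instance with `rows_flat[:, None]`) before being combined with `A`.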
[5]:
B = np.zeros(A.shape, dtype=A.dtype)
N = A.sum()  # total count
L = A.sum(axis=1)  # row totals
C = A.sum(axis=0)  # column totals
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        # observed count minus expected count n_ij = m_i. * m_.j / N
        B[i, j] = A[i, j] - L[i] * C[j] / N
B
[5]:
array([[-0.85714286, -0.28571429,  0.28571429,  0.85714286],
       [ 0.85714286,  0.28571429, -0.28571429, -0.85714286]])
[6]:
# same computation without a loop, using broadcasting
A - A.sum(axis=1, keepdims=True) * A.sum(axis=0, keepdims=True) / A.sum()
[6]:
array([[-0.85714286, -0.28571429,  0.28571429,  0.85714286],
       [ 0.85714286,  0.28571429, -0.28571429, -0.85714286]])
[7]:
L = A.sum(axis=1, keepdims=True)  # row totals as a column
C = A.sum(axis=0, keepdims=True)  # column totals as a row
L.shape, C.shape
[7]:
((2, 1), (1, 4))
[8]:
# broadcasting (2, 1) * (1, 4) gives the (2, 4) table of products L[i] * C[j]
L * C
[8]:
array([[ 24.,  36.,  48.,  60.],
       [ 88., 132., 176., 220.]])
[9]:
# elementwise multiplication is commutative, so the order does not matter
C * L
[9]:
array([[ 24.,  36.,  48.,  60.],
       [ 88., 132., 176., 220.]])
[10]:
# matrix product of a (2, 1) column by a (1, 4) row: the same outer product
L @ C
[10]:
array([[ 24.,  36.,  48.,  60.],
       [ 88., 132., 176., 220.]])
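Putting the pieces together, the statistic defined at the top can be computed without any explicit loop. This is a minimal sketch rather than a reference implementation: the function name `chi2_statistic` is hypothetical, and it assumes that no expected count is zero.

```python
import numpy as np


def chi2_statistic(table):
    """Chi-squared statistic of a contingency table, without an explicit loop."""
    m = np.asarray(table, dtype=float)
    n_total = m.sum()                             # N
    row_totals = m.sum(axis=1, keepdims=True)     # m_{i.}, shape (rows, 1)
    col_totals = m.sum(axis=0, keepdims=True)     # m_{.j}, shape (1, cols)
    expected = row_totals @ col_totals / n_total  # n_ij, shape (rows, cols)
    return ((m - expected) ** 2 / expected).sum()


A = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)
value = chi2_statistic(A)
print(value)
```

Because `row_totals` is a column and `col_totals` is a row, `row_totals @ col_totals` and `row_totals * col_totals` produce the same table of products, as the last three cells show; either form works here.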
