NumPy and contingency tables

A classic exercise: compute the $\chi^2$ statistic of a contingency table without writing an explicit loop; NumPy takes care of the iteration. Practice it until you no longer need to.

Let $M = (m_{ij})$ be the contingency table, and define:
$N = \sum_{ij} m_{ij}$
$\forall i, \; m_{i \bullet} = \sum_j m_{ij}$
$\forall j, \; m_{\bullet j} = \sum_i m_{ij}$
$\forall i,j, \; n_{ij} = \frac{m_{i \bullet} m_{\bullet j}}{N}$

With this notation:

$\chi^2(M) = \sum_{ij} \frac{ (m_{ij} - n_{ij})^2}{n_{ij}}$
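
The formula above can be sketched directly in NumPy. This is a minimal illustration, not a reference implementation: the function names `chi2_contingency_table` and `chi2_loops` are hypothetical, and the sketch assumes every row sum and column sum is nonzero so that no expected count $n_{ij}$ is zero.

```python
import numpy as np


def chi2_contingency_table(m):
    """Chi-square statistic of a contingency table, without explicit loops.

    Hypothetical helper; assumes all row and column sums are nonzero.
    """
    m = np.asarray(m, dtype=float)
    total = m.sum()                          # grand total
    rows = m.sum(axis=1, keepdims=True)      # row sums, shape (n_rows, 1)
    cols = m.sum(axis=0, keepdims=True)      # column sums, shape (1, n_cols)
    expected = rows * cols / total           # n_ij, broadcast to m.shape
    return ((m - expected) ** 2 / expected).sum()


def chi2_loops(m):
    """Same statistic with explicit loops, only used as a cross-check."""
    m = np.asarray(m, dtype=float)
    total = m.sum()
    result = 0.0
    for i in range(m.shape[0]):
        for j in range(m.shape[1]):
            n_ij = m[i, :].sum() * m[:, j].sum() / total
            result += (m[i, j] - n_ij) ** 2 / n_ij
    return result


table = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)
chi2_vectorized = chi2_contingency_table(table)
chi2_reference = chi2_loops(table)
print(chi2_vectorized, chi2_reference)
```

Keeping the row sums as a column and the column sums as a row is what lets `rows * cols` produce a full table of expected counts in a single expression.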

[1]:

import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)
A

[1]:

array([[0., 1., 2., 3.],
       [4., 5., 6., 7.]])

[2]:

# Row sums; keepdims=True keeps the result 2-D, with shape (2, 1).
A.sum(axis=1, keepdims=True)

[2]:

array([[ 6.],
       [22.]])

[3]:

# The (2, 1) column of row sums is broadcast across the 4 columns of A.
A + A.sum(axis=1, keepdims=True)

[3]:

array([[ 6.,  7.,  8.,  9.],
       [26., 27., 28., 29.]])
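
The addition above relies on `keepdims`. As a small sketch of why (variable names here are illustrative only): without `keepdims`, the row sums have shape `(2,)`, which cannot be broadcast against a `(2, 4)` array, so NumPy raises a `ValueError`.

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)

row_sums_flat = A.sum(axis=1)                 # shape (2,)
row_sums_col = A.sum(axis=1, keepdims=True)   # shape (2, 1)

# (2, 4) + (2,) fails: the trailing dimensions 4 and 2 are incompatible.
try:
    A + row_sums_flat
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

# Adding the axis back by hand gives the same result as keepdims=True.
same = np.array_equal(A + row_sums_col, A + row_sums_flat[:, np.newaxis])
print(row_sums_flat.shape, row_sums_col.shape, broadcast_ok, same)
```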

[4]:

# Column sums, kept 2-D with shape (1, 4).
A.sum(axis=0, keepdims=True)

[4]:

array([[ 4.,  6.,  8., 10.]])

[5]:

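# Reference version with explicit loops, used to check the vectorized expression.
B = np.zeros(A.shape, dtype=A.dtype)
N2 = A.sum() ** 2  # squared grand total (note: n_ij above divides by N, not N**2)
L = A.sum(axis=1)  # row sums, shape (2,)
C = A.sum(axis=0)  # column sums, shape (4,)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        B[i, j] = A[i, j] - L[i] * C[j] / N2
B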

[5]:

array([[-0.03061224,  0.95408163,  1.93877551,  2.92346939],
       [ 3.8877551 ,  4.83163265,  5.7755102 ,  6.71938776]])

[6]:

# Same computation without loops: (2, 1) * (1, 4) broadcasts to (2, 4).
A - A.sum(axis=1, keepdims=True) * A.sum(axis=0, keepdims=True) / A.sum() ** 2

[6]:

array([[-0.03061224,  0.95408163,  1.93877551,  2.92346939],
       [ 3.8877551 ,  4.83163265,  5.7755102 ,  6.71938776]])
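
The two outputs above look identical. A minimal sketch of how one might check this programmatically rather than by eye, using `np.allclose` (the names `B_loop` and `B_vec` are illustrative):

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)
N2 = A.sum() ** 2

# Loop version.
L = A.sum(axis=1)
C = A.sum(axis=0)
B_loop = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        B_loop[i, j] = A[i, j] - L[i] * C[j] / N2

# Vectorized version relying on broadcasting.
B_vec = A - A.sum(axis=1, keepdims=True) * A.sum(axis=0, keepdims=True) / N2

# allclose tolerates tiny floating-point differences between the two.
agree = np.allclose(B_loop, B_vec)
print(agree)
```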

[7]:

L = A.sum(axis=1, keepdims=True)  # row sums as a column
C = A.sum(axis=0, keepdims=True)  # column sums as a row
L.shape, C.shape

[7]:

((2, 1), (1, 4))

[8]:

L * C

[8]:

array([[ 24.,  36.,  48.,  60.],
       [ 88., 132., 176., 220.]])

[9]:

C * L

[9]:

array([[ 24.,  36.,  48.,  60.],
       [ 88., 132., 176., 220.]])

[10]:

L @ C

[10]:

array([[ 24.,  36.,  48.,  60.],
       [ 88., 132., 176., 220.]])
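
The three products above coincide because `L` has shape `(2, 1)` and `C` has shape `(1, 4)`: broadcasting the elementwise product and taking the matrix product both yield the outer product of the row sums and the column sums. A short sketch, assuming the same `A` as above, which also compares against `np.outer`:

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=float)
L = A.sum(axis=1, keepdims=True)   # shape (2, 1)
C = A.sum(axis=0, keepdims=True)   # shape (1, 4)

products = [
    L * C,           # elementwise product with broadcasting
    C * L,           # broadcasting is symmetric in its operands
    L @ C,           # matrix product (2, 1) @ (1, 4)
    np.outer(L, C),  # outer product of the flattened sums
]
all_equal = all(np.array_equal(products[0], p) for p in products[1:])
print(all_equal, products[0].shape)
```

The equality with `@` only holds because the shared inner dimension is 1; `C @ L` would raise an error, since shapes `(1, 4)` and `(2, 1)` are not compatible for a matrix product.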
