Random order for a sum

Parallelization usually means a summation is performed in a non-deterministic order. Running the same computation several times may therefore produce different values even though the result should be identical. This example compares the summation of random permutations of the same array of values.
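The root cause is that floating-point addition is not associative: grouping the same values differently can round differently. A minimal illustration, independent of the script below:

# (0.1 + 0.2) + 0.3 and 0.1 + (0.2 + 0.3) do not round to the same float
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False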

Setup

from tqdm import tqdm
import numpy as np
import pandas

# three distinct float32 constants repeated 2000 times in a random order
unique_values = np.array(
    [2.1102535724639893, 0.5986238718032837, -0.49545818567276], dtype=np.float32
)
random_index = np.random.randint(0, 3, 2000)
assert set(random_index) == {0, 1, 2}
values = unique_values[random_index]

# numpy reduction vs a naive sequential accumulation, both in float32
s0 = values.sum()
s1 = np.array(0, dtype=np.float32)
for n in values:
    s1 += n

delta = s1 - s0
print(f"reduced sum={s0}, iterative sum={s1}, delta={delta}")
reduced sum=1440.249755859375, iterative sum=1440.235107421875, delta=-0.0146484375

There are discrepancies.
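To measure how far each float32 result is from a correctly rounded reference, we can compare against math.fsum, which tracks the partial sums exactly and rounds only once at the end. This is a small addition to the original script, reusing values, s0 and s1 defined above:

import math

exact = math.fsum(float(v) for v in values)  # correctly rounded double-precision sum
print(f"fsum={exact}, reduced={float(s0)}, iterative={float(s1)}")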

Random order

Let’s go further and check the sum of random permutations of the same set, comparing each result with the same sum computed in higher precision (double).

def check_orders(values, n=200, bias=0):
    "Sums *n* random permutations of *values*, sequentially and with numpy, in float32 and float64."
    double_sums = []
    sums = []
    reduced_sums = []
    reduced_dsums = []
    for i in tqdm(range(n)):
        permuted_values = np.random.permutation(values)
        s = np.array(bias, dtype=np.float32)
        sd = np.array(bias, dtype=np.float64)
        for v in permuted_values:
            s += v
            sd += v
        sums.append(s)
        double_sums.append(sd)
        reduced_sums.append(permuted_values.sum() + bias)
        reduced_dsums.append(permuted_values.astype(np.float64).sum() + bias)

    data = []
    mi, ma = min(sums), max(sums)
    data.append(dict(name="seq_fp32", min=mi, max=ma, bias=bias))
    print(f"min={mi} max={ma} delta={ma-mi}")
    mi, ma = min(double_sums), max(double_sums)
    data.append(dict(name="seq_fp64", min=mi, max=ma, bias=bias))
    print(f"min={mi} max={ma} delta={ma-mi} (double)")
    mi, ma = min(reduced_sums), max(reduced_sums)
    data.append(dict(name="red_f32", min=mi, max=ma, bias=bias))
    print(f"min={mi} max={ma} delta={ma-mi} (reduced)")
    mi, ma = min(reduced_dsums), max(reduced_dsums)
    data.append(dict(name="red_f64", min=mi, max=ma, bias=bias))
    print(f"min={mi} max={ma} delta={ma-mi} (reduced)")
    return data


data1 = check_orders(values)
100%|██████████| 200/200 [00:02<00:00, 85.56it/s]
min=1440.234130859375 max=1440.2376708984375 delta=0.0035400390625
min=1440.2498691082 max=1440.2498691082 delta=0.0 (double)
min=1440.249755859375 max=1440.25 delta=0.000244140625 (reduced)
min=1440.2498691082 max=1440.2498691082 delta=0.0 (reduced)

This example clearly shows that the order has an impact. It is usually unavoidable, but its effect can be reduced if the running sum stays close to zero. In that case, the accumulated sum keeps the same order of magnitude as the values being added, so fewer low-order bits are lost at each addition.
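Another way to limit the order dependence, not used in this example, is compensated (Kahan) summation: the rounding error lost by each addition is stored and re-injected into the next one, so the accumulated error stays bounded by a few ulps regardless of the number of terms. A minimal float32 sketch:

def kahan_sum_fp32(array):
    # c accumulates the low-order bits lost by the previous addition
    s = np.float32(0.0)
    c = np.float32(0.0)
    for x in array:
        y = np.float32(x) - c
        t = s + y
        c = (t - s) - y
        s = t
    return s


print(kahan_sum_fp32(values))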

Removing the average

Computing the average of the values requires computing the sum itself. However, if we have an estimate of this average, not necessarily the exact value, subtracting it from every term helps the summation keep the same order of magnitude as the values it adds. Since sum(values) = sum(values - mean) + len(values) * mean, the original total is recovered by adding back a bias of len(values) * mean.

# subtract the (estimated) mean and add it back once through the bias
mean = unique_values.mean()
values -= mean
data2 = check_orders(values, bias=len(values) * mean)
100%|██████████| 200/200 [00:02<00:00, 89.79it/s]
min=1440.2513427734375 max=1440.2513427734375 delta=0.0
min=1440.2498311400414 max=1440.2498311400414 delta=0.0 (double)
min=1440.2498388290405 max=1440.2498655319214 delta=2.6702880859375e-05 (reduced)
min=1440.2498311400414 max=1440.2498311400414 delta=0.0 (reduced)

The differences are clearly smaller.

df = pandas.DataFrame(data1 + data2)
df["delta"] = df["max"] - df["min"]
piv = df.pivot(index="name", columns="bias", values="delta")
print(piv)
bias     0.000000    1475.612998
name
red_f32     0.000244    0.000027
red_f64          0.0         0.0
seq_fp32     0.00354         0.0
seq_fp64         0.0         0.0
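Part of the gap between seq_fp32 and red_f32 comes from how the reduction is implemented: numpy sums float arrays pairwise rather than strictly left to right, so the partial sums keep comparable magnitudes. A simplified sketch of the idea (numpy's actual implementation works on larger blocks):

def pairwise_sum_fp32(array):
    # error grows roughly like O(log n) instead of O(n) for a naive loop
    if len(array) <= 8:
        s = np.float32(0.0)
        for x in array:
            s += np.float32(x)
        return s
    half = len(array) // 2
    return pairwise_sum_fp32(array[:half]) + pairwise_sum_fp32(array[half:])


print(pairwise_sum_fp32(values))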

Plots.

ax = piv.plot.barh()
ax.set_title("max(sum) - min(sum) over random orders")
ax.get_figure().tight_layout()
ax.get_figure().savefig("plot_check_random_order.png")
(figure: max(sum) - min(sum) over random orders)

Total running time of the script: (0 minutes 5.287 seconds)
