Compares dot implementations (numpy, python, blas)
numpy has a very fast implementation of the dot product. It is difficult to beat and very easy to be slower. This example looks into a couple of slower implementations.
Compared implementations: pydot (pure Python), numpy.dot, and ddot (a direct BLAS call).
python dot: pydot
The first function, pydot, uses python to implement the dot product.
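The body of pydot is not shown on this page. As a rough illustration only, a pure-Python dot product could look like the sketch below. The name pydot_sketch and its input check are assumptions made for this illustration, not the function benchmarked here.

```python
def pydot_sketch(va, vb):
    """Dot product of two equal-length sequences, written as a plain Python loop.

    Hypothetical stand-in for pydot; the real implementation may differ.
    """
    if len(va) != len(vb):
        raise ValueError("va and vb must have the same length")
    total = 0.0
    # One multiply-add per pair of elements, all executed by the interpreter.
    for a, b in zip(va, vb):
        total += a * b
    return total


print(pydot_sketch([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # prints 32.0
```

Every iteration goes through the interpreter, which is why a loop like this is expected to be much slower than numpy.dot as the vectors grow.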
ctxs = [
    dict(
        va=numpy.random.randn(n).astype(numpy.float64),
        vb=numpy.random.randn(n).astype(numpy.float64),
        pydot=pydot,
        x_name=n,
    )
    for n in range(10, 1000, 100)
]
res_pydot = list(measure_time_dim("pydot(va, vb)", ctxs, verbose=1))
pprint.pprint(res_pydot[:2])
[{'average': np.float64(2.628452001317783e-06),
'context_size': 184,
'deviation': np.float64(1.30425555254186e-07),
'max_exec': np.float64(3.0103200015219047e-06),
'min_exec': np.float64(2.5332199993499673e-06),
'number': 50,
'repeat': 10,
'ttime': np.float64(2.6284520013177827e-05),
'warmup_time': 2.0225000071150134e-05,
'x_name': 10},
{'average': np.float64(1.8355095999595507e-05),
'context_size': 184,
'deviation': np.float64(4.6556812484716014e-07),
'max_exec': np.float64(1.9151500000589296e-05),
'min_exec': np.float64(1.771008000105212e-05),
'number': 50,
'repeat': 10,
'ttime': np.float64(0.00018355095999595506),
'warmup_time': 2.355199990233814e-05,
'x_name': 110}]
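The dictionaries above report per-call statistics over repeat runs of number calls each. The sketch below shows one way such figures could be produced with the standard timeit module. The function measure_time_sketch is hypothetical and is not the measure_time_dim used in this example, whose implementation is not shown here.

```python
import statistics
import timeit


def measure_time_sketch(stmt, context, repeat=10, number=50):
    """Time `stmt` in the namespace `context` and summarize per-call durations.

    Hypothetical helper for illustration; key names mirror the output above.
    """
    timer = timeit.Timer(stmt, globals=context)
    # Each raw value is the total time of `number` calls; divide for per-call time.
    per_call = [t / number for t in timer.repeat(repeat=repeat, number=number)]
    return {
        "average": statistics.mean(per_call),
        "deviation": statistics.pstdev(per_call),
        "min_exec": min(per_call),
        "max_exec": max(per_call),
        "repeat": repeat,
        "number": number,
    }


stats = measure_time_sketch("sum(x * x for x in data)", {"data": list(range(100))})
print(stats)
```

Passing the context as globals plays the same role as the va, vb and pydot entries of each ctxs dictionary: the timed statement can only see the names placed in that namespace.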
numpy dot
ctxs = [
    dict(
        va=numpy.random.randn(n).astype(numpy.float64),
        vb=numpy.random.randn(n).astype(numpy.float64),
        dot=numpy.dot,
        x_name=n,
    )
    for n in range(10, 50000, 100)
]
res_dot = list(measure_time_dim("dot(va, vb)", ctxs, verbose=1))
pprint.pprint(res_dot[:2])
[{'average': np.float64(5.089619994578243e-07),
'context_size': 184,
'deviation': np.float64(1.1037039081255569e-07),
'max_exec': np.float64(8.386800027437858e-07),
'min_exec': np.float64(4.653599989978829e-07),
'number': 50,
'repeat': 10,
'ttime': np.float64(5.089619994578243e-06),
'warmup_time': 1.5185999927780358e-05,
'x_name': 10},
{'average': np.float64(5.16787999913504e-07),
'context_size': 184,
'deviation': np.float64(7.752201159421148e-08),
'max_exec': np.float64(7.030999995549791e-07),
'min_exec': np.float64(4.777399999511545e-07),
'number': 50,
'repeat': 10,
'ttime': np.float64(5.16787999913504e-06),
'warmup_time': 3.971000069213915e-06,
'x_name': 110}]
blas dot
The numpy implementation relies on BLAS. Let’s call it directly.
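The ddot wrapper measured in this section is not shown on this page. As an illustration of a direct BLAS call from Python, the sketch below uses scipy.linalg.blas.ddot. Relying on scipy is an assumption made for this sketch; the benchmark may bind to BLAS differently.

```python
import numpy
from scipy.linalg.blas import ddot  # assumption: scipy is installed

va = numpy.arange(5, dtype=numpy.float64)  # [0, 1, 2, 3, 4]
vb = numpy.ones(5, dtype=numpy.float64)

# ddot is the double-precision BLAS level 1 dot product.
result = ddot(va, vb)
print(result, numpy.dot(va, vb))
```

Both calls should agree up to floating-point rounding. For small vectors the difference in timing comes mostly from call overhead rather than from the arithmetic.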
[{'average': np.float64(2.5997840007221384e-06),
'context_size': 272,
'deviation': np.float64(2.8652879951316554e-07),
'max_exec': np.float64(3.2305600007020986e-06),
'min_exec': np.float64(2.397080002083385e-06),
'number': 50,
'repeat': 10,
'ttime': np.float64(2.5997840007221386e-05),
'warmup_time': 0.0001528119998965849,
'x_name': 10},
{'average': np.float64(2.3516419992120065e-06),
'context_size': 272,
'deviation': np.float64(3.001319868108143e-08),
'max_exec': np.float64(2.4202399981732015e-06),
'min_exec': np.float64(2.309159999640542e-06),
'number': 50,
'repeat': 10,
'ttime': np.float64(2.3516419992120065e-05),
'warmup_time': 9.661999911259045e-06,
'x_name': 110}]
Let’s display the results
df1 = DataFrame(res_pydot)
df1["fct"] = "pydot"
df2 = DataFrame(res_dot)
df2["fct"] = "numpy.dot"
df3 = DataFrame(res_ddot)
df3["fct"] = "ddot"
cc = concat([df1, df2, df3])
cc["N"] = cc["x_name"]
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
cc[cc.N <= 1100].pivot(index="N", columns="fct", values="average").plot(
    logy=True, logx=True, ax=ax[0]
)
cc[cc.fct != "pydot"].pivot(index="N", columns="fct", values="average").plot(
    logy=True, logx=True, ax=ax[1]
)
ax[0].set_title("Comparison of dot implementations")
ax[1].set_title("Comparison of dot implementations\nwithout python")

The results depend on the machine, its number of cores, and the compilation settings of numpy and of this module.
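To put measurements from different machines in context, it can help to record the environment next to the timings. The snippet below is a minimal sketch using only standard calls; what numpy.show_config() prints varies with the numpy version and build.

```python
import os

import numpy

n_cores = os.cpu_count()  # logical cores visible to Python; may be None
print("numpy version:", numpy.__version__)
print("logical cores:", n_cores)
# Prints build information, including the BLAS/LAPACK libraries numpy was built against.
numpy.show_config()
```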
Total running time of the script: 1 minute 21.381 seconds