Compares dot implementations (numpy, python, blas)
numpy has a very fast implementation of the dot product. It is hard to do better and very easy to do worse. This example looks at a couple of slower implementations.
Compared implementations: a pure Python dot product (pydot), numpy.dot, and a direct BLAS call (ddot).
python dot: pydot
The first function, pydot, uses plain Python to implement the dot product.
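The body of pydot is not reproduced on this page. As a minimal sketch, assuming it is a plain loop over both vectors, a pure Python dot product could look like the following. The length check and the accumulator name are illustrative choices, not taken from the original module.

```python
def pydot(va, vb):
    """Dot product of two equal-length sequences, written as a plain loop.

    Illustrative sketch only: the original pydot is not shown on this page.
    """
    if len(va) != len(vb):
        raise ValueError("both vectors must have the same length")
    total = 0.0
    # Every multiplication and addition goes through the interpreter,
    # which is why this version is much slower than numpy.dot.
    for a, b in zip(va, vb):
        total += a * b
    return total


print(pydot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

The cost grows linearly with the vector length, and each element pays the interpreter overhead, which is consistent with the timings reported for pydot.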
ctxs = [
    dict(
        va=numpy.random.randn(n).astype(numpy.float64),
        vb=numpy.random.randn(n).astype(numpy.float64),
        pydot=pydot,
        x_name=n,
    )
    for n in range(10, 1000, 100)
]
res_pydot = list(measure_time_dim("pydot(va, vb)", ctxs, verbose=1))
pprint.pprint(res_pydot[:2])
[{'average': np.float64(7.1148180140880884e-06),
'context_size': 232,
'deviation': np.float64(8.319901218604011e-07),
'max_exec': np.float64(9.287340071750805e-06),
'min_exec': np.float64(6.240200018510223e-06),
'number': 50,
'repeat': 10,
'ttime': np.float64(7.114818014088088e-05),
'warmup_time': 3.305499558337033e-05,
'x_name': 10},
{'average': np.float64(3.3648765980615274e-05),
'context_size': 232,
'deviation': np.float64(8.809493797155993e-06),
'max_exec': np.float64(4.549306002445519e-05),
'min_exec': np.float64(2.0034240005770698e-05),
'number': 50,
'repeat': 10,
'ttime': np.float64(0.00033648765980615275),
'warmup_time': 4.712300142273307e-05,
'x_name': 110}]
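Each dictionary above summarizes `repeat` runs of `number` executions of the statement. The sketch below shows one way such statistics could be computed with the standard `timeit` module. The helper name `time_statement` is hypothetical and is not the `measure_time_dim` used by this example. The use of the population standard deviation, and of `ttime` as the sum of the per-call averages (which matches the ratio between `ttime` and `average` in the output above), are assumptions.

```python
import statistics
import timeit


def time_statement(stmt, context, number=50, repeat=10):
    """Time `stmt` in `context` and summarize per-call durations.

    Hypothetical helper, not the measure_time_dim of the example.
    """
    # timeit.repeat returns one total duration per repetition,
    # each covering `number` executions of the statement.
    totals = timeit.repeat(stmt, globals=context, number=number, repeat=repeat)
    per_call = [t / number for t in totals]
    return {
        "average": statistics.mean(per_call),
        "deviation": statistics.pstdev(per_call),  # assumption: population std
        "min_exec": min(per_call),
        "max_exec": max(per_call),
        "ttime": sum(per_call),  # assumption: sum of per-call averages
        "number": number,
        "repeat": repeat,
    }


ctx = {"va": [1.0, 2.0, 3.0], "vb": [4.0, 5.0, 6.0]}
stats = time_statement("sum(a * b for a, b in zip(va, vb))", ctx)
print(sorted(stats))
```

Dividing each repetition by `number` gives a per-call time, so `average`, `min_exec` and `max_exec` are directly comparable across vector sizes.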
numpy dot
ctxs = [
    dict(
        va=numpy.random.randn(n).astype(numpy.float64),
        vb=numpy.random.randn(n).astype(numpy.float64),
        dot=numpy.dot,
        x_name=n,
    )
    for n in range(10, 50000, 100)
]
res_dot = list(measure_time_dim("dot(va, vb)", ctxs, verbose=1))
pprint.pprint(res_dot[:2])
[{'average': np.float64(1.3419879833236337e-06),
'context_size': 232,
'deviation': np.float64(4.8575769902713836e-08),
'max_exec': np.float64(1.3776400010101497e-06),
'min_exec': np.float64(1.2179798795841635e-06),
'number': 50,
'repeat': 10,
'ttime': np.float64(1.3419879833236337e-05),
'warmup_time': 0.00022104499657871202,
'x_name': 10},
{'average': np.float64(2.359903999604284e-06),
'context_size': 232,
'deviation': np.float64(1.363464003323548e-06),
'max_exec': np.float64(5.30657998751849e-06),
'min_exec': np.float64(1.1841800005640834e-06),
'number': 50,
'repeat': 10,
'ttime': np.float64(2.3599039996042843e-05),
'warmup_time': 1.2753000191878527e-05,
'x_name': 110}]
blas dot
The numpy implementation relies on BLAS. Let's call it directly.
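The code that wraps the BLAS routine is not reproduced on this page. As a stand-in, the sketch below calls the double-precision BLAS dot routine through SciPy's `scipy.linalg.blas.ddot` wrapper. Using SciPy here is an assumption made for illustration: it is not the ddot function that produced the measurements below, and its timings may differ.

```python
import numpy
from scipy.linalg.blas import ddot  # SciPy wrapper around BLAS ddot

va = numpy.array([1.0, 2.0, 3.0], dtype=numpy.float64)
vb = numpy.array([4.0, 5.0, 6.0], dtype=numpy.float64)

# Direct call to the BLAS routine on two contiguous float64 vectors.
blas_result = ddot(va, vb)

# numpy.dot on the same inputs, for comparison.
numpy_result = numpy.dot(va, vb)

print(blas_result, numpy_result)
```

Both calls should agree up to floating-point rounding. Any speed difference comes mainly from the overhead of each Python wrapper, since the underlying computation is the same kind of BLAS kernel.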
[{'average': np.float64(2.9745399951934814e-06),
'context_size': 360,
'deviation': np.float64(3.619303911832157e-07),
'max_exec': np.float64(3.878860006807372e-06),
'min_exec': np.float64(2.701079938560724e-06),
'number': 50,
'repeat': 10,
'ttime': np.float64(2.9745399951934815e-05),
'warmup_time': 5.827599670737982e-05,
'x_name': 10},
{'average': np.float64(3.612484011682682e-06),
'context_size': 360,
'deviation': np.float64(3.3887765024843066e-07),
'max_exec': np.float64(4.377120058052242e-06),
'min_exec': np.float64(3.176620084559545e-06),
'number': 50,
'repeat': 10,
'ttime': np.float64(3.612484011682682e-05),
'warmup_time': 1.2769000022672117e-05,
'x_name': 110}]
Let's display the results
from pandas import DataFrame, concat
import matplotlib.pyplot as plt

# Gather the three benchmarks into one frame, labelled by function.
df1 = DataFrame(res_pydot)
df1["fct"] = "pydot"
df2 = DataFrame(res_dot)
df2["fct"] = "numpy.dot"
df3 = DataFrame(res_ddot)
df3["fct"] = "ddot"
cc = concat([df1, df2, df3])
cc["N"] = cc["x_name"]

# Left: all implementations on small vectors. Right: compiled ones only.
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
cc[cc.N <= 1100].pivot(index="N", columns="fct", values="average").plot(
    logy=True, logx=True, ax=ax[0]
)
cc[cc.fct != "pydot"].pivot(index="N", columns="fct", values="average").plot(
    logy=True, logx=True, ax=ax[1]
)
ax[0].set_title("Comparison of dot implementations")
ax[1].set_title("Comparison of dot implementations\nwithout python")
The results depend on the machine, its number of cores, and the compilation settings of numpy or of this module.
Total running time of the script: (0 minutes 14.699 seconds)