Compares dot implementations (numpy, python, blas)

numpy has a very fast implementation of the dot product. It is hard to beat and very easy to do worse. This example measures a couple of alternative implementations against it.

Compared implementations:

import pprint
import numpy
import matplotlib.pyplot as plt
from pandas import DataFrame, concat
from teachcompute.validation.cython.dotpy import pydot
from teachcompute.validation.cython.dot_blas_lapack import cblas_ddot
from teachcompute.ext_test_case import measure_time_dim

python dot: pydot

The first function pydot uses python to implement the dot product.
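As a rough idea of what a pure Python dot product looks like, here is a minimal sketch. The name pydot_sketch is hypothetical and this is not the code of teachcompute's pydot; it only illustrates the kind of element-by-element loop that the interpreter has to execute.

```python
def pydot_sketch(va, vb):
    """Dot product of two equal-length sequences with a plain Python loop."""
    if len(va) != len(vb):
        raise ValueError("both vectors must have the same length")
    total = 0.0
    # One multiplication and one addition per element, all interpreted.
    for a, b in zip(va, vb):
        total += a * b
    return total


result = pydot_sketch([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Every iteration goes through the interpreter, which is why the cost grows quickly with the vector size.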

ctxs = [
    dict(
        va=numpy.random.randn(n).astype(numpy.float64),
        vb=numpy.random.randn(n).astype(numpy.float64),
        pydot=pydot,
        x_name=n,
    )
    for n in range(10, 1000, 100)
]

res_pydot = list(measure_time_dim("pydot(va, vb)", ctxs, verbose=1))

pprint.pprint(res_pydot[:2])
100%|██████████| 10/10 [00:00<00:00, 26.21it/s]
[{'average': np.float64(2.628452001317783e-06),
  'context_size': 184,
  'deviation': np.float64(1.30425555254186e-07),
  'max_exec': np.float64(3.0103200015219047e-06),
  'min_exec': np.float64(2.5332199993499673e-06),
  'number': 50,
  'repeat': 10,
  'ttime': np.float64(2.6284520013177827e-05),
  'warmup_time': 2.0225000071150134e-05,
  'x_name': 10},
 {'average': np.float64(1.8355095999595507e-05),
  'context_size': 184,
  'deviation': np.float64(4.6556812484716014e-07),
  'max_exec': np.float64(1.9151500000589296e-05),
  'min_exec': np.float64(1.771008000105212e-05),
  'number': 50,
  'repeat': 10,
  'ttime': np.float64(0.00018355095999595506),
  'warmup_time': 2.355199990233814e-05,
  'x_name': 110}]
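The fields printed above can be read as follows: each statement is run number times per measurement, the measurement is repeated repeat times, and the statistics are taken over the per-call times of each repetition. The sketch below reproduces that reading with the standard timeit module. It is an assumption inferred from the printed numbers (for instance, ttime equals repeat times average), not the actual code of measure_time_dim, and measure_sketch is a hypothetical name.

```python
import statistics
import timeit


def measure_sketch(stmt, context, number=50, repeat=10):
    """Times `stmt` in `context` and returns summary statistics per call."""
    # Copy the context because timeit adds entries to the namespace it uses.
    raw = timeit.repeat(stmt, globals=dict(context), number=number, repeat=repeat)
    # Each raw value covers `number` executions: convert to time per call.
    per_call = [t / number for t in raw]
    return {
        "average": statistics.fmean(per_call),
        "deviation": statistics.pstdev(per_call),
        "min_exec": min(per_call),
        "max_exec": max(per_call),
        "number": number,
        "repeat": repeat,
        "ttime": sum(per_call),
    }


# Hypothetical context: a small pure Python dot product on two lists.
context = {
    "va": [1.0] * 100,
    "vb": [2.0] * 100,
    "f": lambda a, b: sum(x * y for x, y in zip(a, b)),
}
res = measure_sketch("f(va, vb)", context)
```

Reporting min_exec next to average helps spot measurements disturbed by other activity on the machine.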

numpy dot

ctxs = [
    dict(
        va=numpy.random.randn(n).astype(numpy.float64),
        vb=numpy.random.randn(n).astype(numpy.float64),
        dot=numpy.dot,
        x_name=n,
    )
    for n in range(10, 50000, 100)
]

res_dot = list(measure_time_dim("dot(va, vb)", ctxs, verbose=1))

pprint.pprint(res_dot[:2])
100%|██████████| 500/500 [01:16<00:00,  6.57it/s]
[{'average': np.float64(5.089619994578243e-07),
  'context_size': 184,
  'deviation': np.float64(1.1037039081255569e-07),
  'max_exec': np.float64(8.386800027437858e-07),
  'min_exec': np.float64(4.653599989978829e-07),
  'number': 50,
  'repeat': 10,
  'ttime': np.float64(5.089619994578243e-06),
  'warmup_time': 1.5185999927780358e-05,
  'x_name': 10},
 {'average': np.float64(5.16787999913504e-07),
  'context_size': 184,
  'deviation': np.float64(7.752201159421148e-08),
  'max_exec': np.float64(7.030999995549791e-07),
  'min_exec': np.float64(4.777399999511545e-07),
  'number': 50,
  'repeat': 10,
  'ttime': np.float64(5.16787999913504e-06),
  'warmup_time': 3.971000069213915e-06,
  'x_name': 110}]

blas dot

numpy’s implementation relies on BLAS. Let’s call it directly through cblas_ddot.
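For readers without teachcompute installed, a direct BLAS call can also be sketched with scipy, which exposes the double precision dot routine as scipy.linalg.blas.ddot. This is a different wrapper from the cblas_ddot used in this example, swapped in here only for illustration, and its overhead may differ.

```python
import numpy
from scipy.linalg.blas import ddot

# Two small float64 vectors; ddot is the double precision BLAS routine.
va = numpy.arange(5, dtype=numpy.float64)
vb = numpy.ones(5, dtype=numpy.float64)

blas_result = ddot(va, vb)        # direct call to the BLAS routine
numpy_result = numpy.dot(va, vb)  # numpy's own entry point
```

Both calls should return the same value; the difference measured in this example comes from the cost of reaching the routine, not from the arithmetic.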

for ctx in ctxs:
    ctx["ddot"] = cblas_ddot

res_ddot = list(measure_time_dim("ddot(va, vb)", ctxs, verbose=1))

pprint.pprint(res_ddot[:2])
100%|██████████| 500/500 [00:03<00:00, 135.14it/s]
[{'average': np.float64(2.5997840007221384e-06),
  'context_size': 272,
  'deviation': np.float64(2.8652879951316554e-07),
  'max_exec': np.float64(3.2305600007020986e-06),
  'min_exec': np.float64(2.397080002083385e-06),
  'number': 50,
  'repeat': 10,
  'ttime': np.float64(2.5997840007221386e-05),
  'warmup_time': 0.0001528119998965849,
  'x_name': 10},
 {'average': np.float64(2.3516419992120065e-06),
  'context_size': 272,
  'deviation': np.float64(3.001319868108143e-08),
  'max_exec': np.float64(2.4202399981732015e-06),
  'min_exec': np.float64(2.309159999640542e-06),
  'number': 50,
  'repeat': 10,
  'ttime': np.float64(2.3516419992120065e-05),
  'warmup_time': 9.661999911259045e-06,
  'x_name': 110}]

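Let’s display the results. The left plot compares the three implementations for N ≤ 1100, the right one compares numpy.dot and ddot over the full range of N.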

df1 = DataFrame(res_pydot)
df1["fct"] = "pydot"
df2 = DataFrame(res_dot)
df2["fct"] = "numpy.dot"
df3 = DataFrame(res_ddot)
df3["fct"] = "ddot"

cc = concat([df1, df2, df3])
cc["N"] = cc["x_name"]

fig, ax = plt.subplots(1, 2, figsize=(10, 4))
cc[cc.N <= 1100].pivot(index="N", columns="fct", values="average").plot(
    logy=True, logx=True, ax=ax[0]
)
cc[cc.fct != "pydot"].pivot(index="N", columns="fct", values="average").plot(
    logy=True, logx=True, ax=ax[1]
)
ax[0].set_title("Comparison of dot implementations")
ax[1].set_title("Comparison of dot implementations\nwithout python")
Figure: two panels titled "Comparison of dot implementations" and "Comparison of dot implementations without python".
Text(0.5, 1.0, 'Comparison of dot implementations\nwithout python')

The results depend on the machine, its number of cores, and the compilation settings of numpy or of this module.

Total running time of the script: (1 minutes 21.381 seconds)
