Profiling example

Profiling and the pdf function. Profiling measures how much time a program spends in each function.

An oddity

This example was sent to me by a student over the summer to show that the pdf function is slower than a simple reimplementation that does the same thing.

[1]:
import time
from scipy.stats import norm
import numpy as np
[2]:
debut = time.time()
for i in range(10**3):
    norm(2, 3).pdf(4)
fin = time.time()
fin - debut
[2]:
0.9644453525543213
[3]:
def density(x, mean, sigma2):
    return np.exp(-((x - mean) ** 2) / (2 * sigma2)) / (2 * np.pi * sigma2) ** 0.5


debut = time.time()
for i in range(10**3):
    density(4, 2, 3**2)  # norm(2, 3) takes a standard deviation of 3, i.e. a variance of 9
fin = time.time()
fin - debut
[3]:
0.001481771469116211

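What is going on?

First, the pdf function, like most functions in numerical libraries, is optimized for computation on matrices or vectors, not on single numbers. The loop above also calls norm(2, 3) at every iteration, which builds a new frozen distribution object each time and probably accounts for a good part of the gap. The rest of this notebook uses a profiler to look closer.

Profiler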
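To separate the cost of building the distribution object from the cost of evaluating the density on a scalar, the three call styles can be timed side by side. This is a minimal sketch: the helper names `frozen_each_time`, `frozen_once` and `direct_call` are made up for the illustration, and the timings depend on the machine and the scipy version, so none are quoted here.

```python
import timeit

from scipy.stats import norm


def frozen_each_time():
    # Same call as in the loop above: a new frozen distribution per evaluation.
    return norm(2, 3).pdf(4)


frozen = norm(2, 3)  # built once, reused below


def frozen_once():
    # Reuses the frozen distribution created outside the function.
    return frozen.pdf(4)


def direct_call():
    # Passes loc and scale directly, without freezing the distribution.
    return norm.pdf(4, loc=2, scale=3)


for f in (frozen_each_time, frozen_once, direct_call):
    elapsed = timeit.timeit(f, number=200)
    print(f"{f.__name__}: {elapsed:.4f} s for 200 calls")
```

All three variants return the same value; only the amount of work done around the formula differs.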

[4]:
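import cProfile, io, pstats, os, sys


def doprofile(func, filename, *args):
    """Profile func(*args), dump the raw stats to filename, return the text report."""
    pr = cProfile.Profile()
    pr.enable()  # start profiling
    func(*args)  # call the profiled function
    pr.disable()  # stop profiling
    s = io.StringIO()
    ps = pstats.Stats(pr, stream=s).sort_stats("cumulative")
    ps.print_stats()
    # shorten the paths in the report: drop the folder three levels above
    # the working directory and the Python installation prefix
    rem = os.path.normpath(os.path.join(os.getcwd(), "..", "..", ".."))
    res = s.getvalue().replace(rem, "")
    res = res.replace(sys.base_prefix, "").replace("\\", "/")
    ps.dump_stats(filename)  # save the raw statistics for later inspection
    return res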
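The file written by dump_stats can be reloaded later with pstats, for example to sort the same measurements differently without running the code again. The sketch below uses only the standard library; the function `busy` and the file name `example_busy.prof` are placeholders chosen for the illustration.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy(n):
    # Placeholder workload: sum of squares computed in pure Python.
    return sum(i * i for i in range(n))


profile_path = os.path.join(tempfile.gettempdir(), "example_busy.prof")

pr = cProfile.Profile()
pr.enable()
busy(100_000)
pr.disable()
pr.dump_stats(profile_path)  # raw statistics saved to disk

# Reload the saved statistics and sort by time spent inside each function itself.
buffer = io.StringIO()
stats = pstats.Stats(profile_path, stream=buffer)
stats.strip_dirs().sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by "tottime" ranks functions by the time spent in their own body, whereas "cumulative", used in doprofile, also counts the time spent in the functions they call.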
[5]:
import numpy

x = numpy.ones((10000000, 1)) * 4
x.shape
[5]:
(10000000, 1)
[6]:
debut = time.time()
y = norm.pdf(x)
fin = time.time()
print(fin - debut, y.shape, y[0])
0.6027283668518066 (10000000, 1) [0.00013383]
[17]:
import os
import scipy

path = os.path.normpath(os.path.join(scipy.__file__, "..", "..", ".."))

r = doprofile(norm.pdf, "pdf.prof", x)
print(r.replace(path, ""))
         113 function calls in 0.450 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.178    0.178    0.450    0.450 /site-packages/scipy/stats/_distn_infrastructure.py:1958(pdf)
        1    0.000    0.000    0.142    0.142 /site-packages/scipy/stats/_continuous_distns.py:361(_pdf)
        1    0.142    0.142    0.142    0.142 /site-packages/scipy/stats/_continuous_distns.py:300(_norm_pdf)
        7    0.022    0.003    0.093    0.013 {built-in method numpy.core._multiarray_umath.implement_array_function}
        1    0.000    0.000    0.048    0.048 <__array_function__ internals>:177(place)
        1    0.000    0.000    0.048    0.048 /site-packages/numpy/lib/function_base.py:1912(place)
        1    0.048    0.048    0.048    0.048 {built-in method numpy.core._multiarray_umath._insert}
        1    0.028    0.028    0.028    0.028 /site-packages/scipy/stats/_distn_infrastructure.py:975(_support_mask)
        1    0.000    0.000    0.022    0.022 <__array_function__ internals>:177(putmask)
        2    0.000    0.000    0.022    0.011 /site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
        2    0.021    0.011    0.021    0.011 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.021    0.021 /site-packages/scipy/stats/_distn_infrastructure.py:559(argsreduce)
        1    0.000    0.000    0.011    0.011 <__array_function__ internals>:177(all)
        1    0.000    0.000    0.011    0.011 /site-packages/numpy/core/fromnumeric.py:2406(all)
        1    0.000    0.000    0.011    0.011 <__array_function__ internals>:177(any)
        1    0.000    0.000    0.011    0.011 /site-packages/numpy/core/fromnumeric.py:2307(any)
        1    0.000    0.000    0.010    0.010 /site-packages/scipy/stats/_distn_infrastructure.py:604(<listcomp>)
        2    0.010    0.005    0.010    0.005 {method 'ravel' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(broadcast_arrays)
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/stride_tricks.py:480(broadcast_arrays)
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/stride_tricks.py:546(<listcomp>)
        3    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/stride_tricks.py:340(_broadcast_to)
        2    0.000    0.000    0.000    0.000 /site-packages/numpy/core/_ufunc_config.py:32(seterr)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(atleast_1d)
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/core/_ufunc_config.py:434(__exit__)
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/core/_ufunc_config.py:429(__enter__)
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/core/shape_base.py:23(atleast_1d)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(shape)
        1    0.000    0.000    0.000    0.000 {built-in method numpy.zeros}
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/stride_tricks.py:416(_broadcast_shape)
        2    0.000    0.000    0.000    0.000 /site-packages/numpy/core/_ufunc_config.py:131(geterr)
        3    0.000    0.000    0.000    0.000 {built-in method builtins.any}
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/stride_tricks.py:538(<listcomp>)
        2    0.000    0.000    0.000    0.000 {built-in method numpy.seterrobj}
        3    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/function_base.py:346(iterable)
        4    0.000    0.000    0.000    0.000 {built-in method numpy.geterrobj}
        9    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/stride_tricks.py:345(<genexpr>)
        1    0.000    0.000    0.000    0.000 {built-in method numpy.asarray}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.all}
        6    0.000    0.000    0.000    0.000 {built-in method numpy.array}
        2    0.000    0.000    0.000    0.000 /site-packages/numpy/core/fromnumeric.py:70(<dictcomp>)
        1    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/core/fromnumeric.py:1965(shape)
        3    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/stride_tricks.py:542(<genexpr>)
        1    0.000    0.000    0.000    0.000 {built-in method numpy.promote_types}
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/core/_ufunc_config.py:425(__init__)
        1    0.000    0.000    0.000    0.000 /site-packages/scipy/stats/_distn_infrastructure.py:941(_argcheck)
        3    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/stride_tricks.py:25(_maybe_view_as_subclass)
        2    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/core/fromnumeric.py:2302(_any_dispatcher)
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/core/multiarray.py:1106(putmask)
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/core/fromnumeric.py:1961(_shape_dispatcher)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        3    0.000    0.000    0.000    0.000 {method '__exit__' of 'numpy.nditer' objects}
        3    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/function_base.py:1908(_place_dispatcher)
        2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 /site-packages/scipy/stats/_distn_infrastructure.py:953(_get_support)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {built-in method numpy.asanyarray}
        1    0.000    0.000    0.000    0.000 <string>:2(_parse_args)
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/core/shape_base.py:19(_atleast_1d_dispatcher)
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/core/fromnumeric.py:2401(_all_dispatcher)
        1    0.000    0.000    0.000    0.000 /site-packages/numpy/lib/stride_tricks.py:476(_broadcast_arrays_dispatcher)



[8]:
def density(x, mean, sigma2):
    return np.exp(-((x - mean) ** 2) / (2 * sigma2)) / (2 * np.pi * sigma2) ** 0.5


debut = time.time()
y = density(x, 0.0, 1.0)
fin = time.time()
print(fin - debut, y.shape, y[0])
0.1882781982421875 (10000000, 1) [0.00013383]
[9]:
r = doprofile(density, "density.prof", x, 0, 1)  # separate file, so pdf.prof is not overwritten
print(r)
         2 function calls in 0.177 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.177    0.177    0.177    0.177 /tmp/ipykernel_29119/200996087.py:1(density)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



The profile shows that norm.pdf spends only part of its time in the formula itself: _norm_pdf takes about 0.14 s out of 0.45 s. The rest goes to the generic wrapper around it, which reduces the arguments (argsreduce), builds a support mask (_support_mask) and copies the results into the output array (place). It also does other things, such as handling missing values. In conclusion, a function that handles many special cases (types, values) tends to be slower than a function written for a single case.
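The effect of such checks can be reproduced with numpy alone. The sketch below wraps the formula in a hypothetical `checked_density` that filters invalid inputs before computing, loosely imitating what a generic library function has to do. It is not scipy's implementation, only an illustration of where extra passes over the data come from.

```python
import timeit

import numpy as np


def density(x, mean, sigma2):
    # Plain formula, no input validation.
    return np.exp(-((x - mean) ** 2) / (2 * sigma2)) / (2 * np.pi * sigma2) ** 0.5


def checked_density(x, mean, sigma2):
    # Hypothetical wrapper: every check below is an extra pass over the data.
    x = np.asarray(x, dtype=float)
    valid = np.isfinite(x) & (sigma2 > 0)  # mask of usable inputs
    out = np.full(x.shape, np.nan)  # invalid inputs stay NaN
    out[valid] = density(x[valid], mean, sigma2)  # compute only where valid
    return out


x = np.ones((1_000_000, 1)) * 4

t_plain = timeit.timeit(lambda: density(x, 0.0, 1.0), number=5)
t_checked = timeit.timeit(lambda: checked_density(x, 0.0, 1.0), number=5)
print(f"plain:   {t_plain:.4f} s for 5 calls")
print(f"checked: {t_checked:.4f} s for 5 calls")
```

Both functions return the same values on valid inputs. The checked version pays for the mask, the output allocation and the masked copy, which is the price of being robust to inputs the plain version silently mishandles.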
