{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Speed measurements on dataframes\n", "\n", "This notebook shows how to read a [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) from an iterator when its size is not known in advance, or how to read an [array](https://numpy.org/doc/stable/reference/generated/numpy.array.html) from an iterator." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a dataframe from an iterator\n", "\n", "The goal is to create a dataframe from a set of rows whose count is unknown when the dataframe is created, because the rows arrive as an iterator or a generator." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(0.4584214264768637,\n", " 0.0957370472492135,\n", " 0.825720254865909,\n", " 0.056222146826998554,\n", " 0.012568801665460705,\n", " 0.20797581971445256,\n", " 0.6508447830614892,\n", " 0.817974554103244,\n", " 0.04182207570159391,\n", " 0.591375261282058),\n", " (0.5818213564160107,\n", " 0.3384435930913253,\n", " 0.5900215149482624,\n", " 0.9556893663618211,\n", " 0.9156247392985197,\n", " 0.20153581804870713,\n", " 0.893987513368823,\n", " 0.11112779556835362,\n", " 0.043959856261986174,\n", " 0.233344273733338)]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import random\n", "\n", "\n", "def enumerate_row(nb=10000, n=10):\n", "    for i in range(nb):\n", "        # yield a tuple: the data is copied more often\n", "        # because the type is immutable\n", "        yield tuple(random.random() for k in range(n))\n", "        # alternative: yield a list; these lists are generally\n", "        # not copied, only the list holding them is\n", "        # yield list(random.random() for k in range(n))\n", "\n", "\n", "list(enumerate_row(2))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": 
{ "text/html": [ "|   | c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 |\n",
"|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 0.155969 | 0.431193 | 0.995451 | 0.081467 | 0.257834 | 0.457617 | 0.773857 | 0.843436 | 0.842255 | 0.570137 |\n",
"| 1 | 0.876386 | 0.702447 | 0.130592 | 0.084160 | 0.782795 | 0.065442 | 0.682476 | 0.077565 | 0.444916 | 0.025166 |\n",
"| 2 | 0.854808 | 0.873240 | 0.055319 | 0.518709 | 0.486142 | 0.034237 | 0.979128 | 0.997898 | 0.472220 | 0.512437 |\n",
"| 3 | 0.476952 | 0.250016 | 0.964843 | 0.579930 | 0.693238 | 0.103160 | 0.249000 | 0.850935 | 0.632083 | 0.738248 |\n",
"| 4 | 0.773502 | 0.237446 | 0.974755 | 0.564504 | 0.684763 | 0.361164 | 0.152243 | 0.320242 | 0.218529 | 0.411604 |\n", "