{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Nearest neighbors in high dimension\n", "\n", "The [nearest neighbors](https://fr.wikipedia.org/wiki/Recherche_des_plus_proches_voisins) method is a fairly simple algorithm. What happens when the dimension of the feature space increases? How can this be remedied? Profilers such as [memory_profiler](https://pypi.python.org/pypi/memory_profiler) or [cProfile](https://docs.python.org/3.7/library/profile.html#module-cProfile) are useful tools for finding out where the time is being lost." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q1: k-NN: measuring performance" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 3.62557523, -3.92972784, -2.19327029, -3.01669145, -3.66440003,\n", " 0.05373302, -0.09569564, -1.62733 , -3.05437465, 3.43404744],\n", " [-1.88137987, 1.1603541 , 1.97569429, 2.28962313, 1.06727548,\n", " 0.81364917, -2.15972723, -1.99923386, 0.25393473, 2.67807834],\n", " [ 1.74986482, -2.68848993, 0.83230911, -0.15836161, 0.71428315,\n", " -2.53155132, -0.49799497, -1.53866452, -2.55477724, 2.79401366]])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.datasets import make_classification\n", "\n", "# Synthetic dataset: 10,000 samples, 10 features (8 informative), 3 classes\n", "datax, datay = make_classification(\n", " 10000, n_features=10, n_classes=3, n_clusters_per_class=2, n_informative=8\n", ")\n", "datax[:3]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
KNeighborsClassifier(algorithm='brute')