.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_constraint_kmeans.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_constraint_kmeans.py:


=================
Constraint KMeans
=================

A simple example showing how to cluster data while keeping
approximately the same number of points in every cluster.

Data
====

The dataset contains four blobs of unequal size: two blobs of 50 points
whose centers are drawn in ``(-10, 0)`` and two blobs of 25 points whose
centers are drawn in ``(0, 10)``, 150 points in total.

.. GENERATED FROM PYTHON SOURCE LINES 13-47

.. code-block:: Python


    from collections import Counter

    import matplotlib.pyplot as plt
    import numpy
    from mlinsights.mlmodel import ConstraintKMeans
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # two blobs of 50 points each in the lower-left area
    n_samples = 100
    data = make_blobs(
        n_samples=n_samples,
        n_features=2,
        centers=2,
        cluster_std=1.0,
        center_box=(-10.0, 0.0),
        shuffle=True,
        random_state=2,
    )
    X1 = data[0]
    # two blobs of 25 points each in the upper-right area
    data = make_blobs(
        n_samples=n_samples // 2,
        n_features=2,
        centers=2,
        cluster_std=1.0,
        center_box=(0.0, 10.0),
        shuffle=True,
        random_state=2,
    )
    X2 = data[0]

    X = numpy.vstack([X1, X2])
    X.shape


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (150, 2)


.. GENERATED FROM PYTHON SOURCE LINES 48-49

Plot the data.

.. GENERATED FROM PYTHON SOURCE LINES 49-54

.. code-block:: Python


    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    ax.plot(X[:, 0], X[:, 1], ".")
    ax.set_title("4 clusters")


.. image-sg:: /auto_examples/images/sphx_glr_plot_constraint_kmeans_001.png
   :alt: 4 clusters
   :srcset: /auto_examples/images/sphx_glr_plot_constraint_kmeans_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Text(0.5, 1.0, '4 clusters')


.. GENERATED FROM PYTHON SOURCE LINES 55-57

Standard KMeans
===============

.. GENERATED FROM PYTHON SOURCE LINES 57-73

.. code-block:: Python


    km = KMeans(n_clusters=4)
    km.fit(X)
    cl = km.predict(X)
    hist = Counter(cl)  # number of points in each cluster

    colors = "brgy"
    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    for i in range(max(cl) + 1):
        # points of cluster i, then its center drawn as a "+"
        ax.plot(X[cl == i, 0], X[cl == i, 1], colors[i] + ".", label="cl%d" % i)
        x = [km.cluster_centers_[i, 0], km.cluster_centers_[i, 0]]
        y = [km.cluster_centers_[i, 1], km.cluster_centers_[i, 1]]
        ax.plot(x, y, colors[i] + "+")
    ax.set_title(f"KMeans 4 clusters\n{hist!r}")
    ax.legend()


.. image-sg:: /auto_examples/images/sphx_glr_plot_constraint_kmeans_002.png
   :alt: KMeans 4 clusters Counter({np.int32(1): 50, np.int32(2): 48, np.int32(3): 35, np.int32(0): 17})
   :srcset: /auto_examples/images/sphx_glr_plot_constraint_kmeans_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 74-76

Constraint KMeans
=================

Standard KMeans does not control the size of the clusters: in the run
shown above, they hold between 17 and 50 points. ``ConstraintKMeans``
adds that constraint. Two strategies are compared first, ``gain`` and
``distance``; ``balanced_predictions=True`` applies the constraint in
``predict`` as well.

.. GENERATED FROM PYTHON SOURCE LINES 76-83

.. code-block:: Python


    km1 = ConstraintKMeans(n_clusters=4, strategy="gain", balanced_predictions=True)
    km1.fit(X)

    km2 = ConstraintKMeans(n_clusters=4, strategy="distance", balanced_predictions=True)
    km2.fit(X)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ConstraintKMeans(balanced_predictions=True, n_clusters=4, strategy='distance')


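The balancing constraint itself can be illustrated without mlinsights. The sketch below is a minimal NumPy illustration built on an assumed greedy rule, which is not claimed to match the algorithm behind the ``gain``, ``distance`` or ``weights`` strategies of ``ConstraintKMeans``: each cluster may hold at most ``ceil(n / k)`` points, every point goes to the closest center that still has room, and the centers are then recomputed as in standard KMeans. ``balanced_assign`` and ``balanced_kmeans`` are hypothetical helpers introduced only for this illustration, and the toy data only loosely mimics ``X``.

.. code-block:: Python

    import numpy as np


    def balanced_assign(X, centers):
        # Hypothetical helper, not part of mlinsights: give each point to the
        # closest center that still has room, with at most ceil(n / k) points
        # per cluster.
        n, k = X.shape[0], centers.shape[0]
        capacity = -(-n // k)  # ceil(n / k) with integer arithmetic
        # squared Euclidean distances between every point and every center, shape (n, k)
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.full(n, -1, dtype=np.int64)
        counts = np.zeros(k, dtype=np.int64)
        # visit (point, center) pairs from the closest to the farthest
        for flat in np.argsort(dist, axis=None):
            i, c = divmod(int(flat), k)
            if labels[i] == -1 and counts[c] < capacity:
                labels[i] = c
                counts[c] += 1
        return labels


    def balanced_kmeans(X, k, n_iter=20, seed=0):
        # Hypothetical helper: Lloyd-style loop alternating the balanced
        # assignment above with the usual update of each center to the mean
        # of its points.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(X.shape[0], size=k, replace=False)]
        for _ in range(n_iter):
            labels = balanced_assign(X, centers)
            for c in range(k):
                if np.any(labels == c):  # keep the old center if a cluster is empty
                    centers[c] = X[labels == c].mean(axis=0)
        return labels, centers


    # Four blobs of unequal sizes (50, 50, 25, 25), loosely mimicking X above.
    rng = np.random.default_rng(2)
    means = [(-8.0, -2.0), (-3.0, -6.0), (3.0, 8.0), (7.0, 2.0)]
    sizes = [50, 50, 25, 25]
    X_demo = np.vstack([rng.normal(m, 1.0, size=(s, 2)) for m, s in zip(means, sizes)])

    labels, centers = balanced_kmeans(X_demo, k=4)
    # capacity is ceil(150 / 4) = 38, so every cluster gets between 36 and 38 points
    print(np.bincount(labels, minlength=4))

Visiting the (point, center) pairs globally by increasing distance lets the points closest to a center claim it first; a point only falls back to a farther center once its closer ones are full. Since the total capacity is at least ``n``, every point ends up assigned.
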

.. GENERATED FROM PYTHON SOURCE LINES 84-86

With the ``gain`` and ``distance`` strategies, the algorithm tries to
exchange points between clusters until they are balanced.

.. GENERATED FROM PYTHON SOURCE LINES 86-90

.. code-block:: Python


    cl1 = km1.predict(X)
    hist1 = Counter(cl1)


.. GENERATED FROM PYTHON SOURCE LINES 92-96

.. code-block:: Python


    cl2 = km2.predict(X)
    hist2 = Counter(cl2)


.. GENERATED FROM PYTHON SOURCE LINES 98-115

.. code-block:: Python


    # left: strategy="gain", right: strategy="distance"
    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    for i in range(max(cl1) + 1):
        ax[0].plot(X[cl1 == i, 0], X[cl1 == i, 1], colors[i] + ".", label="cl%d" % i)
        ax[1].plot(X[cl2 == i, 0], X[cl2 == i, 1], colors[i] + ".", label="cl%d" % i)
        x = [km1.cluster_centers_[i, 0], km1.cluster_centers_[i, 0]]
        y = [km1.cluster_centers_[i, 1], km1.cluster_centers_[i, 1]]
        ax[0].plot(x, y, colors[i] + "+")
        x = [km2.cluster_centers_[i, 0], km2.cluster_centers_[i, 0]]
        y = [km2.cluster_centers_[i, 1], km2.cluster_centers_[i, 1]]
        ax[1].plot(x, y, colors[i] + "+")
    ax[0].set_title(f"ConstraintKMeans 4 clusters (gains)\n{hist1!r}")
    ax[0].legend()
    ax[1].set_title(f"ConstraintKMeans 4 clusters (distances)\n{hist2!r}")
    ax[1].legend()


.. image-sg:: /auto_examples/images/sphx_glr_plot_constraint_kmeans_003.png
   :alt: ConstraintKMeans 4 clusters (gains) Counter({np.int32(0): 39, np.int32(2): 37, np.int32(3): 37, np.int32(1): 37}), ConstraintKMeans 4 clusters (distances) Counter({np.int32(2): 38, np.int32(0): 38, np.int32(3): 37, np.int32(1): 37})
   :srcset: /auto_examples/images/sphx_glr_plot_constraint_kmeans_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 116-118

The ``weights`` strategy takes another approach: it tries to extend
the area of attraction of each cluster.

.. GENERATED FROM PYTHON SOURCE LINES 118-125

.. code-block:: Python


    km = ConstraintKMeans(n_clusters=4, strategy="weights", max_iter=1000, history=True)
    km.fit(X)

    cl = km.predict(X)
    hist = Counter(cl)


.. GENERATED FROM PYTHON SOURCE LINES 126-127

The right-hand plot shows how the centers moved during training
(``history=True`` keeps them in ``cluster_centers_iter_``; larger markers
correspond to later iterations), together with the Delaunay edges between
the final centers returned by ``cluster_edges()``.

.. GENERATED FROM PYTHON SOURCE LINES 127-153

.. code-block:: Python


    def plot_delaunay(ax, edges, points):
        # draw every edge (a, b) between two centers as a dashed grey line
        for a, b in edges:
            ax.plot(points[[a, b], 0], points[[a, b], 1], "--", color="#555555")


    edges = km.cluster_edges()
    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    for i in range(max(cl) + 1):
        ax[0].plot(X[cl == i, 0], X[cl == i, 1], colors[i] + ".", label="cl%d" % i)
        x = [km.cluster_centers_[i, 0], km.cluster_centers_[i, 0]]
        y = [km.cluster_centers_[i, 1], km.cluster_centers_[i, 1]]
        ax[0].plot(x, y, colors[i] + "+")
    ax[0].set_title(f"ConstraintKMeans 4 clusters\nstrategy='weights'\n{hist!r}")
    ax[0].legend()

    cls = km.cluster_centers_iter_
    ax[1].plot(X[:, 0], X[:, 1], ".", label="X", color="#AAAAAA", ms=3)
    # marker size grows with the iteration index (same for every cluster)
    ms = numpy.arange(cls.shape[-1]).astype(numpy.float64) / cls.shape[-1] * 50 + 1
    for i in range(max(cl) + 1):
        ax[1].scatter(cls[i, 0, :], cls[i, 1, :], color=colors[i], s=ms, label="cl%d" % i)
    plot_delaunay(ax[1], edges, km.cluster_centers_)
    ax[1].set_title("Centers movement")


.. image-sg:: /auto_examples/images/sphx_glr_plot_constraint_kmeans_004.png
   :alt: ConstraintKMeans 4 clusters strategy='weights' Counter({np.int32(0): 49, np.int32(3): 49, np.int32(2): 48, np.int32(1): 4}), Centers movement
   :srcset: /auto_examples/images/sphx_glr_plot_constraint_kmeans_004.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Text(0.5, 1.0, 'Centers movement')


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.568 seconds)


.. _sphx_glr_download_auto_examples_plot_constraint_kmeans.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_constraint_kmeans.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_constraint_kmeans.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_constraint_kmeans.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_