KMeans with norm L1

This example shows how the result of a k-means clustering changes when the Euclidean (L2) distance is replaced by the L1 (Manhattan) norm.
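A standard way to state the two variants is given below as background. That `KMeansL1L2` with `norm="L1"` minimizes exactly $J_1$ with a median update is an assumption, not checked against its source. Euclidean k-means minimizes $J_2$ and the L1 version minimizes $J_1$. For a fixed assignment, the best center is the mean in the first case and the coordinate-wise median in the second, because $\lVert x - c \rVert_1 = \sum_d |x_d - c_d|$ splits into one absolute-deviation problem per coordinate:

```latex
% Euclidean k-means objective and its L1 counterpart
J_2(c_1,\dots,c_k) = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert_2^2,
\qquad
J_1(c_1,\dots,c_k) = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert_1

% optimal center of cluster S_j once the assignment is fixed (d indexes coordinates)
c_j^{(2)} = \frac{1}{|S_j|} \sum_{i \in S_j} x_i,
\qquad
c_{j,d}^{(1)} = \operatorname{median}\{\, x_{i,d} : i \in S_j \,\}
```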

import matplotlib.pyplot as plt
import numpy
import numpy.random as rnd
from sklearn.cluster import KMeans
from mlinsights.mlmodel import KMeansL1L2

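Simple dataset

Two squares of N = 1000 uniformly drawn points each. The second square is moved up by 1, and the first N // 10 = 100 points of the lower square are moved 2 units to the left.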

N = 1000
X = numpy.zeros((N * 2, 2), dtype=numpy.float64)
X[:N] = rnd.rand(N, 2)
X[N:] = rnd.rand(N, 2)
# X[N:, 0] += 0.75
X[N:, 1] += 1
X[: N // 10, 0] -= 2
X.shape
(2000, 2)
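The same construction can be wrapped in a small function so that the second dataset only differs by one argument. `make_dataset` is a hypothetical helper. It uses NumPy's seeded `default_rng` instead of the global `numpy.random.rand` for reproducibility, so the points differ from the ones drawn above:

```python
import numpy


def make_dataset(n=1000, shift=2.0, seed=0):
    """Hypothetical helper reproducing the dataset built above.

    Two unit squares of n points each, the second one moved up by 1;
    the first n // 10 points of the lower square are moved left by `shift`.
    """
    rng = numpy.random.default_rng(seed)
    X = rng.random((n * 2, 2))   # uniform points in [0, 1) x [0, 1)
    X[n:, 1] += 1                # second square sits on top of the first one
    X[: n // 10, 0] -= shift     # a small group of outliers on the left
    return X


X = make_dataset()
print(X.shape)  # (2000, 2)
```

With `shift=4.0` the same call builds the variant used in the last part of the example.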
fig, ax = plt.subplots(1, 1)
ax.plot(X[:, 0], X[:, 1], ".")
ax.set_title("Two squares")
[Figure: Two squares]

Classic KMeans

It uses the Euclidean distance.
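A bare-bones version of Euclidean k-means (Lloyd's algorithm) in NumPy makes the two alternating steps explicit: assign every point to its nearest center, then move each center to the mean of its points. This is an illustrative sketch with a plain random initialization. It is not scikit-learn's implementation, which among other things uses k-means++ seeding by default. `kmeans_l2` is a hypothetical name:

```python
import numpy


def kmeans_l2(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with the Euclidean distance (illustrative sketch)."""
    rng = numpy.random.default_rng(seed)
    # initialization: k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assignment step: squared Euclidean distance to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each center moves to the mean of its points
        # (an empty cluster keeps its previous center)
        new_centers = numpy.array([
            X[labels == j].mean(axis=0) if numpy.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = numpy.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # final assignment consistent with the returned centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)
```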

km = KMeans(2)
km.fit(X)

km.cluster_centers_
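At convergence, `cluster_centers_` of the Euclidean model holds the average of the points in each cluster, because the mean minimizes a sum of squared distances. The median plays the same role for a sum of absolute distances. A brute-force check on one coordinate, using toy values that are not taken from the dataset above:

```python
import numpy

values = numpy.array([0.1, 0.2, 0.4, 0.9, -2.0])  # one outlier at -2
grid = numpy.linspace(-2.5, 1.5, 4001)            # candidate centers, step 0.001

# cost of every candidate center under both norms
l2_cost = ((values[None, :] - grid[:, None]) ** 2).sum(axis=1)
l1_cost = numpy.abs(values[None, :] - grid[:, None]).sum(axis=1)

best_l2 = grid[l2_cost.argmin()]
best_l1 = grid[l1_cost.argmin()]
print(best_l2, values.mean())         # both close to -0.08
print(best_l1, numpy.median(values))  # both close to 0.2
```

The single outlier drags the L2 optimum below zero, while the L1 optimum stays on the middle value.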


def plot_clusters(km_, X, ax):
    """Plot the points of each predicted cluster in its own color and overlay the centers."""
    lab = km_.predict(X)
    for i in range(km_.cluster_centers_.shape[0]):
        sub = X[lab == i]
        ax.plot(sub[:, 0], sub[:, 1], ".", label="c=%d" % i)
    C = km_.cluster_centers_
    ax.plot(C[:, 0], C[:, 1], "o", ms=15, label="centers")
    ax.legend()


fig, ax = plt.subplots(1, 1)
plot_clusters(km, X, ax)
ax.set_title("L2 KMeans")
[Figure: L2 KMeans]

KMeans with L1 norm

kml1 = KMeansL1L2(2, norm="L1")
kml1.fit(X)
KMeansL1L2(n_clusters=2, norm='L1')


kml1.cluster_centers_
array([[0.45952102, 0.44113399],
       [0.47613888, 1.42690376]])
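The following is a minimal k-medians sketch of what an L1 k-means does. It assumes, without checking against the mlinsights source, that `KMeansL1L2(norm="L1")` assigns points with the Manhattan distance and moves each center to the coordinate-wise median of its points. `assign_l1` and `kmeans_l1` are hypothetical names:

```python
import numpy


def assign_l1(X, centers):
    # Manhattan (L1) distance from every point to every center
    d1 = numpy.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
    return d1.argmin(axis=1)


def kmeans_l1(X, k, n_iter=100, seed=0):
    """k-medians: L1 assignment, coordinate-wise median update (illustrative)."""
    rng = numpy.random.default_rng(seed)
    # initialization: k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign_l1(X, centers)
        # update step: coordinate-wise median of each cluster
        # (an empty cluster keeps its previous center)
        new_centers = numpy.array([
            numpy.median(X[labels == j], axis=0) if numpy.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = numpy.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # final assignment consistent with the returned centers
    return centers, assign_l1(X, centers)
```

Taking the median instead of the average is what makes such centers less sensitive to a small group of far-away points, like the 100 points moved to the left in this dataset.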
fig, ax = plt.subplots(1, 1)
plot_clusters(kml1, X, ax)
ax.set_title("L1 KMeans")
[Figure: L1 KMeans]
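The outliers account for part of the difference between the two models. For the lower square alone, moving 10% of its points to the left drags the mean x coordinate down, while the median barely moves and does not depend on how far the outliers go. The following is a quick check on points generated like the dataset above. The seeded generator is an assumption added for reproducibility:

```python
import numpy

rng = numpy.random.default_rng(0)
N = 1000
base = rng.random(N)  # x coordinates of the lower square

results = {}
for shift in (2, 4):
    x = base.copy()
    x[: N // 10] -= shift  # same outlier construction as in the example
    results[shift] = (x.mean(), numpy.median(x))
    # the median column is identical for both shifts, the mean drops by 0.2
    print(shift, round(results[shift][0], 3), round(results[shift][1], 3))
```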

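When clusters are completely different

The dataset is generated again, except that the 100 outlying points are moved 4 units to the left instead of 2, so they end up far away from both squares.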

N = 1000
X = numpy.zeros((N * 2, 2), dtype=numpy.float64)
X[:N] = rnd.rand(N, 2)
X[N:] = rnd.rand(N, 2)
# X[N:, 0] += 0.75
X[N:, 1] += 1
X[: N // 10, 0] -= 4
X.shape
(2000, 2)
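Only the shift of the outlying points changes, from 2 to 4. The following back-of-the-envelope computation uses the first coordinate of the lower square alone and gives population values under the uniform sampling used here, not measurements. It suggests why the L2 version is more affected: the expected mean moves with the shift $s$, while the median does not.

```latex
% lower square: 90% of x uniform on [0, 1), 10% uniform on [-s, 1 - s)
\mathbb{E}[x] = 0.9 \cdot \tfrac{1}{2} + 0.1 \left(\tfrac{1}{2} - s\right) = 0.5 - 0.1\,s
\quad\Rightarrow\quad s = 2: 0.3, \qquad s = 4: 0.1

% for s >= 1 every moved point is negative, so for 0 <= m < 1:
P(x \le m) = 0.1 + 0.9\,m = 0.5
\quad\Rightarrow\quad m = \tfrac{4}{9} \approx 0.444 \text{ for any } s \ge 1
```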
km = KMeans(2)
km.fit(X)
KMeans(n_clusters=2)


kml1 = KMeansL1L2(2, norm="L1")
kml1.fit(X)
KMeansL1L2(n_clusters=2, norm='L1')


fig, ax = plt.subplots(1, 2, figsize=(10, 4))
plot_clusters(km, X, ax[0])
plot_clusters(kml1, X, ax[1])
ax[0].set_title("L2 KMeans")
ax[1].set_title("L1 KMeans")
[Figure: L2 KMeans, L1 KMeans]

Total running time of the script: (0 minutes 0.532 seconds)

Gallery generated by Sphinx-Gallery