pylipid.func.cluster_KMeans

pylipid.func.cluster_KMeans(data, n_clusters)

Cluster data using KMeans.

This function clusters the samples using the KMeans implementation provided by scikit-learn. KMeans separates the samples into n_clusters clusters of equal variance by minimizing the inertia, which is defined as:

\[\sum_{i=1}^{N} \min _{u_{j} \in C}\left(\left\|x_{i}-u_{j}\right\|^{2}\right)\]

where \(x_{i}\) is the i-th of the N samples, \(C\) is the set of cluster centroids and \(u_{j}\) is the centroid of cluster j. KMeans scales well to large datasets but performs poorly on clusters of varying sizes and densities.
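The inertia can be checked numerically. The sketch below assumes NumPy and scikit-learn are installed; the toy data, the `random_state` value and the helper name `inertia` are illustrative choices, not part of pylipid. It sums, over all samples, the squared distance to the nearest centroid and compares the result with the `inertia_` attribute that scikit-learn's `KMeans` reports after fitting.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia(data, centroids):
    """Illustrative helper: sum over samples of the squared distance to the nearest centroid."""
    # Pairwise squared distances, shape (n_samples, n_clusters).
    sq_dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Keep only the closest centroid for each sample, then sum over samples.
    return sq_dists.min(axis=1).sum()


rng = np.random.default_rng(0)
# Two well-separated 2-D blobs (toy data chosen for illustration).
data = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
manual = inertia(data, model.cluster_centers_)
print(manual, model.inertia_)  # the two values should agree up to floating-point error
```

Because each sample is assigned to its closest centroid, the minimum in the formula is exactly the squared distance to the centroid of the sample's own cluster.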

Parameters
  • data (numpy.ndarray, shape=(n_samples, n_dims)) – Sample data to cluster.

  • n_clusters (int) – The number of clusters to form as well as the number of centroids to generate.

Returns

labels – Cluster labels for each data point.

Return type

array_like, shape=(n_samples,)
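A minimal usage sketch follows. It does not call pylipid itself: the hypothetical `cluster_kmeans_sketch` below assumes that `cluster_KMeans` behaves like a thin wrapper that fits scikit-learn's `KMeans` with `n_clusters` and returns one integer label per row of `data`, consistent with the parameters and return value documented above. The `n_init` and `random_state` values and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_kmeans_sketch(data, n_clusters):
    """Hypothetical stand-in for pylipid.func.cluster_KMeans (assumed behaviour)."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    # fit_predict fits the model and returns the cluster index of each sample.
    return model.fit_predict(data)


rng = np.random.default_rng(1)
# Three compact 2-D blobs of 20 samples each, centred at (0, 0), (6, 0) and (0, 6).
centres = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in centres])

labels = cluster_kmeans_sketch(data, n_clusters=3)
print(labels.shape)       # (60,): one label per sample
print(np.unique(labels))  # cluster indices run from 0 to n_clusters - 1
```

The label values themselves are arbitrary identifiers: which blob receives label 0 depends on the initialization, so only the grouping of samples is meaningful.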