pylipid.func.cluster_DBSCAN

pylipid.func.cluster_DBSCAN(data, eps=None, min_samples=None, metric='euclidean')[source]

Cluster data using DBSCAN.

This function clusters the samples using a density-based cluster DBSCAN provided by scikit. DBSCAN finds clusters of core samples of high density. A sample point is a core sample if at least min_samples points are within distance \(\varepsilon\) of it. A cluster is defined as a set of sample points that are mutually density-connected and density-reachable, i.e. there is a path \(\left\langle p_{1}, p_{2}, \ldots, p_{n}\right\rangle\) where each \(p_{i+1}\) is within distance \(\varepsilon\) of \(p_{i}\) for any two p in the two. The values of min_samples and \(\varepsilon\) determine the performance of this cluster.

If None, min_samples takes the value of 2 * n_dims. If \(\varepsilon\) is None, it is set as the value at the knee of the k-distance plot.

Parameters
  • data (numpy.ndarray, shape=(n_samples, n_dims)) – Sample data to find clusters.

  • eps (None or scalar, default=None) – The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function. If None, it is set as the value at the knee of the k-distance plot.

  • min_samples (None or scalar, default=None) – The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself. If None, it takes the value of 2 * n_dims

  • metric (string or callable, default=’euclidean’) – The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances for its metric parameter.

Returns

  • labels (array_like, shape=(n_samples,)) – Cluster labels for each data point.

  • core_sample_indices (array_like, shape=(n_clusters,)) – Indices of core samples.