spectral_clustering#

sklearn.cluster.spectral_clustering(affinity,*,n_clusters=8,n_components=None,eigen_solver=None,random_state=None,n_init=10,eigen_tol='auto',assign_labels='kmeans',verbose=False)[source]#

Apply clustering to a projection of the normalized Laplacian.

In practice Spectral Clustering is very useful when the structure ofthe individual clusters is highly non-convex or more generally whena measure of the center and spread of the cluster is not a suitabledescription of the complete cluster. For instance, when clusters arenested circles on the 2D plane.

If affinity is the adjacency matrix of a graph, this method can beused to find normalized graph cuts[1],[2].

Read more in theUser Guide.

Parameters:
affinity{array-like, sparse matrix} of shape (n_samples, n_samples)

The affinity matrix describing the relationship of the samples toembed.Must be symmetric.

Possible examples:
  • adjacency matrix of a graph,

  • heat kernel of the pairwise distance matrix of the samples,

  • symmetric k-nearest neighbours connectivity matrix of the samples.

n_clustersint, default=None

Number of clusters to extract.

n_componentsint, default=n_clusters

Number of eigenvectors to use for the spectral embedding.

eigen_solver{None, ‘arpack’, ‘lobpcg’, or ‘amg’}

The eigenvalue decomposition method. If None then'arpack' is used.See[4] for more details regarding'lobpcg'.Eigensolver'amg' runs'lobpcg' with optionalAlgebraic MultiGrid preconditioning and requires pyamg to be installed.It can be faster on very large sparse problems[6] and[7].

random_stateint, RandomState instance, default=None

A pseudo random number generator used for the initializationof the lobpcg eigenvectors decomposition wheneigen_solver=='amg', and for the K-Means initialization. Use an int to makethe results deterministic across calls (SeeGlossary).

Note

When usingeigen_solver=='amg',it is necessary to also fix the global numpy seed withnp.random.seed(int) to get deterministic results. Seepyamg/pyamg#139 for furtherinformation.

n_initint, default=10

Number of time the k-means algorithm will be run with differentcentroid seeds. The final results will be the best output of n_initconsecutive runs in terms of inertia. Only used ifassign_labels='kmeans'.

eigen_tolfloat, default=”auto”

Stopping criterion for eigendecomposition of the Laplacian matrix.Ifeigen_tol="auto" then the passed tolerance will depend on theeigen_solver:

  • Ifeigen_solver="arpack", theneigen_tol=0.0;

  • Ifeigen_solver="lobpcg" oreigen_solver="amg", theneigen_tol=None which configures the underlyinglobpcg solver toautomatically resolve the value according to their heuristics. See,scipy.sparse.linalg.lobpcg for details.

Note that when usingeigen_solver="lobpcg" oreigen_solver="amg"values oftol<1e-5 may lead to convergence issues and should beavoided.

Added in version 1.2:Added ‘auto’ option.

assign_labels{‘kmeans’, ‘discretize’, ‘cluster_qr’}, default=’kmeans’

The strategy to use to assign labels in the embeddingspace. There are three ways to assign labels after the Laplacianembedding. k-means can be applied and is a popular choice. But it canalso be sensitive to initialization. Discretization is anotherapproach which is less sensitive to random initialization[3].The cluster_qr method[5] directly extracts clusters from eigenvectorsin spectral clustering. In contrast to k-means and discretization, cluster_qrhas no tuning parameters and is not an iterative method, yet may outperformk-means and discretization in terms of both quality and speed. For a detailedcomparison of clustering strategies, refer to the following example:Segmenting the picture of greek coins in regions.

Changed in version 1.1:Added new labeling method ‘cluster_qr’.

verbosebool, default=False

Verbosity mode.

Added in version 0.24.

Returns:
labelsarray of integers, shape: n_samples

The labels of the clusters.

Notes

The graph should contain only one connected component, elsewherethe results make little sense.

This algorithm solves the normalized cut fork=2: it is anormalized spectral clustering.

References

Examples

>>>importnumpyasnp>>>fromsklearn.metrics.pairwiseimportpairwise_kernels>>>fromsklearn.clusterimportspectral_clustering>>>X=np.array([[1,1],[2,1],[1,0],...[4,7],[3,5],[3,6]])>>>affinity=pairwise_kernels(X,metric='rbf')>>>spectral_clustering(...affinity=affinity,n_clusters=2,assign_labels="discretize",random_state=0...)array([1, 1, 1, 0, 0, 0])

Gallery examples#

Segmenting the picture of greek coins in regions

Segmenting the picture of greek coins in regions

Spectral clustering for image segmentation

Spectral clustering for image segmentation