Clustering overview

Clustering is an unsupervised machine learning technique you can use to group similar records together. It is a useful approach when you want to understand what groups or clusters you have in your data, but don't have labeled data to train a model on. For example, if you had unlabeled data about subway ticket purchases, you could cluster that data by ticket purchase time to better understand what time periods have the heaviest subway usage. For more information, see What is clustering?

K-means models are widely used to perform clustering. You can use k-means models with the ML.PREDICT function to cluster data, or with the ML.DETECT_ANOMALIES function to perform anomaly detection.
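To make these two uses concrete, the following is a minimal sketch in plain Python, not BigQuery ML code. It assumes a model has already produced a set of centroids. Prediction then means assigning a record to its nearest centroid, and a simple form of anomaly detection means flagging records that sit far from every centroid. The names `predict_cluster`, `is_anomaly`, `example_centroids`, and `max_distance` are hypothetical, and the fixed distance cutoff is a simplifying assumption, not a description of how ML.DETECT_ANOMALIES chooses its threshold.

```python
from typing import List, Tuple


def predict_cluster(value: float, centroids: List[float]) -> Tuple[int, float]:
    """Return (index of the nearest centroid, distance to that centroid)."""
    distances = [abs(value - c) for c in centroids]
    nearest = min(range(len(centroids)), key=distances.__getitem__)
    return nearest, distances[nearest]


def is_anomaly(value: float, centroids: List[float], max_distance: float) -> bool:
    """Flag a value whose nearest centroid is farther away than max_distance.

    The fixed cutoff is an illustrative assumption, not a real model setting.
    """
    _, distance = predict_cluster(value, centroids)
    return distance > max_distance


# Hypothetical centroids for ticket purchase times, in hours since midnight.
example_centroids = [8.0, 12.25, 17.75]

print(predict_cluster(8.5, example_centroids))                # (0, 0.5)
print(is_anomaly(3.0, example_centroids, max_distance=2.0))   # True
print(is_anomaly(12.0, example_centroids, max_distance=2.0))  # False
```

In this sketch a purchase at 8:30 falls into the morning cluster, while a purchase at 3:00 is far from all three centroids and is flagged.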

K-means models use centroid-based clustering to organize data into clusters. To get information about a k-means model's centroids, you can use the ML.CENTROIDS function.
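As an illustration of what centroid-based clustering does, here is a small sketch of the standard k-means loop (Lloyd's algorithm) on one-dimensional data, written in plain Python. It is not how BigQuery ML is implemented. The function name `kmeans_1d`, the evenly spread initialization, and the sample purchase times are assumptions made for the example.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iterations: int = 20) -> Tuple[List[float], List[int]]:
    """Cluster 1-D values with Lloyd's algorithm; returns (centroids, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")

    # Deterministic initialization (an assumption for this sketch):
    # pick k starting centroids spread evenly across the sorted data.
    ordered = sorted(values)
    if k == 1:
        centroids = [ordered[len(ordered) // 2]]
    else:
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    def nearest(value: float) -> int:
        return min(range(k), key=lambda c: abs(value - centroids[c]))

    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [nearest(v) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = list(centroids)
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                updated[c] = sum(members) / len(members)
        if updated == centroids:
            break  # Converged: centroids stopped moving.
        centroids = updated

    # Final assignment against the final centroids.
    labels = [nearest(v) for v in values]
    return centroids, labels


# Made-up ticket purchase times, in hours since midnight.
purchase_hours = [7.5, 8.0, 8.25, 8.5, 12.0, 12.5, 17.0, 17.5, 18.0, 18.25]
centroids, labels = kmeans_1d(purchase_hours, k=3)
print(centroids)  # [8.0625, 12.25, 17.6875]
print(labels)     # [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
```

With this sample data the three centroids land near the morning, midday, and evening purchase peaks, which is the kind of summary a model's centroids provide.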

Recommended knowledge

By using the default settings in the CREATE MODEL statements and the inference functions, you can create and use a clustering model even without much ML knowledge. However, having basic knowledge about ML development, and clustering models in particular, helps you optimize both your data and your model to deliver better results. We recommend using the following resources to develop familiarity with ML techniques and processes:


Last updated 2025-12-15 UTC.