Anomaly detection overview
Anomaly detection is a data mining technique that you can use to identify data deviations in a given dataset. For example, if the return rate for a given product increases substantially from the baseline for that product, that might indicate a product defect or potential fraud. You can use anomaly detection to detect critical incidents, such as technical issues, or opportunities, such as changes in consumer behavior.
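To make the return-rate example concrete, the following is a minimal sketch of a baseline-deviation check. It uses a simple z-score rule, which is an illustrative stand-in and not the method used by any of the models described on this page. The function name `is_anomalous`, the threshold of 3.0, and the sample rates are all hypothetical.

```python
import statistics


def is_anomalous(baseline, value, z_threshold=3.0):
    """Flag value if it lies more than z_threshold sample standard
    deviations away from the mean of the baseline observations."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return value != mean
    return abs(value - mean) / stdev > z_threshold


# Hypothetical weekly return rates for one product (fraction of units returned).
baseline_rates = [0.020, 0.022, 0.019, 0.021, 0.020, 0.018]

spike = is_anomalous(baseline_rates, 0.045)    # far above the baseline
typical = is_anomalous(baseline_rates, 0.021)  # within normal variation
print(spike, typical)
```

A fixed z-score cutoff only works when the baseline is stable. Data with trend or seasonality is better served by a time series model.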
It can be challenging to determine what counts as anomalous data. If you aren't certain what counts as anomalous data, or you don't have labeled data to train a model on, you can use unsupervised machine learning to perform anomaly detection. Use the AI.DETECT_ANOMALIES function or ML.DETECT_ANOMALIES function with one of the following models to detect anomalies in training data or new serving data:
| Data type | Model types | Function | What the function does |
|---|---|---|---|
| Time series | TimesFM | AI.DETECT_ANOMALIES | Detect the anomalies in the time series. |
| Time series | ARIMA_PLUS | ML.DETECT_ANOMALIES | Detect the anomalies in the time series. |
| Time series | ARIMA_PLUS_XREG | ML.DETECT_ANOMALIES | Detect the anomalies in the time series with external regressors. |
| Independent and identically distributed random variables (IID) | K-means | ML.DETECT_ANOMALIES | Detect anomalies based on the shortest distance among the normalized distances from the input data to each cluster centroid. For a definition of normalized distances, see the k-means model output for the ML.DETECT_ANOMALIES function. |
| Independent and identically distributed random variables (IID) | Autoencoder | ML.DETECT_ANOMALIES | Detect anomalies based on the reconstruction loss in terms of mean squared error. For more information, see ML.RECONSTRUCTION_LOSS. The ML.RECONSTRUCTION_LOSS function can retrieve all types of reconstruction loss. |
| Independent and identically distributed random variables (IID) | PCA | ML.DETECT_ANOMALIES | Detect anomalies based upon the reconstruction loss in terms of mean squared error. |
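The two scoring ideas used for IID data in the table can be sketched in a few lines of Python. This is an illustration of the concepts only, not the implementation behind ML.DETECT_ANOMALIES. It assumes that each distance is normalized by dividing by a per-cluster radius. The centroids, radii, sample points, and the cutoff of 2.0 are hypothetical values chosen by hand.

```python
import math


def normalized_distance(point, centroids, radii):
    """Return the smallest distance from point to any centroid, after
    dividing each distance by that cluster's radius (an assumed
    normalization for this sketch)."""
    return min(math.dist(point, c) / r for c, r in zip(centroids, radii))


def reconstruction_mse(original, reconstructed):
    """Mean squared error between an input row and its reconstruction,
    as an autoencoder or PCA model would produce."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


# Hypothetical cluster centroids and radii, as if taken from a trained k-means model.
centroids = [(0.0, 0.0), (10.0, 10.0)]
radii = [1.0, 2.0]

points = [(0.5, 0.5), (9.0, 11.0), (5.0, 5.0)]
scores = [normalized_distance(p, centroids, radii) for p in points]

threshold = 2.0  # hypothetical cutoff; larger scores are treated as anomalies
flags = [score > threshold for score in scores]

# A row whose third feature is reconstructed poorly gets a high loss.
loss = reconstruction_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(flags, loss)
```

The first two points sit close to a centroid and score low, while the point midway between the clusters is far from both and is flagged. The same thresholding step applies to reconstruction loss: rows that the model cannot reproduce well are the candidates for anomalies.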
If you already have labeled data that identifies anomalies, you can perform anomaly detection by using the ML.PREDICT function with one of the following supervised machine learning models:
- Linear and logistic regression models
- Boosted trees models
- Random forest models
- Deep neural network (DNN) models
- Wide & Deep models
- AutoML models
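As a rough illustration of the supervised approach, the sketch below trains a one-feature logistic regression on labeled examples and then predicts whether new values are anomalous. It is a from-scratch stand-in written for this page, not how ML.PREDICT or any of the listed model types work internally. The feature values, labels, learning rate, and epoch count are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_anomaly(x, w, b, cutoff=0.5):
    """Return True if the predicted anomaly probability exceeds cutoff."""
    return sigmoid(w * x + b) > cutoff


# Hypothetical labeled data: one scaled feature, label 1 marks a known anomaly.
features = [0.1, 0.2, 0.3, 0.4, 2.5, 2.8, 3.0, 3.2]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(features, labels)
normal_case = predict_anomaly(0.25, w, b)
anomalous_case = predict_anomaly(2.9, w, b)
print(normal_case, anomalous_case)
```

Unlike the unsupervised models, this approach can only recognize the kinds of anomalies that appear in the labeled training data.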
Recommended knowledge
By using the default settings in the CREATE MODEL statements and the inference functions, you can create and use an anomaly detection model even without much ML knowledge. However, having basic knowledge about ML development helps you optimize both your data and your model to deliver better results. We recommend using the following resources to develop familiarity with ML techniques and processes:
Last updated 2025-12-15 UTC.