Introduction to ML in BigQuery

Note: This feature may not be available when using reservations that are created with certain BigQuery editions. For more information about which features are enabled in each edition, seeIntroduction to BigQuery editions.

BigQuery ML lets youcreate and run machine learning (ML) models byusing either GoogleSQL queries or the Google Cloud console.BigQuery ML models are storedin BigQuery datasets, similar to tables and views.BigQuery ML also lets you accessVertex AI models andCloud AI APIs to perform artificialintelligence (AI) tasks like text generation or machinetranslation. Gemini for Google Cloud also provides AI-poweredassistance for BigQuery tasks. To see a list of AI-poweredfeatures in BigQuery, seeGemini in BigQuery overview.

Usually, performing ML or AI on large datasets requires extensive programmingand knowledge of ML frameworks. These requirements restrict solution developmentto a very small set of people within each company, and they exclude dataanalysts who understand the data but have limited ML knowledge and programmingexpertise. However, with BigQuery ML, SQL practitioners can useexisting SQL tools and skills to build and evaluate models, and to generateresults from LLMs and Cloud AI APIs.

You can work with BigQuery ML capabilities by using thefollowing:

Advantages of BigQuery ML

BigQuery ML offers several advantages over other approaches tousing ML or AI with a cloud-based data warehouse:

  • BigQuery ML democratizes the use of ML and AI by empowering dataanalysts, the primary data warehouse users, to build and run models usingexisting business intelligence tools and spreadsheets. Predictive analyticscan guide business decision-making across the organization.
  • You don't need to program an ML or AI solution using Python or Java. Youtrain models and access AI resources by using SQL—a language that'sfamiliar to data analysts.
  • BigQuery ML increases the speed of model development andinnovation by removing the need to move data from the data warehouse.Instead, BigQuery ML brings ML to the data, which offers thefollowing advantages:

    • Reduced complexity because fewer tools are required.
    • Increased speed to production because moving and formatting large amountsof data for Python-based ML frameworks isn't required to train a model inBigQuery.

    For more information, watch the videoHow to accelerate machine learning development with BigQuery ML.

Recommended knowledge

By using the default settings in theCREATE MODEL statements and theinference functions, you can create and use BigQuery ML modelseven without much ML knowledge. However, having basic knowledge about theML development lifecycle, such as feature engineering and model training,helps you optimize both your data and your model todeliver better results. We recommend using the following resources to developfamiliarity with ML techniques and processes:

Work with time series

You can use the TimesFM,ARIMA_PLUS, andARIMA_PLUS_XREG models to performforecasting andanomaly detectionon time series data.

Perform contribution analysis

You can create acontribution analysismodel to generate insights about changes to key metrics in yourmulti-dimensional data. For example, you can find out what data contributed toa change in revenue.

Supported models

Amodel inBigQuery ML represents what an ML system haslearned from training data. The following sections describe the types of modelsthat BigQuery ML supports. For more information aboutcreating reservation assignments for the different types of models, seeAssign slots to BigQuery ML workloads.

Internally trained models

The following models are built in to BigQuery ML:

  • Contribution analysis is for determining the effect ofone or more dimensions on the value for a given metric. For example, seeing theeffect of store location and sales date on store revenue. For moreinformation, seeContribution analysis overview.
  • Linear regressionis for predicting the value of a numerical metric for new data by using amodel trained on similar remote data. Labels are real-valued, meaningthey cannot be positive infinity or negative infinity or a NaN (Not a Number).
  • Logistic regressionis for the classification of two or more possible values such as whether aninput islow-value,medium-value, orhigh-value. Labels can have up to 50 unique values.
  • K-means clusteringis for data segmentation. For example, this model identifies customersegments. K-means is an unsupervised learning technique, so model trainingdoesn't require labels or split data for training or evaluation.
  • Matrix factorizationis for creating product recommendation systems. You can create productrecommendations using historical customer behavior, transactions, and productratings, and then use those recommendations for personalized customerexperiences.
  • Principal component analysis (PCA)is the process of computing the principal components and using them toperform a change of basis on the data. It's commonly used for dimensionalityreduction by projecting each data point onto only the first few principalcomponents to obtain lower-dimensional data while preserving as much of thedata's variation as possible.
  • Time series is for performing time series forecasts and anomaly detection.TheARIMA_PLUSandARIMA_PLUS_XREGtime series models offer multiple tuning options, and automatically handleanomalies, seasonality, and holidays.

    If you don't want to manage your own time series forecasting model, you canuse theAI.FORECAST functionwith BigQuery ML's built-inTimesFM time series model(Preview) to perform forecasting.

You can perform adry run on theCREATE MODEL statements for internally trained models to get an estimate ofhow much data they will process if you run them.

Externally trained models

The following models are external to BigQuery ML and trained inVertex AI:

  • Deep neural network (DNN)is for creating TensorFlow-based deep neural networks forclassification and regression models.
  • Wide & Deepis useful for generic large-scale regression and classification problemswith sparse inputs(categorical features with a large number of possible feature values), such as recommendersystems, search, and ranking problems.
  • Autoencoderis for creating TensorFlow-based modelswith the support of sparse data representations. You can use the models inBigQuery ML for tasks such as unsupervised anomaly detectionand non-linear dimensionality reduction.
  • Boosted treesis for creating classification and regression models that are based onXGBoost.
  • Random forestis for constructing multiple learning method decision trees forclassification, regression, and other tasks at training time.
  • AutoMLis a supervised ML service that builds and deploys classification andregression models on tabular data at high speed and scale.

You can't perform adry run on theCREATE MODEL statements for externally trained models to get an estimate ofhow much data they will process if you run them.

Remote models

You can createremote modelsin BigQuery that use models deployed toVertex AI.You reference the deployed model by specifying the model'sHTTPS endpointin the remote model'sCREATE MODEL statement.

TheCREATE MODEL statements for remote models don't process any bytes anddon't incur BigQuery charges.

Imported models

BigQuery ML lets you import custom models that are trained outsideof BigQuery and then perform prediction withinBigQuery. You can import the following models intoBigQuery fromCloud Storage:

  • Open Neural Network Exchange (ONNX)is an open standard format for representing ML models. UsingONNX, you can make models that are trained with popular ML frameworkslike PyTorch and scikit-learn available in BigQuery ML.
  • TensorFlowis a free, open source software libraryfor ML and artificial intelligence. You can use TensorFlowacross a range of tasks, but it has a particular focus on training andinference of deep neural networks. You can load previously trainedTensorFlow models into BigQuery asBigQuery ML models and then perform prediction inBigQuery ML.
  • TensorFlow Liteis a light version of TensorFlow for deployment on mobiledevices, microcontrollers, and other edge devices. TensorFlowoptimizes existing TensorFlow models for reduced model size andfaster inference.
  • XGBoostis an optimized distributed gradient boosting library designed to be highlyefficient, flexible, and portable. It implements ML algorithmsunder thegradient boosting framework.

TheCREATE MODEL statements for imported models don't process any bytes anddon't incur BigQuery charges.

In BigQuery ML, you can use a model with data from multipleBigQuery Datasets for training and for prediction.

Model selection guide

This decision tree maps ML models to actions that you want to accomplish.Download the model selection decision tree.

BigQuery ML and Vertex AI

BigQuery ML integrates with Vertex AI, which is theend-to-end platform for AI and ML in Google Cloud. You can register yourBigQuery ML models to Model Registry inorder to deploy these models to endpoints for online prediction. For moreinformation, see the following:

BigQuery ML and Colab Enterprise

You can now use Colab Enterprise notebooks to perform MLworkflows in BigQuery. Notebooks let you use SQL, Python,and other popular libraries and languages to accomplish your ML tasks.For more information, seeCreate notebooks.

Supported regions

BigQuery ML is supported in the same regions asBigQuery. For more information, seeBigQuery ML locations.

Pricing

You are charged for the compute resources that you use to train models and torun queries against models. The type of model that you create affects where themodel is trained and the pricing that applies to that operation. Queriesagainst models always run in BigQuery and useBigQuery compute pricing.Becauseremote models make calls to Vertex AImodels, queries against remote models also incur charges fromVertex AI.

You are charged for the storage used by trained models, usingBigQuery storage pricing.

For more information, seeBigQuery ML pricing.

Quotas

In addition toBigQuery ML-specific limits,queries that use BigQuery ML functions andCREATE MODELstatements are subject to the quotas and limits on BigQueryquery jobs.

Limitations

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.