ML pipelines overview

This document provides an overview of the services you can use to build an ML pipeline to manage your BigQuery ML MLOps workflow.

An ML pipeline is a representation of an MLOps workflow that is composed of a series of pipeline tasks. Each pipeline task performs a specific step in the MLOps workflow to train and deploy a model. Separating each step into a standardized, reusable task lets you automate and monitor repeatable processes in your ML practice.

You can use any of the following services to create ML pipelines for BigQuery ML:

  • Use Vertex AI Pipelines to create portable, extensible ML pipelines.
  • Use GoogleSQL queries to create less complex SQL-based ML pipelines.
  • Use Dataform to create more complex SQL-based ML pipelines, or ML pipelines where you need to use version control.

Vertex AI Pipelines

In Vertex AI Pipelines, an ML pipeline is structured as a directed acyclic graph (DAG) of containerized pipeline tasks that are interconnected using input-output dependencies. Each pipeline task is an instantiation of a pipeline component with specific inputs. When defining your ML pipeline, you connect multiple pipeline tasks to form a DAG by routing the outputs of one pipeline task to the inputs for the next pipeline task in the ML workflow. You can also use the original inputs to the ML pipeline as the inputs for a given pipeline task.

Use the BigQuery ML components of the Google Cloud Pipeline Components SDK to compose ML pipelines in Vertex AI Pipelines. To get started with the BigQuery ML components, see the notebooks that accompany them.

GoogleSQL queries

You can use GoogleSQL procedural language to execute multiple statements in a multi-statement query. You can use a multi-statement query to:

  • Run multiple statements in a sequence, with shared state.
  • Automate management tasks such as creating or dropping tables.
  • Implement complex logic using programming constructs such as IF and WHILE.
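As an illustration, the following multi-statement query sketches how these constructs combine: it declares shared state, checks whether a model already exists, and trains one only if needed. This is a minimal sketch, not a recommended pattern from this page; the dataset, table, and model names (mydataset, training_data, sample_model) are hypothetical, and the INFORMATION_SCHEMA check is just one way to test for an existing model.

```sql
-- Declare shared state used across the statements that follow.
DECLARE model_exists BOOL DEFAULT FALSE;

-- Check whether the model is already present in the dataset.
SET model_exists = EXISTS (
  SELECT 1
  FROM mydataset.INFORMATION_SCHEMA.MODELS
  WHERE model_name = 'sample_model'
);

-- Only train when the model does not exist yet.
IF NOT model_exists THEN
  CREATE MODEL mydataset.sample_model
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['label'])
  AS SELECT * FROM mydataset.training_data;
END IF;
```

Because all of the statements run in one query, the whole sequence can then be saved and scheduled as a single unit.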

After creating a multi-statement query, you can save and schedule the query to automate model training, inference, and monitoring.

If your ML pipeline includes use of the ML.GENERATE_TEXT function, see Handle quota errors by calling ML.GENERATE_TEXT iteratively for more information on how to use SQL to iterate through calls to the function. Calling the function iteratively lets you address any retryable errors that occur due to exceeding the quotas and limits.
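The iterative pattern can be sketched as a WHILE loop that re-calls the function for only the rows whose status column reports an error. This is a simplified sketch, not the full procedure from the linked page: the model, prompt, and result table names are hypothetical, and it assumes (per the function's documented output) that the ml_generate_text_status column is an empty string on success.

```sql
DECLARE failed INT64 DEFAULT 1;
DECLARE attempts INT64 DEFAULT 0;

-- First pass: call ML.GENERATE_TEXT over every prompt.
CREATE OR REPLACE TABLE mydataset.results AS
SELECT prompt, ml_generate_text_result, ml_generate_text_status
FROM ML.GENERATE_TEXT(
  MODEL mydataset.llm_model,
  TABLE mydataset.prompts,
  STRUCT(0.2 AS temperature));

-- Retry only the failed rows, up to a fixed number of attempts.
WHILE failed > 0 AND attempts < 5 DO
  CREATE OR REPLACE TEMP TABLE retry_pass AS
  SELECT prompt, ml_generate_text_result, ml_generate_text_status
  FROM ML.GENERATE_TEXT(
    MODEL mydataset.llm_model,
    (SELECT prompt FROM mydataset.results
     WHERE ml_generate_text_status != ''),
    STRUCT(0.2 AS temperature));

  -- Swap the failed rows for the fresh results.
  DELETE FROM mydataset.results WHERE ml_generate_text_status != '';
  INSERT INTO mydataset.results SELECT * FROM retry_pass;

  SET failed = (SELECT COUNT(*) FROM mydataset.results
                WHERE ml_generate_text_status != '');
  SET attempts = attempts + 1;
END WHILE;
```

Capping the attempt count keeps the loop from running indefinitely if an error turns out not to be retryable.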

Dataform

You can use Dataform to develop, test, version control, and schedule complex SQL workflows for data transformation in BigQuery. You can use Dataform for such tasks as data transformation in the Extraction, Loading, and Transformation (ELT) process for data integration. After raw data is extracted from source systems and loaded into BigQuery, Dataform helps you to transform it into a well-defined, tested, and documented suite of data tables.
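For a sense of what such a definition looks like, the following Dataform SQLX file pairs a config block with a SQL body that builds one tested, documented table; the table description, assertion, and the source name referenced with ref() are hypothetical.

```sqlx
config {
  type: "table",
  description: "Per-user features derived from raw events.",
  assertions: {
    uniqueKey: ["user_id"]
  }
}

SELECT
  user_id,
  COUNT(*) AS event_count,
  MAX(event_timestamp) AS last_seen
FROM ${ref("raw_events")}
GROUP BY user_id
```

The ref() call is what lets Dataform infer the dependency graph between tables and schedule them in the right order.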

If your ML pipeline includes use of the ML.GENERATE_TEXT function, you can adapt the structured_table_ml.js example library to iterate through calls to the function. Calling the function iteratively lets you address any retryable errors that occur due to exceeding the quotas and limits that apply to the function.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.