Schedule workloads

BigQuery tasks are usually part of larger workloads, with external tasks that trigger BigQuery operations and are, in turn, triggered by them. Workload scheduling helps data administrators, analysts, and developers organize and optimize this chain of actions, creating a seamless connection across data resources and processes. Scheduling methods and tools assist in designing, building, implementing, and monitoring these complex data workloads.

Choose a scheduling method

To select a scheduling method, identify whether your workloads are event-driven, time-driven, or both. An event is defined as a state change, such as a change to data in a database or a file added to a storage system. In event-driven scheduling, an action on a website might trigger a data activity, or an object landing in a certain bucket might need to be processed immediately on arrival. In time-driven scheduling, new data might need to be loaded once per day or frequently enough to produce hourly reports. You can combine event-driven and time-driven scheduling in scenarios where you need to load objects into a data lake in real time, but generate activity reports on the data lake only once a day.
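The difference between the two methods lies in what starts the work. The following minimal Python sketch assumes a hypothetical landing bucket named `raw-landing-zone` and an event payload with `bucket` and `name` keys (modeled loosely on a Cloud Storage object notification). The event-driven handler reacts once per arriving object, while the time-driven helper only computes when the next daily report should run. None of the function names are Google Cloud APIs.

```python
from datetime import datetime, time, timedelta, timezone


def on_object_arrival(event: dict, load_queue: list) -> None:
    """Event-driven: react immediately when an object lands in the watched bucket."""
    # Hypothetical bucket name and event keys, for illustration only.
    if event.get("bucket") == "raw-landing-zone":
        load_queue.append(f"gs://{event['bucket']}/{event['name']}")


def next_daily_run(now: datetime, run_at: time) -> datetime:
    """Time-driven: return the next moment a once-per-day report should run."""
    candidate = datetime.combine(now.date(), run_at, tzinfo=now.tzinfo)
    if candidate <= now:
        # Today's slot has already passed, so schedule tomorrow's.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    queue: list = []
    on_object_arrival({"bucket": "raw-landing-zone", "name": "clicks/0001.json"}, queue)
    print(queue)  # ['gs://raw-landing-zone/clicks/0001.json']
    now = datetime(2025, 1, 1, 9, 30, tzinfo=timezone.utc)
    print(next_daily_run(now, time(6, 0)))  # 2025-01-02 06:00:00+00:00
```

Combining both, as in the data lake scenario, means objects are loaded as soon as they arrive while the report waits for its daily slot.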

Choose a scheduling tool

Scheduling tools assist with tasks that are involved in managing complex data workloads, such as combining multiple Google Cloud or third-party services with BigQuery jobs, or running multiple BigQuery jobs in parallel. Each workload has unique requirements for dependency and parameter management to ensure that tasks are executed in the correct order using the correct data. Google Cloud provides several scheduling options, which differ in the scheduling methods they support and the workload requirements they address.
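Dependency management comes down to running each task only after everything it depends on has finished; parameter management means handing every task the same inputs, such as a run date. The following sketch uses Python's standard `graphlib` module (Python 3.9 or later) on a hypothetical four-task workload. The task names are invented, and a real scheduler would run BigQuery jobs instead of recording strings.

```python
from graphlib import TopologicalSorter

# Hypothetical workload: each task maps to the set of tasks it depends on.
WORKLOAD = {
    "load_raw_orders": set(),
    "load_raw_customers": set(),
    "build_orders_clean": {"load_raw_orders"},
    "build_daily_report": {"build_orders_clean", "load_raw_customers"},
}


def run_in_order(graph: dict, run_date: str) -> list:
    """Run every task after its dependencies, passing the same run_date to each."""
    executed = []
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a loop
    while sorter.is_active():
        # Tasks returned together have no pending dependencies and could run in parallel.
        for task in sorted(sorter.get_ready()):
            executed.append(f"{task}({run_date})")
            sorter.done(task)
    return executed


print(run_in_order(WORKLOAD, "2025-01-01"))
```

The two load tasks come back in the same batch, which is where a scheduler could run them in parallel; the report is always last because it waits on both branches.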

We recommend using Dataform, Workflows, Cloud Composer, or Vertex AI Pipelines for most use cases. Consult the following table for a side-by-side comparison:

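|                  | Dataform                          | Workflows                          | Cloud Composer                         | Vertex AI Pipelines  |
|------------------|-----------------------------------|------------------------------------|----------------------------------------|----------------------|
| Focus            | Data transformation               | Microservices                      | ETL or ELT                             | Machine learning     |
| Complexity       | *                                 | **                                 | ***                                    | **                   |
| User profile     | Data analyst or administrator     | Data architect                     | Data engineer                          | Data analyst         |
| Code type        | JavaScript, SQL, Python notebooks | YAML or JSON                       | Python                                 | Python               |
| Serverless?      | Yes                               | Yes                                | Fully managed                          | Yes                  |
| Not suitable for | Chains of external services       | Data transformation and processing | Low latency or event-driven pipelines  | Infrastructure tasks |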

The following sections detail these scheduling tools and several others.

Scheduled queries

The simplest form of workload scheduling is scheduling recurring queries directly in BigQuery. While this is the least complex approach to scheduling, we recommend it only for straightforward query chains with no external dependencies. Queries scheduled in this way must be written in GoogleSQL and can include data definition language (DDL) and data manipulation language (DML) statements.

Scheduling method: time-driven
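As an illustration of a scheduled query, the following sketch assembles the settings of a daily DML query as a plain Python dictionary. The field names (`display_name`, `data_source_id`, `params`, `schedule`) mirror the BigQuery Data Transfer Service transfer configuration that backs scheduled queries, and `@run_date` is the runtime parameter that scheduled queries supply at each run. The `analytics.daily_orders` and `raw.orders` tables are hypothetical. Check the scheduled queries documentation before submitting a configuration like this through the client library or API.

```python
# Hypothetical tables; @run_date is filled in by BigQuery at each scheduled run.
DAILY_ROLLUP_SQL = """
INSERT INTO analytics.daily_orders (order_date, order_count)
SELECT DATE(order_ts), COUNT(*)
FROM raw.orders
WHERE DATE(order_ts) = DATE_SUB(@run_date, INTERVAL 1 DAY)
GROUP BY 1
"""


def scheduled_query_config(display_name: str, query: str, schedule: str) -> dict:
    """Collect the settings for a DML scheduled query in transfer-config shape."""
    return {
        "display_name": display_name,
        "data_source_id": "scheduled_query",
        # A DML statement writes to its own target table, so no destination
        # table template is set here.
        "params": {"query": query},
        "schedule": schedule,
    }


config = scheduled_query_config("daily-orders-rollup", DAILY_ROLLUP_SQL, "every 24 hours")
print(config["schedule"])  # every 24 hours
```

Because each run processes the previous day relative to `@run_date`, a rerun for a past date touches the same slice of data instead of whatever arrived most recently.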

Dataform

Dataform is a free, SQL-based, opinionated transformation framework that schedules complex data transformation tasks in BigQuery. When raw data is loaded into BigQuery, Dataform helps you create an organized, tested, version-controlled collection of datasets and tables. Use Dataform to schedule runs for your data preparations, notebooks, and BigQuery pipelines.

Scheduling method: time-driven

Note: You cannot use Dataform to schedule assets that you create in a BigQuery repository, such as queries, notebooks (including notebooks with Apache Spark jobs), BigQuery pipelines, or Dataform workflows. Instead, use BigQuery execution and scheduling capabilities. For more information, see Scheduling queries, Schedule notebooks, and Schedule pipelines.
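Dataform tests usually take the form of assertions declared in a table's config block, for example `uniqueKey` and `nonNull`, and a failing assertion flags the run. The Python sketch below is not Dataform code. It only spells out, on hypothetical in-memory rows, the checks that those two assertion types express, to show what "tested" means for a generated table.

```python
def check_assertions(rows: list, unique_key: list, non_null: list) -> list:
    """Return a message for every row that breaks a uniqueKey- or nonNull-style rule."""
    failures = []
    seen = set()
    for i, row in enumerate(rows):
        # uniqueKey: the combination of key columns must not repeat.
        key = tuple(row.get(col) for col in unique_key)
        if key in seen:
            failures.append(f"row {i}: duplicate key {key}")
        seen.add(key)
        # nonNull: each listed column must have a value.
        for col in non_null:
            if row.get(col) is None:
                failures.append(f"row {i}: {col} is null")
    return failures


rows = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 1, "email": None},
]
print(check_assertions(rows, unique_key=["user_id"], non_null=["email"]))
# ['row 1: duplicate key (1,)', 'row 1: email is null']
```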

Workflows

Workflows is a serverless tool that schedules HTTP-based services with very low latency. It is best for chaining microservices together, automating infrastructure tasks, integrating with external systems, or creating a sequence of operations in Google Cloud. To learn more about using Workflows with BigQuery, see Run multiple BigQuery jobs in parallel.

Scheduling method: event-driven and time-driven
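Workflows expresses fan-out with parallel steps in its own YAML or JSON syntax. As an analogy in Python, the following sketch refreshes several tables concurrently and waits for all of them before returning. `run_job` is a hypothetical placeholder for submitting a BigQuery job, and the table names are invented; this is not Workflows syntax.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(table: str) -> str:
    """Hypothetical placeholder for submitting one BigQuery job and waiting on it."""
    return f"refreshed {table}"


def refresh_all(tables: list, max_parallel: int = 4) -> dict:
    """Fan out one job per table, then fan in once every job has finished."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() yields results in input order, so they line up with `tables`.
        results = list(pool.map(run_job, tables))
    return dict(zip(tables, results))


print(refresh_all(["sales", "inventory", "returns"]))
# {'sales': 'refreshed sales', 'inventory': 'refreshed inventory', 'returns': 'refreshed returns'}
```

The `max_parallel` cap plays the same role as a concurrency limit on parallel branches: it keeps a large fan-out from submitting every job at once.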

Cloud Composer

Cloud Composer is a fully managed tool built on Apache Airflow. It is best for extract, transform, load (ETL) or extract, load, transform (ELT) workloads because it supports several operator types and patterns, as well as task execution across other Google Cloud products and external targets. To learn more about using Cloud Composer with BigQuery, see Run a data analytics DAG in Google Cloud.

Scheduling method: time-driven

Vertex AI Pipelines

Vertex AI Pipelines is a serverless tool based on Kubeflow Pipelines that is specially designed for scheduling machine learning workloads. It automates and connects all tasks of your model development and deployment, from training data to code, giving you a complete view of how your models work. To learn more about using Vertex AI Pipelines with BigQuery, see Export and deploy a BigQuery machine learning model for prediction.

Scheduling method: event-driven

Apigee Integration

Apigee Integration is an extension of the Apigee platform that includes connectors and data transformation tools. It is best for integrating with external enterprise applications, like Salesforce. To learn more about using Apigee Integration with BigQuery, see Get started with Apigee Integration and a Salesforce trigger.

Scheduling method: event-driven and time-driven

Cloud Data Fusion

Cloud Data Fusion is a data integration tool that offers code-free ELT/ETL pipelines and over 150 preconfigured connectors and transformations. To learn more about using Cloud Data Fusion with BigQuery, see Replicating data from MySQL to BigQuery.

Scheduling method: event-driven and time-driven

Cloud Scheduler

Cloud Scheduler is a fully managed scheduler for jobs, such as batch or streaming jobs and infrastructure operations, that should occur at defined time intervals. To learn more about using Cloud Scheduler with BigQuery, see Scheduling workflows with Cloud Scheduler.

Scheduling method: time-driven
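Cloud Scheduler jobs are defined with a unix-cron schedule of five fields: minute, hour, day of month, month, and day of week. To show how such an expression selects run times, here is a simplified matcher in Python. It supports only `*`, `*/n`, single numbers, and comma-separated lists (no ranges, names, or `7` for Sunday), and it follows the usual cron convention that when both day fields are restricted, a match on either one is enough. It is a teaching sketch, not Cloud Scheduler's parser.

```python
from datetime import datetime


def _field_matches(field: str, value: int, low: int) -> bool:
    """Match one cron field; `low` is the field's smallest allowed value."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Steps count from the start of the field's range (minute 0, day 1, ...).
            if (value - low) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) is selected by a 5-field cron expression."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    time_ok = (
        _field_matches(minute, when.minute, 0)
        and _field_matches(hour, when.hour, 0)
        and _field_matches(month, when.month, 1)
    )
    dom_ok = _field_matches(dom, when.day, 1)
    dow_ok = _field_matches(dow, cron_dow, 0)
    if dom != "*" and dow != "*":
        day_ok = dom_ok or dow_ok  # both restricted: either day field may match
    else:
        day_ok = dom_ok and dow_ok
    return time_ok and day_ok


# "30 6 * * 1" means 06:30 every Monday; 2025-01-06 is a Monday.
print(cron_matches("30 6 * * 1", datetime(2025, 1, 6, 6, 30)))  # True
print(cron_matches("30 6 * * 1", datetime(2025, 1, 7, 6, 30)))  # False
```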

Cloud Tasks

Cloud Tasks is a fully managed service for the asynchronous distribution of jobs that can execute independently, outside of your main workload. It is best for delegating slow background operations or managing API call rates. To learn more about using Cloud Tasks with BigQuery, see Add a task to a Cloud Tasks queue.

Scheduling method: event-driven
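Cloud Tasks queues manage call rates through queue-level limits such as a maximum dispatch rate and a maximum burst size. The following Python sketch models that kind of limit with a token bucket. The class and its parameters are hypothetical, and time is passed in explicitly so the behavior is easy to follow. Real queues are configured rather than coded, so treat this only as an explanation of the idea.

```python
class TokenBucket:
    """Allow at most `rate` dispatches per second, with bursts of up to `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now

    def try_dispatch(self, now: float) -> bool:
        """Consume one token if available; otherwise the task must wait."""
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=2, burst=2)
print([bucket.try_dispatch(0.0) for _ in range(3)])  # [True, True, False]
print(bucket.try_dispatch(0.5))  # True: half a second at 2/s refills one token
```

In a worker loop you might pass `time.monotonic()` as `now` and retry a rejected dispatch later, which is the behavior a rate-limited queue provides for you.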

Third-party tools

You can also connect to BigQuery by using popular third-party tools such as CData and SnapLogic. The BigQuery Ready program offers a full list of validated partner solutions.

Messaging tools

Many data workloads require additional messaging connections between decoupled microservices that only need to be activated when certain events occur. Google Cloud provides two tools that are designed to integrate with BigQuery.

Pub/Sub

Pub/Sub is an asynchronous messaging tool for data integration pipelines. It is designed to ingest and distribute data such as server events and user interactions. It can also be used for parallel processing and for streaming data from IoT devices. To learn more about using Pub/Sub with BigQuery, see Stream from Pub/Sub to BigQuery.
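One way to land Pub/Sub messages in BigQuery is a push subscription whose endpoint turns each message into a table row; BigQuery subscriptions can also write messages to a table directly, without custom code. The sketch below shows only the decoding step for a push request body, in which the message payload arrives base64-encoded in `message.data`. It assumes that the publisher sends JSON objects, and the output column names are hypothetical.

```python
import base64
import json


def push_body_to_row(body: bytes) -> dict:
    """Decode a Pub/Sub push request body into one flat row for a BigQuery table."""
    envelope = json.loads(body)
    message = envelope["message"]
    # The published payload arrives base64-encoded; here it is assumed to be JSON.
    payload = json.loads(base64.b64decode(message["data"]))
    return {
        # Hypothetical column names for the message metadata.
        "message_id": message["messageId"],
        "publish_time": message["publishTime"],
        **payload,
    }
```

A push endpoint would call `push_body_to_row` on each request and then insert the returned row, acknowledging the message with a success status only after the insert succeeds.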

Eventarc

Eventarc is an event-driven tool that lets you manage the flow of state changes throughout your data pipeline. This tool has a wide range of use cases, including automated error remediation, resource labeling, and image retouching. To learn more about using Eventarc with BigQuery, see Build a BigQuery processing pipeline with Eventarc.
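Eventarc delivers events in the CloudEvents format, in which the `type` attribute identifies what happened, and a common pattern is to route each event type to its own handler. The following Python sketch does this with a dictionary. Events are plain dicts here rather than objects from a CloudEvents SDK, the handler is hypothetical, and the sketch assumes that a BigQuery state change arrives as a Cloud Audit Logs event (`google.cloud.audit.log.v1.written`) whose payload carries `protoPayload.methodName`.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]

AUDIT_LOG_EVENT = "google.cloud.audit.log.v1.written"


def on_audit_log(event: dict) -> str:
    """Hypothetical handler: react to a BigQuery API call recorded in Cloud Audit Logs."""
    method = event["data"]["protoPayload"]["methodName"]
    return f"refresh downstream tables after {method}"


def make_router(routes: Dict[str, Handler]) -> Callable[[dict], str]:
    """Return a function that dispatches each event by its CloudEvents `type`."""

    def route(event: dict) -> str:
        handler = routes.get(event.get("type", ""))
        # Unrecognized event types are skipped rather than treated as errors.
        return handler(event) if handler else "ignored"

    return route


route = make_router({AUDIT_LOG_EVENT: on_audit_log})
```

Keeping the routing table separate from the handlers makes it easy to add a handler for another state change without touching the existing ones.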


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.