Use Ray on Vertex AI with BigQuery

When you run a Ray application on Vertex AI, use BigQuery as your cloud database. This section covers how to read from and write to a BigQuery database from your Ray cluster on Vertex AI. The steps in this section assume that you use the Vertex AI SDK for Python.

To read from a BigQuery dataset, create a new BigQuery dataset or use an existing dataset.
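If you need a new dataset, one way to create it is with the BigQuery client library. The following is a minimal sketch, assuming the google-cloud-bigquery package is installed; my_dataset is a hypothetical dataset ID, and PROJECT_ID and LOCATION are the same placeholders used in the examples later in this section.

from google.cloud import bigquery

# Create the dataset in the same project and region as your Ray cluster.
client = bigquery.Client(project=PROJECT_ID)
dataset = bigquery.Dataset(f"{PROJECT_ID}.my_dataset")  # my_dataset is a hypothetical ID
dataset.location = LOCATION
client.create_dataset(dataset, exists_ok=True)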

Import and initialize Ray on Vertex AI client

If you're connected to your Ray cluster on Vertex AI, restart your kernel and run the following code. The runtime_env variable is necessary at connection time to run BigQuery commands.

import ray
from google.cloud import aiplatform

# The CLUSTER_RESOURCE_NAME is the one returned from vertex_ray.create_ray_cluster.
address = 'vertex_ray://{}'.format(CLUSTER_RESOURCE_NAME)

runtime_env = {"pip": ["google-cloud-aiplatform[ray]", "ray==2.47.1"]}

ray.init(address=address, runtime_env=runtime_env)
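If you don't already have the cluster resource name, it's the value returned when the cluster is created with vertex_ray.create_ray_cluster. The following is a hedged sketch of that call, assuming the vertex_ray module installed by google-cloud-aiplatform[ray]; the machine type and node counts are illustrative assumptions, not recommendations.

import vertex_ray
from vertex_ray import Resources

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# Hypothetical cluster shape: one head node and one worker pool.
CLUSTER_RESOURCE_NAME = vertex_ray.create_ray_cluster(
    head_node_type=Resources(machine_type="n1-standard-16", node_count=1),
    worker_node_types=[Resources(machine_type="n1-standard-16", node_count=1)],
)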

Read data from BigQuery

Read data from your BigQuery dataset. A Ray Task must perform the read operation.

Note: The maximum query response size is 10 GB.

aiplatform.init(project=PROJECT_ID, location=LOCATION)

@ray.remote
def run_remotely():
    import vertex_ray
    dataset = DATASET
    parallelism = PARALLELISM
    query = QUERY

    ds = vertex_ray.data.read_bigquery(
        dataset=dataset,
        parallelism=parallelism,
        query=query,
    )
    ds.materialize()

Where:

  • PROJECT_ID: Google Cloud project ID. Find the project ID on the Google Cloud console welcome page.

  • LOCATION: The location where the Dataset is stored. For example, us-central1.

  • DATASET: BigQuery dataset. It must be in the format dataset.table. Set to None if you provide a query.

  • PARALLELISM: An integer that influences how many read tasks are created in parallel. There may be fewer read streams created than you requested.

  • QUERY: A string containing a SQL query to read from the BigQuery database. Set to None if no query is required.
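For illustration, here is a hedged, filled-in version of the read task that also shows how to launch it and wait on the result. The project, dataset, and table names below are hypothetical placeholders, not real resources.

aiplatform.init(project="my-project", location="us-central1")

@ray.remote
def read_table():
    import vertex_ray
    # Read a whole table; alternatively, pass a SQL string as query= and set dataset=None.
    ds = vertex_ray.data.read_bigquery(
        dataset="my_dataset.my_table",
        parallelism=10,
        query=None,
    )
    ds.materialize()
    return ds.count()

# Launch the task on the cluster and block until the row count returns.
num_rows = ray.get(read_table.remote())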

Transform data

Update and delete rows and columns from your BigQuery tables using pyarrow or pandas. If you want to use pandas transformations, keep the input type as pyarrow and convert to pandas within the user-defined function (UDF) so you can catch any pandas conversion type errors within the UDF. A Ray Task must perform the transformation.

@ray.remote
def run_remotely():
    # BigQuery Read first

    import pandas as pd
    import pyarrow as pa

    def filter_batch(table: pa.Table) -> pa.Table:
        df = table.to_pandas(types_mapper={pa.int64(): pd.Int64Dtype()}.get)
        # PANDAS_TRANSFORMATIONS_HERE
        return pa.Table.from_pandas(df)

    ds = ds.map_batches(filter_batch, batch_format="pyarrow").random_shuffle()
    ds.materialize()

    # You can repartition before writing to determine the number of write blocks.
    ds = ds.repartition(4)
    ds.materialize()
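The PANDAS_TRANSFORMATIONS_HERE placeholder is where your own logic goes. As one hypothetical example (the score column name is an assumption for illustration only), a UDF that drops incomplete rows and filters on a threshold could look like this:

def filter_batch(table: pa.Table) -> pa.Table:
    df = table.to_pandas(types_mapper={pa.int64(): pd.Int64Dtype()}.get)
    # Hypothetical transformations: drop rows with a missing score,
    # then keep only rows above a threshold.
    df = df.dropna(subset=["score"])
    df = df[df["score"] > 0.5]
    return pa.Table.from_pandas(df)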

Write data to BigQuery

Insert data into your BigQuery dataset. A Ray Task must perform the write.

@ray.remote
def run_remotely():
    # BigQuery Read and optional data transformation first
    dataset = DATASET
    vertex_ray.data.write_bigquery(ds, dataset=dataset)

Where:

  • DATASET: BigQuery dataset. The dataset must be in the format dataset.table.
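Putting the steps together, the following is a hedged end-to-end sketch of a single Ray Task that reads, optionally transforms, and writes. All project, dataset, and table names are hypothetical placeholders.

aiplatform.init(project="my-project", location="us-central1")

@ray.remote
def etl_job():
    import vertex_ray
    # Read the source table.
    ds = vertex_ray.data.read_bigquery(
        dataset="my_dataset.source_table",
        parallelism=10,
        query=None,
    )
    # Optionally transform with ds.map_batches(...) here.
    # Write the result to a destination table.
    vertex_ray.data.write_bigquery(ds, dataset="my_dataset.destination_table")

ray.get(etl_job.remote())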

