Use Ray on Vertex AI with BigQuery
When you run a Ray application on Vertex AI, use BigQuery as your cloud database. This section covers how to read from and write to a BigQuery database from your Ray cluster on Vertex AI. The steps in this section assume that you use the Vertex AI SDK for Python.
To read from a BigQuery dataset, create a new BigQuery dataset or use an existing dataset.
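If you need to create the dataset first, the following is a minimal sketch using the google-cloud-bigquery client library; the dataset ID my_dataset and the US location are placeholder assumptions, not values required by Ray on Vertex AI:

# Minimal sketch: create a BigQuery dataset, skipping creation if it exists.
# "my_dataset" and the "US" location are placeholder assumptions.
from google.cloud import bigquery

client = bigquery.Client(project=PROJECT_ID)
dataset = bigquery.Dataset(f"{PROJECT_ID}.my_dataset")
dataset.location = "US"
client.create_dataset(dataset, exists_ok=True)  # No error if the dataset already exists.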
Import and initialize Ray on Vertex AI client
If you're connected to your Ray cluster on Vertex AI, restart your kernel and run the following code. The runtime_env variable is necessary at connection time to run BigQuery commands.
import ray
from google.cloud import aiplatform

# The CLUSTER_RESOURCE_NAME is the one returned from vertex_ray.create_ray_cluster.
address = 'vertex_ray://{}'.format(CLUSTER_RESOURCE_NAME)

runtime_env = {"pip": ["google-cloud-aiplatform[ray]", "ray==2.47.1"]}

ray.init(address=address, runtime_env=runtime_env)
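After ray.init returns, you can optionally verify the connection; ray.cluster_resources() is standard Ray and should reflect your cluster's machine resources:

# Optional sanity check: show the resources Ray sees on the connected cluster.
print(ray.cluster_resources())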
Read data from BigQuery
Read data from your BigQuery dataset. A Ray Task must perform the read operation.
Note: The maximum query response size is 10 GB.

aiplatform.init(project=PROJECT_ID, location=LOCATION)

@ray.remote
def run_remotely():
    import vertex_ray

    dataset = DATASET
    parallelism = PARALLELISM
    query = QUERY

    ds = vertex_ray.data.read_bigquery(
        dataset=dataset,
        parallelism=parallelism,
        query=query,
    )
    ds.materialize()
Where:
- PROJECT_ID: Google Cloud project ID. Find the project ID in the Google Cloud console welcome page.
- LOCATION: The location where the Dataset is stored. For example, us-central1.
- DATASET: BigQuery dataset. It must be in the format dataset.table. Set to None if you provide a query.
- PARALLELISM: An integer that influences how many read tasks are created in parallel. There may be fewer read streams created than you requested.
- QUERY: A string containing a SQL query to read from the BigQuery database. Set to None if no query is required.
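For example, with illustrative placeholder values (the table name, parallelism, and project details are assumptions), you can set the variables and dispatch the task from the driver:

# Illustrative placeholder values; define these before the aiplatform.init call above.
PROJECT_ID = "my-project"        # Assumed project ID.
LOCATION = "us-central1"
DATASET = "my_dataset.my_table"  # Set to None if you pass a query instead.
PARALLELISM = 4
QUERY = None

# The @ray.remote function only defines the task; dispatch it and wait for completion.
ray.get(run_remotely.remote())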
Transform data
Update and delete rows and columns from your BigQuery tables using pyarrow or pandas. If you want to use pandas transformations, keep the input type as pyarrow and convert to pandas within the user-defined function (UDF) so you can catch any pandas conversion type errors within the UDF. A Ray Task must perform the transformation.
@ray.remote
def run_remotely():
    # BigQuery Read first
    import pandas as pd
    import pyarrow as pa

    def filter_batch(table: pa.Table) -> pa.Table:
        df = table.to_pandas(types_mapper={pa.int64(): pd.Int64Dtype()}.get)
        # PANDAS_TRANSFORMATIONS_HERE
        return pa.Table.from_pandas(df)

    ds = ds.map_batches(filter_batch, batch_format="pyarrow").random_shuffle()
    ds.materialize()

    # You can repartition before writing to determine the number of write blocks.
    ds = ds.repartition(4)
    ds.materialize()
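As one illustrative possibility, the PANDAS_TRANSFORMATIONS_HERE placeholder could be replaced with ordinary pandas operations; the label column below is an assumption about your schema, not part of the API:

# Hypothetical transformations for the placeholder above; "label" is an assumed column.
df = df.dropna()           # Drop rows containing missing values.
df = df[df["label"] >= 0]  # Keep rows whose label is non-negative.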
Write data to BigQuery
Insert data into your BigQuery dataset. A Ray Task must perform the write.
@ray.remote
def run_remotely():
    # BigQuery Read and optional data transformation first
    dataset = DATASET
    vertex_ray.data.write_bigquery(ds, dataset=dataset)
Where:
- DATASET: BigQuery dataset. The dataset must be in the format dataset.table.
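Putting the steps together, here's a minimal end-to-end sketch that reads, transforms, and writes inside a single Ray Task; the source and sink table names are illustrative assumptions, and the identity UDF stands in for your own transformation:

@ray.remote
def etl_task():
    import vertex_ray
    import pyarrow as pa

    def identity_batch(table: pa.Table) -> pa.Table:
        # Stand-in for your own UDF (see the transform pattern above).
        return table

    # Source and sink table names below are placeholders.
    ds = vertex_ray.data.read_bigquery(
        dataset="my_dataset.my_table",
        parallelism=4,
        query=None,
    )
    ds = ds.map_batches(identity_batch, batch_format="pyarrow")
    vertex_ray.data.write_bigquery(ds, dataset="my_dataset.my_table_out")

ray.get(etl_task.remote())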