Make predictions with scikit-learn models in ONNX format

This tutorial shows you how to import an Open Neural Network Exchange (ONNX) model that's trained with scikit-learn. You import the model into a BigQuery dataset and use it to make predictions by using a SQL query.

ONNX provides a uniform format that is designed to represent models from any machine learning (ML) framework. BigQuery ML support for ONNX lets you do the following:

  • Train a model using your favorite framework.
  • Convert the model into the ONNX model format.
  • Import the ONNX model into BigQuery and make predictions using BigQuery ML.

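Objectives

  • Import an ONNX model that's stored in Cloud Storage into BigQuery by using the CREATE MODEL statement.
  • Use the ML.PREDICT function to make predictions with the imported ONNX model.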

Costs

In this document, you use the following billable components of Google Cloud:

  • BigQuery
  • BigQuery ML
  • Cloud Storage

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the BigQuery and Cloud Storage APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

  5. Ensure that you have the necessary permissions to perform the tasks in this document.

Required roles

If you create a new project, you're the project owner, and you're granted all of the required Identity and Access Management (IAM) permissions that you need to complete this tutorial.

If you're using an existing project, do the following.

Make sure that you have the following role or roles on the project:

Check for the roles

  1. In the Google Cloud console, go to the IAM page.

  2. Select the project.
  3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.

  4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

Grant the roles

  1. In the Google Cloud console, go to the IAM page.

  2. Select the project.
  3. Click Grant access.
  4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.

  5. Click Select a role, then search for the role.
  6. To grant additional roles, click Add another role and add each additional role.
  7. Click Save.

For more information about IAM permissions in BigQuery, see IAM permissions.

Optional: Train a model and convert it to ONNX format

The following code samples show you how to train a classification model with scikit-learn and how to convert the resulting pipeline into ONNX format. This tutorial uses a prebuilt example model that's stored at gs://cloud-samples-data/bigquery/ml/onnx/pipeline_rf.onnx. You don't have to complete these steps if you're using the sample model.

Train a classification model with scikit-learn

Use the following sample code to create and train a scikit-learn pipeline on the Iris dataset. For instructions about installing and using scikit-learn, see the scikit-learn installation guide.

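import numpy
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Load Iris: four numeric features and an integer class label (0, 1, or 2).
data = load_iris()
X = data.data[:, :4]
y = data.target

# Shuffle the rows, because the dataset is sorted by class.
ind = numpy.arange(X.shape[0])
numpy.random.shuffle(ind)
X = X[ind, :].copy()
y = y[ind].copy()

# Scale the features, then fit a random forest classifier.
pipe = Pipeline([('scaler', StandardScaler()), ('clr', RandomForestClassifier())])
pipe.fit(X, y)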
Note: The scikit-learn pipeline lets you include models from other libraries such as LightGBM and XGBoost, which can be converted to ONNX by sklearn-onnx. For more information, see Convert a pipeline and Using converters from other libraries.

Convert the pipeline into an ONNX model

Use the following sample code, which uses sklearn-onnx, to convert the scikit-learn pipeline into an ONNX model that's named pipeline_rf.onnx.

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Disable zipmap as it is not supported in BigQuery ML.
options = {id(pipe): {'zipmap': False}}

# Define input features. scikit-learn does not store information about the
# training dataset. It is not always possible to retrieve the number of features
# or their types. That's why the function needs another argument called initial_types.
initial_types = [
    ('sepal_length', FloatTensorType([None, 1])),
    ('sepal_width', FloatTensorType([None, 1])),
    ('petal_length', FloatTensorType([None, 1])),
    ('petal_width', FloatTensorType([None, 1])),
]

# Convert the model.
model_onnx = convert_sklearn(
    pipe, 'pipeline_rf', initial_types=initial_types, options=options
)

# And save.
with open('pipeline_rf.onnx', 'wb') as f:
    f.write(model_onnx.SerializeToString())
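Before you upload the file, you might want to check locally that the converted model accepts the four inputs declared in initial_types. The following minimal sketch assumes that NumPy and ONNX Runtime (onnxruntime) are installed; build_feeds and predict_locally are hypothetical helper names, not part of skl2onnx. It reads the output names from the session instead of hardcoding them, so it doesn't assume particular output names.

```python
import numpy as np

# Hypothetical helpers for a local sanity check. The input names are assumed
# to match the initial_types passed to convert_sklearn.
FEATURE_NAMES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def build_feeds(rows):
    """Split rows of four feature values into one float32 [N, 1] array per input."""
    data = np.asarray(rows, dtype=np.float32)
    if data.ndim != 2 or data.shape[1] != len(FEATURE_NAMES):
        raise ValueError(
            f"expected rows of {len(FEATURE_NAMES)} values, got shape {data.shape}"
        )
    # data[:, i : i + 1] keeps each column 2-D, matching FloatTensorType([None, 1]);
    # ascontiguousarray gives ONNX Runtime a contiguous buffer.
    return {
        name: np.ascontiguousarray(data[:, i : i + 1])
        for i, name in enumerate(FEATURE_NAMES)
    }


def predict_locally(model_path, rows):
    """Run the saved ONNX file with ONNX Runtime and return {output_name: array}."""
    # Imported here so that build_feeds works even without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    results = session.run(None, build_feeds(rows))
    # Pair each result with the output name stored in the model.
    return {out.name: value for out, value in zip(session.get_outputs(), results)}
```

For example, predict_locally("pipeline_rf.onnx", X[:5]) runs the first five shuffled training rows from the training sample through the saved file.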

Upload the ONNX model to Cloud Storage

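After you save your model, do the following:

  1. Create a Cloud Storage bucket to store the model.
  2. Upload the ONNX model file (pipeline_rf.onnx) to your Cloud Storage bucket.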
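If you prefer to upload the file from Python, the following is a minimal sketch that uses the google-cloud-storage client library. It assumes that the bucket already exists, that you can write to it, and that you pass in a storage.Client() instance; upload_model is a hypothetical helper name. The returned gs:// URI is the value to use for BUCKET_PATH when you import the model.

```python
import os


def upload_model(storage_client, bucket_name, local_path="pipeline_rf.onnx", blob_name=None):
    """Hypothetical helper: upload a local ONNX file and return its gs:// URI.

    storage_client is assumed to be a google.cloud.storage.Client instance,
    and the bucket is assumed to exist already.
    """
    # Default the object name to the local file name, for example pipeline_rf.onnx.
    blob_name = blob_name or os.path.basename(local_path)
    blob = storage_client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket_name}/{blob_name}"


# Example usage (requires google-cloud-storage and credentials):
#   from google.cloud import storage
#   model_uri = upload_model(storage.Client(), "BUCKET_NAME")
```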

Create a dataset

Create a BigQuery dataset to store your ML model.

Console

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the Explorer pane, click your project name.

  3. Click View actions > Create dataset.

  4. On the Create dataset page, do the following:

    • For Dataset ID, enter bqml_tutorial.

    • For Location type, select Multi-region, and then select US (multiple regions in United States).

    • Leave the remaining default settings as they are, and click Create dataset.

bq

To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.

  1. Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:

    bq --location=US mk -d \
        --description "BigQuery ML tutorial dataset." \
        bqml_tutorial

    Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.

  2. Confirm that the dataset was created:

    bq ls

API

Call the datasets.insert method with a defined dataset resource.

{
  "datasetReference": {
    "datasetId": "bqml_tutorial"
  }
}
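As a sketch of the API option, the following Python builds a dataset resource and sends it to the datasets.insert REST endpoint. It assumes that you create an authorized session yourself, for example with google.auth.default() and google.auth.transport.requests.AuthorizedSession; dataset_resource and insert_dataset are hypothetical helper names. Setting location to US mirrors the console and bq options.

```python
BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"


def dataset_resource(project_id, dataset_id="bqml_tutorial", location="US"):
    """Hypothetical helper: a minimal dataset resource for datasets.insert."""
    return {
        "datasetReference": {"projectId": project_id, "datasetId": dataset_id},
        # Same US multi-region that the console and bq options use.
        "location": location,
    }


def insert_dataset(session, project_id, resource):
    """Hypothetical helper: POST the resource to datasets.insert.

    session is assumed to behave like
    google.auth.transport.requests.AuthorizedSession.
    """
    url = f"{BIGQUERY_API}/projects/{project_id}/datasets"
    response = session.post(url, json=resource)
    response.raise_for_status()  # Surface HTTP errors, such as 409 if it already exists.
    return response.json()


# Example usage (requires google-auth and Application Default Credentials):
#   import google.auth
#   from google.auth.transport.requests import AuthorizedSession
#   credentials, project_id = google.auth.default(
#       scopes=["https://www.googleapis.com/auth/bigquery"])
#   insert_dataset(AuthorizedSession(credentials), project_id,
#                  dataset_resource(project_id))
```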

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

import google.cloud.bigquery

bqclient = google.cloud.bigquery.Client()
bqclient.create_dataset("bqml_tutorial", exists_ok=True)

Import the ONNX model into BigQuery

The following steps show you how to import the sample ONNX model from Cloud Storage by using a CREATE MODEL statement.

To import the ONNX model into your dataset, select one of the following options:

Console

  1. In the Google Cloud console, go to the BigQuery Studio page.

  2. In the query editor, enter the following CREATE MODEL statement.

    CREATE OR REPLACE MODEL `bqml_tutorial.imported_onnx_model`
      OPTIONS (MODEL_TYPE='ONNX',
        MODEL_PATH='BUCKET_PATH')

    Replace BUCKET_PATH with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace BUCKET_PATH with the following value: gs://cloud-samples-data/bigquery/ml/onnx/pipeline_rf.onnx.

    When the operation is complete, you see a message similar to the following: Successfully created model named imported_onnx_model.

    Your new model appears in the Resources panel. Models are indicated by the model icon. If you select the new model in the Resources panel, information about the model appears next to the Query editor.

    The information panel for `imported_onnx_model`

bq

  1. Import the ONNX model from Cloud Storage by entering the following CREATE MODEL statement.

    bq query --use_legacy_sql=false \
    "CREATE OR REPLACE MODEL \`bqml_tutorial.imported_onnx_model\`
    OPTIONS (MODEL_TYPE='ONNX',
      MODEL_PATH='BUCKET_PATH')"

    The backticks are escaped because an unescaped backtick inside a double-quoted shell string starts command substitution.

    Replace BUCKET_PATH with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace BUCKET_PATH with the following value: gs://cloud-samples-data/bigquery/ml/onnx/pipeline_rf.onnx.

    When the operation is complete, you see a message similar to the following: Successfully created model named imported_onnx_model.

  2. After you import the model, verify that the model appears in the dataset.

    bq ls -m bqml_tutorial

    The output is similar to the following:

    tableId               Type
    --------------------- -------
    imported_onnx_model   MODEL

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

Import the model by using the ONNXModel object.

import bigframes
from bigframes.ml.imported import ONNXModel

PROJECT_ID = "PROJECT_ID"  # Replace with your Google Cloud project ID.
bigframes.options.bigquery.project = PROJECT_ID
# You can change the location to one of the valid locations: https://cloud.google.com/bigquery/docs/locations#supported_locations
bigframes.options.bigquery.location = "US"

imported_onnx_model = ONNXModel(
    model_path="gs://cloud-samples-data/bigquery/ml/onnx/pipeline_rf.onnx"
)

For more information about importing ONNX models into BigQuery, including format and storage requirements, see The CREATE MODEL statement for importing ONNX models.
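The CREATE MODEL statement can also be submitted from Python. The following minimal sketch assumes the google-cloud-bigquery client library and a bigquery.Client() instance passed in as client; create_model_sql and import_onnx_model are hypothetical helper names. It rejects paths that don't start with gs://, because MODEL_PATH must point at Cloud Storage, and then lists the dataset's models, roughly mirroring bq ls -m.

```python
def create_model_sql(model_id, model_path):
    """Hypothetical helper: build the CREATE MODEL statement from this tutorial."""
    # MODEL_PATH for an imported ONNX model is a Cloud Storage URI.
    if not model_path.startswith("gs://"):
        raise ValueError("model_path must be a Cloud Storage URI (gs://...)")
    # Values are interpolated directly, so pass only trusted IDs and paths.
    return (
        f"CREATE OR REPLACE MODEL `{model_id}`\n"
        f"OPTIONS (MODEL_TYPE='ONNX', MODEL_PATH='{model_path}')"
    )


def import_onnx_model(client, model_id, model_path):
    """Hypothetical helper: run CREATE MODEL, wait for it, then list the dataset's models.

    client is assumed to be a google.cloud.bigquery.Client instance, and
    model_id is assumed to look like DATASET.MODEL or PROJECT.DATASET.MODEL.
    """
    client.query(create_model_sql(model_id, model_path)).result()
    dataset_id, _ = model_id.rsplit(".", 1)
    # Roughly what `bq ls -m DATASET` shows.
    return [model.model_id for model in client.list_models(dataset_id)]


# Example usage:
#   from google.cloud import bigquery
#   import_onnx_model(
#       bigquery.Client(),
#       "bqml_tutorial.imported_onnx_model",
#       "gs://cloud-samples-data/bigquery/ml/onnx/pipeline_rf.onnx",
#   )
```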

Make predictions with the imported ONNX model

After importing the ONNX model, you use the ML.PREDICT function to make predictions with the model.

The query in the following steps uses imported_onnx_model to make predictions using input data from the iris table in the ml_datasets public dataset. The ONNX model expects four FLOAT values as input:

  • sepal_length
  • sepal_width
  • petal_length
  • petal_width

These inputs match the initial_types that were defined when you converted the model into ONNX format.

The outputs include the label and probabilities columns, and the columns from the input table. label represents the predicted class label. probabilities is an array that contains the probability of each class.
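For this Iris model, the classes are the integers 0, 1, and 2, and a random forest predicts the class with the highest probability, so label should equal the position of the largest value in probabilities. The following small sketch shows that relationship; predicted_class is a hypothetical helper, and the probability values are made up for illustration rather than taken from real query output.

```python
def predicted_class(probabilities):
    """Hypothetical helper: index of the largest probability (first one on ties)."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# Made-up values for one prediction row; positions 0, 1, 2 are the Iris classes.
row = {"label": 1, "probabilities": [0.05, 0.83, 0.12]}
print(predicted_class(row["probabilities"]) == row["label"])  # prints True
```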

To make predictions with the imported ONNX model, choose one of the following options:

Console

  1. Go to the BigQuery Studio page.

  2. In the query editor, enter this query that uses the ML.PREDICT function.

    SELECT *
    FROM ML.PREDICT(MODEL `bqml_tutorial.imported_onnx_model`,
      (SELECT * FROM `bigquery-public-data.ml_datasets.iris`))

    The query results are similar to the following:

    The output of the ML.PREDICT query

bq

Run the query that uses ML.PREDICT.

bq query --use_legacy_sql=false \
'SELECT *
FROM ML.PREDICT(MODEL `bqml_tutorial.imported_onnx_model`,
  (SELECT * FROM `bigquery-public-data.ml_datasets.iris`))'

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

Use the predict function to run the ONNX model.

import bigframes.pandas as bpd

df = bpd.read_gbq("bigquery-public-data.ml_datasets.iris")
predictions = imported_onnx_model.predict(df)
predictions.peek(5)

The result is similar to the following:

The output of the predict function
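The same ML.PREDICT query can also be run with the google-cloud-bigquery client library. This is a minimal sketch, assuming a bigquery.Client() instance passed in as client; predict_sql and predict_rows are hypothetical helper names. Rows are returned as plain dictionaries so that you can inspect label and probabilities without installing pandas.

```python
def predict_sql(model_id="bqml_tutorial.imported_onnx_model",
                source_table="bigquery-public-data.ml_datasets.iris",
                limit=None):
    """Hypothetical helper: build the ML.PREDICT query from this tutorial."""
    sql = (
        f"SELECT * FROM ML.PREDICT(MODEL `{model_id}`, "
        f"(SELECT * FROM `{source_table}`))"
    )
    if limit is not None:
        # Optional cap on the number of returned rows.
        sql += f" LIMIT {int(limit)}"
    return sql


def predict_rows(client, limit=5):
    """Hypothetical helper: run ML.PREDICT and return each result row as a dict.

    client is assumed to be a google.cloud.bigquery.Client instance.
    """
    rows = client.query(predict_sql(limit=limit)).result()
    return [dict(row.items()) for row in rows]


# Example usage:
#   from google.cloud import bigquery
#   for row in predict_rows(bigquery.Client()):
#       print(row["label"], row["probabilities"])
```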

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

Console

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

gcloud

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

    Delete a Google Cloud project:

    gcloud projects delete PROJECT_ID

Delete individual resources

Alternatively, to remove the individual resources used in this tutorial, do the following:

  1. Delete the imported model.

  2. Optional: Delete the dataset.
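The individual resources can also be removed from Python. The following is a minimal sketch that assumes a google.cloud.bigquery.Client() instance passed in as client; delete_tutorial_resources is a hypothetical helper name. Deleting the dataset with delete_contents=True also removes any tables or models that remain in it, so that step is opt-in.

```python
def delete_tutorial_resources(client, dataset_id="bqml_tutorial",
                              model_name="imported_onnx_model",
                              delete_dataset=False):
    """Hypothetical helper: delete the imported model and, optionally, the dataset.

    client is assumed to be a google.cloud.bigquery.Client instance.
    not_found_ok=True makes the calls safe to rerun after a partial cleanup.
    """
    client.delete_model(f"{dataset_id}.{model_name}", not_found_ok=True)
    if delete_dataset:
        # delete_contents=True removes anything still stored in the dataset.
        client.delete_dataset(dataset_id, delete_contents=True, not_found_ok=True)


# Example usage:
#   from google.cloud import bigquery
#   delete_tutorial_resources(bigquery.Client(), delete_dataset=True)
```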

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.