Run a managed notebooks instance on a Dataproc cluster
Vertex AI Workbench managed notebooks is deprecated. On April 14, 2025, support for managed notebooks ended and the ability to create managed notebooks instances was removed. Existing instances will continue to function until March 30, 2026, but patches, updates, and upgrades won't be available. To continue using Vertex AI Workbench, we recommend that you migrate your managed notebooks instances to Vertex AI Workbench instances.
This page shows you how to run a managed notebooks instance's notebook file on a Dataproc cluster.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
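If you prefer to do the project setup from the command line, the following gcloud sketch covers the same steps; PROJECT_ID is a placeholder for your own project ID, and the billing check assumes the gcloud billing command group is available in your gcloud CLI version.

# Create a new project (or skip this step and select an existing one).
gcloud projects create PROJECT_ID

# Verify that billing is enabled for the project; prints True or False.
gcloud billing projects describe PROJECT_ID --format="value(billingEnabled)"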
Enable the Notebooks and Dataproc APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
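You can also enable both APIs from the command line. The following command uses the standard service names for the Notebooks and Dataproc APIs:

gcloud services enable notebooks.googleapis.com dataproc.googleapis.com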
Required roles
To ensure that the service account has the necessary permissions to run a notebook file on a Dataproc cluster, ask your administrator to grant the service account the following IAM roles:
Important: You must grant these roles to the service account, not to your user account. Failure to grant the roles to the correct principal might result in permission errors.
- Dataproc Worker (roles/dataproc.worker) on your project
- Dataproc Editor (roles/dataproc.editor) on the cluster for the dataproc.clusters.use permission
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to run a notebook file on a Dataproc cluster. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to run a notebook file on a Dataproc cluster:
- dataproc.agents.create
- dataproc.agents.delete
- dataproc.agents.get
- dataproc.agents.update
- dataproc.tasks.lease
- dataproc.tasks.listInvalidatedLeases
- dataproc.tasks.reportStatus
- dataproc.clusters.use
Your administrator might also be able to give the service account these permissions with custom roles or other predefined roles.
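As a sketch of how an administrator might grant these roles with the gcloud CLI: PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders, and for brevity both roles are granted at the project level here, even though Dataproc Editor is only required on the cluster itself.

# Grant Dataproc Worker on the project.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/dataproc.worker"

# Grant Dataproc Editor (project-level here for brevity; only the
# cluster-level grant is strictly required).
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/dataproc.editor"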
Create a Dataproc cluster
To run a managed notebooks instance's notebook file in a Dataproc cluster, your cluster must meet the following criteria:
The cluster's component gateway must be enabled.
The cluster must have the Jupyter component.
The cluster must be in the same region as your managed notebooks instance.
To create your Dataproc cluster, enter the following command in either Cloud Shell or another environment where the Google Cloud CLI is installed.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --enable-component-gateway \
    --optional-components=JUPYTER
Replace the following:
- REGION: the Google Cloud location of your managed notebooks instance
- CLUSTER_NAME: the name of your new cluster
After a few minutes, your Dataproc cluster is available for use. Learn more about creating Dataproc clusters.
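As an optional check, you can describe the cluster to confirm that it's running and that the component gateway and Jupyter component are configured:

gcloud dataproc clusters describe CLUSTER_NAME --region=REGION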
Open JupyterLab
In the Google Cloud console, go to the Managed notebooks page.
Next to your managed notebooks instance's name, click Open JupyterLab.
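If you're not sure which managed notebooks instances exist in a given region, you can list them with the gcloud CLI. Managed notebooks instances are exposed as runtimes in the Notebooks API, so this sketch assumes that the gcloud notebooks runtimes command group is available in your gcloud CLI version:

gcloud notebooks runtimes list --location=REGION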
Run a notebook file in your Dataproc cluster
You can run a notebook file in your Dataproc cluster from any managed notebooks instance in the same project and region.
Run a new notebook file
In your managed notebooks instance's JupyterLab interface, select File > New > Notebook.
Your Dataproc cluster's available kernels appear in the Select kernel menu. Select the kernel that you want to use, and then click Select.
Your new notebook file opens.
Add code to your new notebook file, and run the code.
To change the kernel that you want to use after you've created your notebook file, see the following section.
Run an existing notebook file
In your managed notebooks instance's JupyterLab interface, click the File Browser button, navigate to the notebook file that you want to run, and open it.
To open the Select kernel dialog, click the kernel name of your notebook file, for example: Python (Local).
To select a kernel from your Dataproc cluster, select a kernel name that includes your cluster name at the end of it. For example, a PySpark kernel on a Dataproc cluster named mycluster is named PySpark on mycluster. Click Select to close the dialog.
You can now run your notebook file's code on the Dataproc cluster.
What's next
- Learn more about Dataproc.