Dataproc optional Jupyter component
You can install additional components like Jupyter when you create a Dataproc cluster using the Optional components feature. This page describes the Jupyter component.
The Jupyter component is a Web-based single-user notebook for interactive data analytics and supports the JupyterLab Web UI. The Jupyter Web UI is available on port 8123 on the cluster's first master node.
Launch notebooks for multiple users. You can create a Dataproc-enabled Vertex AI Workbench instance or install the Dataproc JupyterLab plugin on a VM to serve notebooks to multiple users.
Configure Jupyter. Jupyter can be configured by providing dataproc:jupyter cluster properties. To reduce the risk of remote code execution over unsecured notebook server APIs, the default dataproc:jupyter.listen.all.interfaces cluster property setting is false, which restricts connections to localhost (127.0.0.1) when the Component Gateway is enabled (Component Gateway activation is required when installing the Jupyter component).
The Jupyter notebook provides a Python kernel to run Spark code, and a PySpark kernel. By default, notebooks are saved in Cloud Storage in the Dataproc staging bucket, which is specified by the user or auto-created when the cluster is created. The location can be changed at cluster creation time using the dataproc:jupyter.notebook.gcs.dir cluster property.
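As a minimal sketch of how the cluster properties named above might be supplied at creation time, the helper below formats a dictionary of properties into the comma-separated `prefix:property=value` form taken by the `--properties` flag of `gcloud dataproc clusters create`. The helper name `format_properties` and the path `gs://my-bucket/notebooks` are hypothetical placeholders, not values from this page.

```python
def format_properties(properties):
    """Join cluster properties into a value for the gcloud --properties flag."""
    for key, value in properties.items():
        # A comma separates entries, so this simple sketch rejects values that contain one.
        if "," in key or "," in str(value):
            raise ValueError(f"property {key!r} contains a comma")
    return ",".join(f"{key}={value}" for key, value in properties.items())


# Hypothetical settings: a custom notebook directory, with the
# listen-on-all-interfaces property left at its default of false.
props = {
    "dataproc:jupyter.notebook.gcs.dir": "gs://my-bucket/notebooks",
    "dataproc:jupyter.listen.all.interfaces": "false",
}
properties_flag = "--properties=" + format_properties(props)
print(properties_flag)
```

The resulting string can be appended to the cluster creation command alongside the other flags.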
Work with data files. You can use a Jupyter notebook to work with data files that have been uploaded to Cloud Storage. Since the Cloud Storage connector is pre-installed on a Dataproc cluster, you can reference the files directly in your notebook. Here's an example that accesses CSV files in Cloud Storage:
df = spark.read.csv("gs://bucket/path/file.csv")
df.show()
See Generic Load and Save Functions for PySpark examples.
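CSV files often begin with a header row, which a plain `spark.read.csv` call reads as data. The sketch below shows one possible way to wrap the read in a small helper. It assumes a SparkSession named `spark` already exists in the notebook, as the example above does. The names `gcs_uri`, `read_csv_from_gcs`, `my-bucket`, and `data/file.csv` are hypothetical.

```python
def gcs_uri(bucket, object_path):
    """Build a gs:// URI from a bucket name and an object path."""
    return f"gs://{bucket}/{object_path.lstrip('/')}"


def read_csv_from_gcs(spark, bucket, object_path):
    """Read a CSV file from Cloud Storage, using the first row as column names."""
    # header and inferSchema are keyword options of DataFrameReader.csv.
    return spark.read.csv(gcs_uri(bucket, object_path), header=True, inferSchema=True)


# Usage in a notebook where `spark` is an existing SparkSession:
# df = read_csv_from_gcs(spark, "my-bucket", "data/file.csv")
# df.printSchema()
# df.show(5)
```

Passing the session in as an argument keeps the helper free of imports, so it works unchanged in either kernel as long as a session is available.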
Install Jupyter
Install the component when you create a Dataproc cluster. The Jupyter component requires activation of the Dataproc Component Gateway.
Note: On image version 1.5 only, installing the Jupyter component also requires installing the Anaconda component.
Console
- Enable the component.
  - In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
  - In the Components section:
    - Under Optional components, select the Jupyter component.
    - Under Component Gateway, select Enable component gateway (see Viewing and Accessing Component Gateway URLs).
gcloud CLI
To create a Dataproc cluster that includes the Jupyter component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.
Latest default image version example
The following example installs the Jupyter component on a cluster that uses the latest default image version.
gcloud dataproc clusters create cluster-name \
    --optional-components=JUPYTER \
    --region=region \
    --enable-component-gateway \
    ... other flags
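When cluster creation is scripted from Python, the same command can be assembled as an argument list. This is only a sketch: `build_create_command` is a hypothetical helper, `my-cluster` and `us-central1` are placeholder values, and the flags are limited to the ones shown above.

```python
import shlex


def build_create_command(cluster_name, region, extra_flags=()):
    """Return the gcloud argument list for a cluster with the Jupyter component."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        "--optional-components=JUPYTER",
        f"--region={region}",
        "--enable-component-gateway",  # required by the Jupyter component
        *extra_flags,
    ]


command = build_create_command("my-cluster", "us-central1")
# Print the command for review; to execute it, pass the list to
# subprocess.run(command, check=True).
print(shlex.join(command))
```

Building a list rather than a single string avoids shell quoting problems if a cluster name or extra flag contains special characters.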
REST API
The Jupyter component can be installed through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
- Set the EndpointConfig.enableHttpPortAccess property to true as part of the clusters.create request to enable connecting to the Jupyter notebook Web UI using the Component Gateway.
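To illustrate the request described above, the following sketch builds a JSON body for clusters.create using only the Python standard library. The field names follow the REST resources mentioned in this section (the optional components list in SoftwareConfig, and EndpointConfig.enableHttpPortAccess). The function `build_cluster_body` and the name `my-cluster` are hypothetical, and a real request would typically carry more configuration.

```python
import json


def build_cluster_body(cluster_name):
    """Return a minimal clusters.create body that enables Jupyter."""
    return {
        "clusterName": cluster_name,
        "config": {
            # Install the Jupyter optional component.
            "softwareConfig": {"optionalComponents": ["JUPYTER"]},
            # Enable the Component Gateway so the Web UI is reachable.
            "endpointConfig": {"enableHttpPortAccess": True},
        },
    }


body = build_cluster_body("my-cluster")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the clusters.create call by whichever HTTP client or API library you already use.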
Open the Jupyter and JupyterLab UIs
Click the Component Gateway links in the Google Cloud console to open the Jupyter notebook or JupyterLab UI, which runs on the cluster master node, in your local browser.
Select "GCS" or "Local Disk" to create a new Jupyter Notebook in either location.
Attach GPUs to master and worker nodes
You can add GPUs to your cluster's master and worker nodes when using a Jupyter notebook to:
- Preprocess data in Spark, then collect a DataFrame onto the master and run TensorFlow
- Use Spark to orchestrate TensorFlow runs in parallel
- Run Tensorflow-on-YARN
- Support other machine learning scenarios that use GPUs
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.