Troubleshooting Vertex AI Workbench
This page describes troubleshooting steps that you might find helpful if you run into problems when you use Vertex AI Workbench.
See also Troubleshooting Vertex AI for help using other components of Vertex AI.
Vertex AI Workbench instances
This section describes troubleshooting steps for Vertex AI Workbench instances.
Troubleshooting with AI Tools
This section discusses how to use AI tools for troubleshooting.
Troubleshooting with Cloud Assist Investigations
When connecting Vertex AI with other Google Cloud products, you might find Gemini Cloud Assist Investigations helpful in troubleshooting integration issues. It can also accelerate troubleshooting on the instance itself. Gemini Cloud Assist lets you draw insights from metrics and logs generated by the instance.
- Stop the instance and follow the View in Compute Engine link.
- Install the Ops Agent (recommended). This takes a few minutes.
- Add a Custom Metadata field named notebook-enable-debug and set it to true.
- Restart the instance and reproduce the issue.
- Enable and configure the Cloud Assist Investigations API.
- Create a new investigation and describe the issue in detail using a natural language prompt.
- As you type, a dialog appears that suggests resources to add to the investigation. Review this list, and be sure to add the instance as a resource as well as any other resources in this list of supported products.
- Start the investigation and review the results.
Troubleshoot diagnostic files with Gemini CLI
You can use the results of the Cloud Assist investigation to inform further AI-driven investigation of the diagnostic file from the instance.
- Run the diagnostic tool and specify a Cloud Storage bucket to upload the results:
sudo /opt/deeplearning/bin/diagnostic_tool.sh [--repair] [--bucket=$BUCKET]
- Download the diagnostic file to your workstation, then install and configure Gemini CLI.
- Start the application, then describe your issue. Include the hypothesis from the Cloud Assist investigation in the context. Ask the model to extend the investigation by reading the contents of the diagnostic file using natural language prompts.
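As a sketch, the download-and-inspect workflow might look like the following. The bucket name and archive name are placeholders; the actual archive name depends on when the diagnostic tool ran.

```shell
# Download the diagnostic archive that the diagnostic tool uploaded
# (bucket and file names below are placeholders).
gcloud storage cp gs://MY_BUCKET/diagnostic_instance_20240101.tar.gz .
tar -xzf diagnostic_instance_20240101.tar.gz

# Start Gemini CLI from the extracted directory so the model can read
# the files, then paste your Cloud Assist hypothesis as context.
cd diagnostic_instance_20240101
gemini
```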
Connecting to and opening JupyterLab
This section describes troubleshooting steps for connecting to and opening JupyterLab.
Nothing happens after clicking Open JupyterLab
Issue
When you click Open JupyterLab, nothing happens.
Solution
Verify that your browser doesn't block new tabs from opening automatically. JupyterLab opens in a new browser tab.
Can't access the terminal in a Vertex AI Workbench instance
Issue
If you're unable to access the terminal or can't find the terminal window in the launcher, it could be because your Vertex AI Workbench instance doesn't have terminal access enabled.
Solution
You must create a new Vertex AI Workbench instance with the Terminal access option enabled. This option can't be changed after instance creation.
502 error when opening JupyterLab
Issue
A 502 error might mean that your Vertex AI Workbench instance isn't ready yet.
Solution
Wait a few minutes, refresh the Google Cloud console browser tab, and try again.
Notebook is unresponsive
Issue
Your Vertex AI Workbench instance isn't running cells or appears to be frozen.
Solution
First, try restarting the kernel by clicking Kernel from the top menu and then Restart Kernel. If that doesn't work, you can try the following:
- Refresh the JupyterLab browser page. Unsaved cell output doesn't persist, so you must run those cells again to regenerate the output.
- Reset your instance.
Unable to connect with Vertex AI Workbench instance using SSH
Issue
You're unable to connect to your instance by using SSH through a terminal window.
Vertex AI Workbench instances use OS Login to enable SSH access. When you create an instance, Vertex AI Workbench enables OS Login by default by setting the metadata key enable-oslogin to TRUE. If you're unable to use SSH to connect to your instance, this metadata key might need to be set to TRUE.
Solution
Connecting to a Vertex AI Workbench instance by using the Google Cloud console isn't supported. If you're unable to connect to your instance by using SSH through a terminal window, see the following:
To set the metadata key enable-oslogin to TRUE, use the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
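As a sketch, the gcloud invocation might look like the following. The instance name and zone are placeholders; check `gcloud workbench instances update --help` for the exact metadata flag on your SDK version.

```shell
# Set enable-oslogin to TRUE on an existing instance
# (my-instance and us-central1-a are placeholders).
gcloud workbench instances update my-instance \
  --location=us-central1-a \
  --metadata=enable-oslogin=TRUE
```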
GPU quota has been exceeded
Issue
You're unable to create a Vertex AI Workbench instance with GPUs.
Solution
Determine the number of GPUs available in your project by checking the quotas page. If GPUs aren't listed on the quotas page, or you require additional GPU quota, you can request a quota increase for Compute Engine GPUs. See Request a higher quota limit.
Creating Vertex AI Workbench instances
This section describes how to troubleshoot issues related to creating Vertex AI Workbench instances.
Instance stays in pending state indefinitely or is stuck in provisioning status
Issue
After creating a Vertex AI Workbench instance, it stays in the pending state indefinitely. An error like the following might appear in the serial logs:
Could not resolve host: notebooks.googleapis.com
If your instance is stuck in provisioning status, this could be because you have an invalid private networking configuration for your instance.
Solution
Tip: You can use gcpdiag to investigate an instance stuck in provisioning status. Follow the steps in the Instance logs show connection or timeout errors section.
Unable to create an instance within a Shared VPC network
Issue
Attempting to create an instance within a Shared VPC network results inan error message like the following:
Required 'compute.subnetworks.use' permission for 'projects/network-administration/regions/us-central1/subnetworks/v'
Solution
The issue is that the Notebooks Service Account is attempting to create the instance without the correct permissions.
To ensure that the Notebooks Service Account can create a Vertex AI Workbench instance within a Shared VPC network, ask your administrator to grant the Notebooks Service Account the Compute Network User (roles/compute.networkUser) IAM role on the host project. Important: You must grant this role to the Notebooks Service Account, not to your user account. Failure to grant the role to the correct principal might result in permission errors. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to ensure that the Notebooks Service Account can create a Vertex AI Workbench instance within a Shared VPC network. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to ensure that the Notebooks Service Account can create a Vertex AI Workbench instance within a Shared VPC network:
- To use subnetworks:
compute.subnetworks.use
Your administrator might also be able to give the Notebooks Service Account these permissions with custom roles or other predefined roles.
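As a sketch, the grant on the host project might look like the following. The project identifiers are placeholders, and the member address assumes the usual service agent format for the Notebooks Service Account.

```shell
# Grant the Notebooks Service Account the Compute Network User role
# on the Shared VPC host project (all identifiers are placeholders).
gcloud projects add-iam-policy-binding HOST_PROJECT_ID \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-notebooks.iam.gserviceaccount.com" \
  --role="roles/compute.networkUser"
```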
Can't create a Vertex AI Workbench instance with a custom container
Issue
There isn't an option to use a custom container when creating a Vertex AI Workbench instance in the Google Cloud console.
Solution
Adding a custom container to a Vertex AI Workbench instance isn't supported, and you can't add a custom container by using the Google Cloud console.
Adding a conda environment is recommended instead of using a custom container.
You can add a custom container to a Vertex AI Workbench instance by using the Notebooks API, but this capability isn't supported.
Mount shared storage button isn't there
Issue
The Mount shared storage button isn't in the File Browser tab of the JupyterLab interface.
Solution
The storage.buckets.list permission is required for the Mount shared storage button to appear in the JupyterLab interface of your Vertex AI Workbench instance. Ask your administrator to grant your Vertex AI Workbench instance's service account the storage.buckets.list permission on the project.
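One way to grant just that permission (a sketch, not the only approach) is a small custom role bound to the instance's service account:

```shell
# Create a custom role containing only storage.buckets.list, then bind it
# to the instance's service account (all identifiers are placeholders).
gcloud iam roles create bucketLister \
  --project=PROJECT_ID \
  --title="Bucket Lister" \
  --permissions=storage.buckets.list
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
  --role="projects/PROJECT_ID/roles/bucketLister"
```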
599 error when using Dataproc
Issue
Attempting to create a Dataproc-enabled instance results in an error message like the following:
HTTP 599: Unknown (Error from Gateway: [Timeout while connecting] Exception while attempting to connect to Gateway server url. Ensure gateway url is valid and the Gateway instance is running.)
Solution
In your Cloud DNS configuration, add a Cloud DNS entry for the *.googleusercontent.com domain.
Unable to install third-party JupyterLab extension
Issue
Attempting to install a third-party JupyterLab extension results in an Error: 500 message.
Solution
Third-party JupyterLab extensions aren't supported in Vertex AI Workbench instances.
Unable to edit underlying virtual machine
Issue
When you try to edit the underlying virtual machine (VM) of a Vertex AI Workbench instance, you might get an error message similar to the following:
Current principal doesn't have permission to mutate this resource.
Solution
This error occurs because you can't edit the underlying VM of an instance by using the Google Cloud console or the Compute Engine API.
To edit a Vertex AI Workbench instance's underlying VM, use the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
pip packages aren't available after adding conda environment
Issue
Your pip packages aren't available after you add a conda-based kernel.
Solution
To resolve the issue, see Add a conda environment and try the following:
- Check that you used the DL_ANACONDA_ENV_HOME variable and that it contains the name of your environment.
- Check that pip is located in a path similar to /opt/conda/envs/ENVIRONMENT/bin/pip. You can run the which pip command to get the path.
Unable to access or copy data of an instance with single user access
Issue
The data on an instance with single user access is inaccessible.
For Vertex AI Workbench instances that are set up with single user access, only the specified single user (the owner) can access the data on the instance.
Solution
To access or copy the data when you aren't the owner of the instance, open a support case.
Unexpected shutdown
Issue
Your Vertex AI Workbench instance shuts down unexpectedly.
Solution
If your instance shuts down unexpectedly, this could be because idle shutdown was initiated.
If you enabled idle shutdown, your instance shuts down when there is no kernel activity for the specified time period. For example, running a cell or new output printing to a notebook is activity that resets the idle timeout timer. CPU usage doesn't reset the idle timeout timer.
Instance logs show connection or timeout errors
Issue
Your Vertex AI Workbench instance's logs show connection or timeout errors.
Solution
If you notice connection or timeout errors in the instance's logs, make sure that the Jupyter server is running on port 8080. Follow the steps in the Verify that the Jupyter internal API is active section.
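From a terminal or SSH session on the instance, a quick check might look like the following. The `jupyter` service name is an assumption; adjust it if your image names the service differently.

```shell
# Check whether anything answers on the Jupyter port. Any HTTP status code
# (even 401/403) means the server is listening; "connection refused"
# suggests the service isn't running.
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8080/api/status

# Inspect the Jupyter systemd service (service name is an assumption).
sudo systemctl status jupyter
```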
If you have turned off External IP and you are using a private VPC network, make sure you have also followed the network configuration options documentation. Consider the following:
You must enable Private Google Access on the chosen subnetwork in the same region where your instance is located in the VPC host project. For more information on configuring Private Google Access, see the Private Google Access documentation.
If you're using Cloud DNS, the instance must be able to resolve the required Cloud DNS domains specified in the network configuration options documentation. To verify this, follow the steps in the Verify the instance can resolve the required DNS domains section.
Instance logs show 'Unable to contact Jupyter API' 'ReadTimeoutError'
Issue
Your Vertex AI Workbench instance logs show an error such as:
notebooks_collection_agent. Unable to contact Jupyter API:
HTTPConnectionPool(host=\'127.0.0.1\', port=8080):
Max retries exceeded ReadTimeoutError(\"HTTPConnectionPool(host=\'127.0.0.1\', port=8080
Solution
Follow the steps in the Instance logs show connection or timeout errors section. You can also try modifying the Notebooks Collection Agent script to change HTTP_TIMEOUT_SESSION to a larger value, for example 60, to help verify whether the request failed because the call took too long to respond or because the requested URL can't be reached.
docker0 address conflicts with VPC addressing
Issue
By default, the docker0 interface is created with an IP address of 172.17.0.1/16. This might conflict with the IP addressing in your VPC network such that the instance is unable to connect to other endpoints with addresses in the 172.17.0.0/16 range.
Solution
You can force the docker0 interface to be created with an IP address that doesn't conflict with your VPC network by using the following post-startup script and setting the post-startup script behavior to run_once.
#!/bin/bash
# Wait for Docker to be fully started
while ! systemctl is-active docker; do
  sleep 1
done

# Stop the Docker service
systemctl stop docker

# Modify /etc/docker/daemon.json
cat <<EOF >/etc/docker/daemon.json
{
  "bip": "CUSTOM_DOCKER_IP/16"
}
EOF

# Restart the Docker service
systemctl start docker
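Before changing the Docker configuration, you can confirm whether your VPC addressing actually overlaps docker0's default range. A minimal sketch using Python's standard ipaddress module (the VPC subnets shown are made-up examples):

```python
import ipaddress

# docker0's default range, derived from its default bip of 172.17.0.1/16.
docker0 = ipaddress.ip_network("172.17.0.0/16")

# Replace with the subnets your VPC actually uses (example values).
vpc_subnets = ["10.128.0.0/20", "172.17.32.0/24"]

for cidr in vpc_subnets:
    subnet = ipaddress.ip_network(cidr)
    status = "CONFLICT" if docker0.overlaps(subnet) else "ok"
    print(f"{cidr}: {status}")
```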
Specified reservations don't exist
Issue
The operation to create the instance results in a Specified reservations do not exist error message. The operation's output might be similar to the following:
{
  "name": "projects/PROJECT/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.notebooks.v2.OperationMetadata",
    "createTime": "2025-01-01T01:00:01.000000000Z",
    "endTime": "2025-01-01T01:00:01.000000000Z",
    "target": "projects/PROJECT/locations/LOCATION/instances/INSTANCE_NAME",
    "verb": "create",
    "requestedCancellation": false,
    "apiVersion": "v2",
    "endpoint": "CreateInstance"
  },
  "done": true,
  "error": {
    "code": 3,
    "message": "Invalid value for field 'resource.reservationAffinity': '{ \"consumeReservationType\": \"SPECIFIC_ALLOCATION\", \"key\": \"compute.googleapis.com/reservation-name...'. Specified reservations [projects/PROJECT/zones/ZONE/futureReservations/RESERVATION_NAME] do not exist.",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.RequestInfo",
        "requestId": "REQUEST_ID"
      }
    ]
  }
}
Solution
Some Compute Engine machine types require additional parameters at creation, such as local SSDs or a minimum CPU platform. The instance specification must include these additional fields.
- Vertex AI Workbench instances use automatic minimum CPU platform by default. If your reservation sets a specific platform, you need to set min_cpu_platform accordingly when creating Vertex AI Workbench instances.
- Vertex AI Workbench instances always set the number of local SSDs to the default values according to the machine type. For example, a2-ultragpu-1g always has 1 local SSD, while a2-highgpu-1g always has 0 local SSDs. When creating reservations to be used for Vertex AI Workbench instances, you need to leave the local SSD count at its default value.
Managed notebooks
Vertex AI Workbench managed notebooks is deprecated. On April 14, 2025, support for managed notebooks ended and the ability to create managed notebooks instances was removed. Existing instances will continue to function until March 30, 2026, but patches, updates, and upgrades won't be available. To continue using Vertex AI Workbench, we recommend that you migrate your managed notebooks instances to Vertex AI Workbench instances.
This section describes troubleshooting steps for managed notebooks.
Connecting to and opening JupyterLab
This section describes troubleshooting issues with connecting to and opening JupyterLab.
Nothing happens after clicking Open JupyterLab
Issue
When you click Open JupyterLab, nothing happens.
Solution
Verify that your browser doesn't block new tabs from opening automatically. JupyterLab opens in a new browser tab.
Unable to connect with managed notebooks instance using SSH
Issue
There isn't an option to connect with managed notebooks instances by using SSH.
Solution
SSH access to managed notebooks instances isn't available.
Can't access the terminal in a managed notebooks instance
Issue
If you're unable to access the terminal or can't find the terminal window in the launcher, it could be because your managed notebooks instance doesn't have terminal access enabled.
Solution
You must create a new managed notebooks instance with the Terminal access option enabled. This option can't be changed after instance creation.
502 error when opening JupyterLab
Issue
A 502 error might mean that your managed notebooks instance isn't ready yet.
Solution
Wait a few minutes, refresh the Google Cloud console browser tab, and try again.
Opening a notebook results in a 524 (A Timeout Occurred) error
Issue
A 524 error is usually an indication that the Inverting Proxy agent isn't connecting to the Inverting Proxy server, or that requests are taking too long on the backend server side (Jupyter). Common causes of this error include networking issues, the Inverting Proxy agent not running, or the Jupyter service not running.
Solution
Verify that your managed notebooks instance is started.
Notebook is unresponsive
Issue
Your managed notebooks instance isn't running cells or appears to be frozen.
Solution
First, try restarting the kernel by clicking Kernel from the top menu and then Restart Kernel. If that doesn't work, you can try the following:
- Refresh the JupyterLab browser page. Unsaved cell output doesn't persist, so you must run those cells again to regenerate the output.
- Reset your instance.
Migrating to Vertex AI Workbench instances
This section describes methods for diagnosing and resolving issues with migrating from a managed notebooks instance to a Vertex AI Workbench instance.
Can't find a kernel that was in the managed notebooks instance
Issue
A kernel that was in your managed notebooks instance doesn't appear in the Vertex AI Workbench instance that you migrated to.
Custom containers appear as kernels in managed notebooks. The Vertex AI Workbench migration tool doesn't support custom container migration.
Solution
To resolve this issue, add a conda environment to your Vertex AI Workbench instance.
Different version of framework in migrated instance
Issue
A framework that was in your managed notebooks instance was a different version than the one in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances provide a default set of framework versions. The migration tool doesn't add framework versions from your original managed notebooks instance. See default migration tool behaviors.
Solution
To add a specific version of a framework, add a conda environment to your Vertex AI Workbench instance.
GPUs aren't migrated to the new Vertex AI Workbench instance
Issue
GPUs that were in your managed notebooks instance aren't in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances support a default set of GPUs. If the GPUs in your original managed notebooks instance aren't available, your instance is migrated without any GPUs.
Solution
After migration, you can add GPUs to your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
Migrated instance's machine type is different
Issue
The machine type of your managed notebooks instance is different from the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances don't support all machine types. If the machine type in your original managed notebooks instance isn't available, your instance is migrated to the e2-standard-4 machine type.
Solution
After migration, you can change the machine type of your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
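A gcloud sketch for changing the machine type might look like the following. The instance name and zone are placeholders, the instance typically must be stopped first, and the exact flags can differ by SDK version, so check `gcloud workbench instances update --help`.

```shell
# Stop the instance, change its machine type, then start it again
# (my-instance and us-central1-a are placeholders).
gcloud workbench instances stop my-instance --location=us-central1-a
gcloud workbench instances update my-instance \
  --location=us-central1-a \
  --machine-type=e2-standard-8
gcloud workbench instances start my-instance --location=us-central1-a
```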
GPU quota has been exceeded
Issue
You are unable to create a managed notebooks instance with GPUs.
Solution
Determine the number of GPUs available in your project by checking the quotas page. If GPUs aren't listed on the quotas page, or you require additional GPU quota, you can request a quota increase. See Request a higher quota limit.
Using container images
This section describes troubleshooting issues with using container images.
Container image doesn't appear as a kernel in JupyterLab
Issue
Container images that don't have a valid kernelspec don't successfully load askernels in JupyterLab.
Solution
Make sure that your container meets our requirements. For more information, see the custom container requirements.
Notebook disconnects on long-running job
Issue
If you see the following error message when running a job in a notebook, it might be caused by the request taking too long to load, or by high CPU or memory utilization, which can make the Jupyter service unresponsive.
{"log":"2021/06/29 18:10:33 failure fetching a VM ID: compute: Received 500 `internal error`\n","stream":"stderr","time":"2021-06-29T18:10:33.383650241Z"}
{"log":"2021/06/29 18:38:26 Websocket failure: failed to read a websocket message from the server: read tcp [::1]:40168-\u003e[::1]:8080: use of closed network connection\n","stream":"stderr","time":"2021-06-29T18:38:26.057622824Z"}
Solution
This issue is caused by running a long-running job within a notebook. To run a job that might take a long time to complete, it's recommended to use the executor.
Using the executor
This section describes troubleshooting issues with using the executor.
Package installations not available to the executor
Note: Make sure that your Virtual Private Cloud allows access to PyPI and VPC Service Controls, and that your organization policy doesn't block access.
Issue
The executor runs your notebook code in a separate environment from the kernelwhere you run your notebook file's code. Because of this, some of the packagesyou installed might not be available in the executor's environment.
Solution
To resolve this issue, see Ensure package installations are available to the executor.
401 or 403 errors when running the notebook code using the executor
Issue
A 401 or 403 error when you run the executor can mean that the executor isn't able to access resources.
Solution
See the following for possible causes:
The executor runs your notebook code in a tenant project separate from your managed notebooks instance's project. Therefore, when you access resources through code run by the executor, the executor might not connect to the correct Google Cloud project by default. To resolve this issue, use explicit project selection.
By default, your managed notebooks instance can have access to resources that exist in the same project, and therefore, when you run your notebook file's code manually, these resources don't need additional authentication. However, because the executor runs in a separate tenant project, it does not have the same default access. To resolve this issue, authenticate access using service accounts.
The executor can't use end-user credentials to authenticate access to resources, for example, the gcloud auth login command. To resolve this issue, authenticate access using service accounts.
exited with a non-zero status of 127 error when using the executor
Issue
An exited with a non-zero status of 127 error, or "command not found" error, can happen when you use the executor to run code on a custom container that doesn't have the nbexecutor extension installed.
Solution
To ensure that your custom container has the nbexecutor extension, you can create a derivative container image from a Deep Learning Containers image. Deep Learning Containers images include the nbexecutor extension.
Invalid service networking configuration error message
Issue
This error might look like the following:
Invalid Service Networking configuration. Couldn't find free blocks in allocated IP ranges. Please use a valid range using: /24 mask or below (/23, /22, etc).
This means that no free blocks were found in the allocated IP ranges of your network.
Solution
Use a subnet mask of /24 or lower. Create a bigger allocated IP address range and attach this range by modifying the private service connection for servicenetworking-googleapis-com.
For more information, seeSet up a network.
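The steps above can be sketched with gcloud as follows; the network and range names are placeholders, and the prefix length should match the size your workloads need.

```shell
# Reserve a bigger allocated range for private services access
# (my-vpc and my-allocated-range are placeholders).
gcloud compute addresses create my-allocated-range \
  --global \
  --purpose=VPC_PEERING \
  --prefix-length=16 \
  --network=my-vpc

# Attach the range to the servicenetworking private connection.
gcloud services vpc-peerings update \
  --service=servicenetworking.googleapis.com \
  --network=my-vpc \
  --ranges=my-allocated-range \
  --force
```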
Unable to install third-party JupyterLab extension
Issue
Attempting to install a third-party JupyterLab extension results in an Error: 500 message.
Solution
Third-party JupyterLab extensions aren't supported in managed notebooks instances.
Unable to access or copy data of an instance with single user access
Issue
The data on an instance with single user access is inaccessible.
Solution
For managed notebooks instances that are set up with single user access, only the specified single user (the owner) can access the data on the instance.
To access or copy the data when you aren't the owner of the instance, open a support case.
Unexpected shutdown
Issue
Your Vertex AI Workbench instance shuts down unexpectedly.
Solution
If your instance shuts down unexpectedly, this could be because idle shutdown was initiated.
If you enabled idle shutdown, your instance shuts down when there is no kernel activity for the specified time period. For example, running a cell or new output printing to a notebook is activity that resets the idle timeout timer. CPU usage doesn't reset the idle timeout timer.
Restore instance
Issue
Restoring a managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub.
Recover data from an instance
Issue
Recovering data from a managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub.
Creating managed notebooks instances
Creating a managed notebooks instance isn't supported. For more information, see Deprecations.
Starting an instance results in a resource availability error
Issue
You're unable to start an instance because of a resource availability error.
This error can look like the following:
The zone ZONE_NAME doesn't have enough resources available to fulfill the request. '(resource type:compute)'.
Resource errors occur when you try to start an instance in a zone that can't accommodate your request due to the current unavailability of Compute Engine resources, such as GPUs or CPUs.
Resource errors only apply to the resources you specified in your request at the time you sent the request, not to all resources in the zone. Resource errors aren't related to your Compute Engine quota. Resource errors are temporary and can change frequently based on fluctuating demand.
Solution
To proceed, try the following:
- Change the machine type of your instance.
- Migrate your files and data to an instance in a different zone.
- Attempt the request again later.
- Reduce the amount of resources that you're requesting. For example, start a different instance with fewer GPUs, disks, vCPUs, or memory.
No route to host on outbound connections from managed notebooks
Issue
Typically, the only routes you can see in the Google Cloud console are those known to your own VPC as well as the ranges reserved when you complete the VPC Network Peering configuration.
Managed notebooks instances reside in a Google-managed network and run a modified version of Jupyter in a Docker networking namespace within the instance.
The Docker network interface and Linux bridge on this instance may select a local IP that conflicts with IP ranges being exported over the peering by your VPC. These are typically in the 172.16.0.0/16 and 192.168.10.0/24 ranges, respectively.
In these circumstances, outbound connections from the instance to these ranges will fail with a complaint that is some variation of No route to host, despite VPC routes being correctly shared.
Solution
Invoke ifconfig in a terminal session and ensure that no IP addresses on any virtual interfaces in the instance conflict with IP ranges that your VPC is exporting to the peering connection.
User-managed notebooks
Vertex AI Workbench user-managed notebooks is deprecated. On April 14, 2025, support for user-managed notebooks ended and the ability to create user-managed notebooks instances was removed. Existing instances will continue to function until March 30, 2026, but patches, updates, and upgrades won't be available. To continue using Vertex AI Workbench, we recommend that you migrate your user-managed notebooks instances to Vertex AI Workbench instances.
This section describes troubleshooting steps for user-managed notebooks.
Connecting to and opening JupyterLab
This section describes troubleshooting issues with connecting to and opening JupyterLab.
Nothing happens after clicking Open JupyterLab
Issue
When you click Open JupyterLab, nothing happens.
Solution
Verify that your browser doesn't block new tabs from opening automatically. JupyterLab opens in a new browser tab.
No Inverting Proxy server access to JupyterLab
Issue
You are unable to access JupyterLab.
Vertex AI Workbench uses a Google internal Inverting Proxy server to provide access to JupyterLab. User-managed notebooks instance settings, network configuration, and other factors can prevent access to JupyterLab.
Solution
Unable to connect with user-managed notebooks instance using SSH
Issue
You're unable to connect to your instance by using SSH through a terminal window.
User-managed notebooks instances use OS Login to enable SSH access. When you create an instance, Vertex AI Workbench enables OS Login by default by setting the metadata key enable-oslogin to TRUE. If you're unable to use SSH to connect to your instance, this metadata key might need to be set to TRUE.
Solution
To enable SSH access for user-managed notebooks for users, complete the steps for configuring OS Login roles on user accounts.
Opening a user-managed notebooks instance results in a 403 (Forbidden) error
Issue
A 403 (Forbidden) error when opening a user-managed notebooks instance often means that there is an access issue.
Solution
To troubleshoot access issues, consider the three ways that access can be granted to a user-managed notebooks instance:
- Single user
- Service account
- Project editors
The access mode is configured during user-managed notebooks instance creation, and it is defined in the notebook metadata:
- Single user: proxy-mode=mail, proxy-user-mail=user@domain.com
- Service account: proxy-mode=service_account
- Project editors: proxy-mode=project_editors
If you can't access a notebook when you click Open JupyterLab, try the following:
Verify that the user accessing the instance has the iam.serviceAccounts.actAs permission for the instance's service account. The service account is either the Compute Engine default service account or a service account that is specified when the instance is created. If your instance uses single user access with a specified service account as the single user, see No JupyterLab access, single user mode enabled.
The following example shows how to specify a service account when you create an instance:
gcloud notebooks instances create nb-1 \
  --vm-image-family=tf-latest-cpu \
  --metadata=proxy-mode=mail,proxy-user-mail=user@domain.com \
  --service-account=your_service_account@project_id.iam.gserviceaccount.com \
  --location=us-west1-a
When you click Open JupyterLab to open a notebook, the notebook opens in a new browser tab. If you are signed in to more than one Google Account, the new tab opens with your default Google Account. If you didn't create your user-managed notebooks instance with your default Google Account, the new browser tab will show a 403 (Forbidden) error.
No JupyterLab access, single user mode enabled
Issue
You are unable to access JupyterLab.
Solution
If a user is unable to access JupyterLab and the instance's access to JupyterLab is set to Single user only, try the following:
- On the User-managed notebooks page of the Google Cloud console, click the name of your instance to open the Notebook details page.
- Next to View VM details, click View in Compute Engine.
- On the VM details page, click Edit.
- In the Metadata section, verify that the proxy-mode metadata entry is set to mail.
- Verify that the proxy-user-mail metadata entry is set to a valid user email address, not a service account.
- Click Save.
- On the User-managed notebooks page of the Google Cloud console, initialize the updated metadata by stopping your instance and starting it back up again.
Opening a notebook results in a 504 (Gateway Timeout) error
Issue
This error indicates an internal proxy timeout or a backend server (Jupyter) timeout. It can occur when:
- The request never reached the internal Inverting Proxy server.
- The backend (Jupyter) returned a 504 error.
Solution
Open a Google support case.
Opening a notebook results in a 524 (A Timeout Occurred) error
Issue
The internal Inverting Proxy server hasn't received a response from the Inverting Proxy agent for the request within the timeout period. The Inverting Proxy agent runs inside your user-managed notebooks instance as a Docker container. A 524 error usually indicates that the Inverting Proxy agent isn't connecting to the Inverting Proxy server, or that requests are taking too long on the backend server side (Jupyter). A typical cause for this error is on the user side (for example, a networking issue, or the Inverting Proxy agent service isn't running).
Solution
If you can't access a notebook, verify that your user-managed notebooks instance is started, and then try the following:
Option 1: Run the diagnostic tool to automatically check and repair user-managed notebooks core services, verify available storage, and generate useful log files. To run the tool in your instance, perform the following steps:
Make sure that your instance is on version M58 or newer.
Run the following command:
sudo /opt/deeplearning/bin/diagnostic_tool.sh [--repair] [--bucket=$BUCKET]
The --repair and --bucket flags are optional. The --repair flag attempts to fix common core service errors, and the --bucket flag lets you specify a Cloud Storage bucket to store the created log files.
The output of this command displays useful status messages for user-managed notebooks core services and exports log files of its findings.
Option 2: Use the following steps to check specific user-managed notebooks requirements individually.
Verify that the user-managed notebooks instance disk isn't out of space.
Run the following command:
df -h -T /home/jupyter
If the Use% is more than 85%, you need to manually delete files from /home/jupyter. As a first step, you can empty the trash with the following command:
sudo rm -rf /home/jupyter/.local/share/Trash/*
Verify that the Inverting Proxy agent is running. If the agent is started, try restarting it.
Make sure the Jupyter service is running. If it is, try restarting it.
Verify memory utilization in the user-managed notebooks instance.
Run the following command:
free -t -h
If the used memory is more than 85% of the total, consider changing the machine type. You can install the Cloud Monitoring agent to monitor whether there is high memory usage in your user-managed notebooks instance. See pricing information.
Verify that you are using Deep Learning VM version M55 or later. To learn more about upgrading, see Upgrade the environment of a user-managed notebooks instance.
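The disk and memory checks from Option 2 can be combined into a single sketch. This isn't part of the official tooling; the 85% threshold and the /home/jupyter path follow the guidance above, and the script checks / so that it also runs outside an instance:

```shell
#!/bin/sh
# Sketch: flag disk or memory usage above the 85% threshold used in this
# section. Check /home/jupyter instead of / when run on an instance.
THRESHOLD=85

disk_pct() {
  # POSIX df: field 5 of the second line is Use%; strip the % sign.
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

mem_pct() {
  # procps free: on the Mem: line, field 2 is total and field 3 is used.
  free | awk '/^Mem:/ { printf "%d", $3 * 100 / $2 }'
}

d=$(disk_pct /)
m=$(mem_pct)
echo "disk ${d}% used, memory ${m}% used"
if [ "$d" -gt "$THRESHOLD" ]; then
  echo "disk: consider emptying /home/jupyter/.local/share/Trash"
fi
if [ "$m" -gt "$THRESHOLD" ]; then
  echo "memory: consider changing the machine type"
fi
```

The field positions assume GNU coreutils df and procps free, which the Deep Learning VM images ship with.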
Opening a notebook results in a 598 (Network read timeout) error
Issue
The Inverting Proxy server hasn't heard from the Inverting Proxy agent at all for more than 10 minutes. This is a strong indication of an Inverting Proxy agent issue.
Solution
If you can't access a notebook, try the following:
Verify that your user-managed notebooks instance is started.
Verify that the Inverting Proxy agent is running. If the agent is started, try restarting it.
Make sure the Jupyter service is running. If it is, try restarting it.
Verify that you are using Deep Learning VM version M55 or later. To learn more about upgrading, see Upgrade the environment of a user-managed notebooks instance.
Notebook is unresponsive
Issue
Your user-managed notebooks instance isn't running cells or appears to be frozen.
Solution
First, try restarting the kernel by clicking Kernel from the top menu and then Restart Kernel. If that doesn't work, you can try the following:
- Refresh the JupyterLab browser page. Any unsaved cell output doesn't persist, so you must run those cells again to regenerate the output.
- From a terminal session in the notebook, run the command top to see if there are processes consuming the CPU.
- From the terminal, check the amount of free disk space using the command df, or check the available RAM using the command free.
- Shut your instance down by selecting it from the User-managed notebooks page and clicking Stop. After it has stopped completely, select it and click Start.
Migrating to Vertex AI Workbench instances
This section describes methods for diagnosing and resolving issues with migrating from a user-managed notebooks instance to a Vertex AI Workbench instance.
Can't find R, Beam, or other kernels that were in the user-managed notebooks instance
Issue
A kernel that was in your user-managed notebooks instance doesn't appear in the Vertex AI Workbench instance that you migrated to.
Some kernels, such as the R and Beam kernels, aren't available in Vertex AI Workbench instances by default. Migration of those kernels isn't supported.
Solution
To resolve this issue, add a conda environment to your Vertex AI Workbench instance.
Can't set up a Dataproc Hub instance in the Vertex AI Workbench instance
Issue
Dataproc Hub isn't supported in Vertex AI Workbench instances.
Solution
Continue to use Dataproc Hub in user-managed notebooks instances.
Different version of framework in migrated instance
Issue
A framework that was in your user-managed notebooks instance was a different version than the one in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances provide a default set of framework versions. The migration tool doesn't add framework versions from your original user-managed notebooks instance. See default migration tool behaviors.
Solution
To add a specific version of a framework, add a conda environment to your Vertex AI Workbench instance.
GPUs aren't migrated to the new Vertex AI Workbench instance
Issue
GPUs that were in your user-managed notebooks instance aren't in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances support a default set of GPUs. If the GPUs in your original user-managed notebooks instance aren't available, your instance is migrated without any GPUs.
Solution
After migration, you can add GPUs to your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
Migrated instance's machine type is different
Issue
The machine type of your user-managed notebooks instance is different from the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances don't support all machine types. If the machine type in your original user-managed notebooks instance isn't available, your instance is migrated to the e2-standard-4 machine type.
Solution
After migration, you can change the machine type of your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
Working with files
This section describes troubleshooting issues with files for user-managed notebooks instances.
File downloading disabled but user can still download files
Issue
For Dataproc Hub user-managed notebooks instances, disabling file downloading from the JupyterLab user interface isn't supported. User-managed notebooks instances that use the Dataproc Hub framework permit file downloading even if you don't select Enable file downloading from JupyterLab UI when you create the instance.
Solution
Dataproc Hub user-managed notebooks instances don't support restricting file downloads.
Downloaded files are truncated or don't complete downloading
Issue
When you download files from your user-managed notebooks instance, a timeout setting on the proxy-forwarding agent limits the connection time for the download to complete. If the download takes too long, your downloaded file might be truncated or might fail to download.
Solution
To download the file, copy your file to Cloud Storage, and then download the file from Cloud Storage.
Consider migrating your files and data to a new user-managed notebooks instance.
After restarting VM, local files can't be referenced from notebook terminal
Issue
Sometimes after restarting a user-managed notebooks instance, local files can't be referenced from within a notebook terminal.
Solution
This is a known issue. To reference your local files from within a notebook terminal, first re-establish your current working directory using the following command:
cd PWD
In this command, replace PWD with your current working directory. For example, if your current working directory was /home/jupyter/, use the command cd /home/jupyter/.
After re-establishing your current working directory, your local files can be referenced from within the notebook terminal.
Creating user-managed notebooks instances
Creating a user-managed notebooks instance isn't supported. For more information, see Deprecations.
Starting an instance results in a resource availability error
Issue
You're unable to start an instance because of a resource availability error.
This error can look like the following:
The zone ZONE_NAME doesn't have enough resources available to fulfill the request. '(resource type:compute)'.
Resource errors occur when you try to start an instance in a zone that can't accommodate your request due to the current unavailability of Compute Engine resources, such as GPUs or CPUs.
Resource errors apply only to the resources that you specified in your request at the time you sent the request, not to all resources in the zone. Resource errors aren't related to your Compute Engine quota. Resource errors are temporary and can change frequently based on fluctuating demand.
Solution
To proceed, you can try the following:
- Change the machine type of your instance.
- Migrate your files and data to an instance in a different zone.
- Attempt the request again later.
- Reduce the amount of resources that you're requesting. For example, start a different instance with fewer GPUs, disks, vCPUs, or memory.
Upgrading user-managed notebooks instances
This section describes troubleshooting issues with upgrading user-managed notebooks instances.
Unable to upgrade because unable to get instance disk information
Issue
Upgrade isn't supported for single-disk user-managed notebooks instances.
Solution
You might want to migrate your user data to a new user-managed notebooks instance.
Unable to upgrade because instance isn't UEFI compatible
Issue
Vertex AI Workbench depends on UEFI compatibility to complete an upgrade.
User-managed notebooks instances created from some older images are not UEFI compatible and therefore can't be upgraded.
Solution
To verify that your instance is UEFI compatible, type the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
gcloud compute instances describe INSTANCE_NAME \
  --zone=ZONE | grep type
Replace the following:
- INSTANCE_NAME: the name of your instance
- ZONE: the zone where your instance is located
To verify that the image that you used to create your instance is UEFI compatible, use the following command:
gcloud compute images describe VM_IMAGE_FAMILY \
  --project deeplearning-platform-release | grep type
Replace VM_IMAGE_FAMILY with the image family name that you used to create your instance.
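To interpret the output of the commands above: the describe results list UEFI_COMPATIBLE under guestOsFeatures when the instance or image is UEFI compatible (an assumption based on current Compute Engine describe output; verify against your gcloud version). A sketch, with a canned snippet standing in for live gcloud output:

```shell
#!/bin/sh
# Sketch: decide UEFI compatibility from describe output read on stdin.
is_uefi_compatible() {
  grep -q "UEFI_COMPATIBLE"
}

# Canned example of the guestOsFeatures section of describe output.
sample='guestOsFeatures:
- type: UEFI_COMPATIBLE
- type: VIRTIO_SCSI_MULTIQUEUE'

if printf '%s\n' "$sample" | is_uefi_compatible; then
  echo "UEFI compatible"
else
  echo "not UEFI compatible"
fi
```

On a live system you would pipe the gcloud describe output into is_uefi_compatible instead of the canned sample.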
If you determine that either your instance or image isn't UEFI compatible, you can attempt to migrate your user data to a new user-managed notebooks instance. To do so, complete the following steps:
- Verify that the image that you want to use to create your new instance is UEFI compatible. To do so, type the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
gcloud compute images describe VM_IMAGE_FAMILY \
  --project deeplearning-platform-release --format=json | grep type
Replace VM_IMAGE_FAMILY with the image family name that you want to use to create your instance.
- Migrate your user data to a new user-managed notebooks instance.
User-managed notebooks instance isn't accessible after upgrade
Issue
If the user-managed notebooks instance isn't accessible after an upgrade, there might have been a failure during the replacement of the boot disk image.
User-managed notebooks instances that can be upgraded are dual-disk, with one boot disk and one data disk. The upgrade process upgrades the boot disk to a new image while preserving your data on the data disk.
Solution
Complete the following steps to attach a new valid image to the boot disk.
To store values that you'll use to complete this procedure, type the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
export INSTANCE_NAME=MY_INSTANCE_NAME
export PROJECT_ID=MY_PROJECT_ID
export ZONE=MY_ZONE
Replace the following:
- MY_INSTANCE_NAME: the name of your instance
- MY_PROJECT_ID: your project ID
- MY_ZONE: the zone where your instance is located
Use the following command to stop the instance:
gcloud compute instances stop $INSTANCE_NAME \
  --project=$PROJECT_ID --zone=$ZONE
Detach the data disk from the instance.
gcloud compute instances detach-disk $INSTANCE_NAME --device-name=data \
  --project=$PROJECT_ID --zone=$ZONE
Delete the instance's VM.
gcloud compute instances delete $INSTANCE_NAME --keep-disks=all --quiet \
  --project=$PROJECT_ID --zone=$ZONE
Use the Notebooks API to delete the user-managed notebooksinstance.
gcloud notebooks instances delete $INSTANCE_NAME \
  --project=$PROJECT_ID --location=$ZONE
Create a user-managed notebooks instance using the same name as your previous instance.
gcloud notebooks instances create $INSTANCE_NAME \
  --vm-image-project="deeplearning-platform-release" \
  --vm-image-family=MY_VM_IMAGE_FAMILY \
  --instance-owners=MY_INSTANCE_OWNER \
  --machine-type=MY_MACHINE_TYPE \
  --service-account=MY_SERVICE_ACCOUNT \
  --accelerator-type=MY_ACCELERATOR_TYPE \
  --accelerator-core-count=MY_ACCELERATOR_CORE_COUNT \
  --install-gpu-driver \
  --project=$PROJECT_ID \
  --location=$ZONE
Replace the following:
- MY_VM_IMAGE_FAMILY: the image family name
- MY_INSTANCE_OWNER: your instance owner
- MY_MACHINE_TYPE: the machine type of your instance's VM
- MY_SERVICE_ACCOUNT: the service account to use with this instance, or use "default"
- MY_ACCELERATOR_TYPE: the accelerator type; for example, "NVIDIA_TESLA_T4"
- MY_ACCELERATOR_CORE_COUNT: the core count; for example, 1
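The recovery sequence above can be chained into one script. In the sketch below, the commands are printed rather than executed (a dry run) so you can review the order first; swap the run helper's body to execute for real. All values are placeholders.

```shell
#!/bin/sh
# Dry run of the boot-disk recovery sequence above; values are placeholders.
INSTANCE_NAME=MY_INSTANCE_NAME
PROJECT_ID=MY_PROJECT_ID
ZONE=MY_ZONE

# Print each command instead of executing it.
# To execute for real, change the body to: "$@"
run() { echo "+ $*"; }

run gcloud compute instances stop "$INSTANCE_NAME" \
  --project="$PROJECT_ID" --zone="$ZONE"
run gcloud compute instances detach-disk "$INSTANCE_NAME" --device-name=data \
  --project="$PROJECT_ID" --zone="$ZONE"
run gcloud compute instances delete "$INSTANCE_NAME" --keep-disks=all --quiet \
  --project="$PROJECT_ID" --zone="$ZONE"
run gcloud notebooks instances delete "$INSTANCE_NAME" \
  --project="$PROJECT_ID" --location="$ZONE"
```

The final create step is omitted here because its flags depend on your original instance's configuration, as listed above.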
Monitoring health status of user-managed notebooks instances
This section describes how to troubleshoot issues with monitoring health status errors.
docker-proxy-agent status failure
Follow these steps after a docker-proxy-agent status failure:
- Verify that the Inverting Proxy agent is running. If not, go to step 3.
docker-service status failure
Follow these steps after a docker-service status failure:
jupyter-service status failure
Follow these steps after a jupyter-service status failure:
jupyter-api status failure
Follow these steps after a jupyter-api status failure:
Boot disk utilization percent
The boot disk space status is unhealthy if the disk space is greater than 85% full.
If your boot disk space status is unhealthy, try the following:
- From a terminal session in the user-managed notebooks instance, or by using ssh to connect, check the amount of free disk space using the command df -H.
- Use the command find . -type f -size +100M to help you find large files that you might be able to delete, but don't delete them unless you are sure you can safely do so. If you aren't sure, you can get help from support.
- If the previous steps don't solve your problem, get support.
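As an alternative to the find command above, the sketch below ranks the ten largest files under a directory, biggest first, so you can review candidates before deleting anything:

```shell
#!/bin/sh
# Sketch: list the ten largest files under a directory, largest first
# (sizes in KB). Defaults to the current directory.
largest_files() {
  find "${1:-.}" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n 10
}

largest_files .
```

On an instance you would typically pass /home/jupyter as the argument.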
Data disk utilization percent
The data disk space status is unhealthy if the disk space is greater than 85% full.
If your data disk space status is unhealthy, try the following:
- From a terminal session in the user-managed notebooks instance, or by using ssh to connect, check the amount of free disk space using the command df -h -T /home/jupyter.
- Delete large files to increase the available disk space. Use the command find . -type f -size +100M to help you find large files.
- If the previous steps don't solve your problem, get support.
Unable to install third-party JupyterLab extension
Issue
Attempting to install a third-party JupyterLab extension results in an Error: 500 message.
Solution
Third-party JupyterLab extensions aren't supported in user-managed notebooks instances.
Restore instance
Issue
Restoring a user-managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub or make a snapshot of the disk.
Recover data from an instance
Issue
Recovering data from a user-managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub or make a snapshot of the disk.
Unable to increase shared memory
Issue
You can't increase shared memory on an existing user-managed notebooks instance.
Solution
You can, however, specify a shared memory size when you create a user-managed notebooks instance by using the container-custom-params metadata key, with a value like the following:
--shm-size=SHARED_MEMORY_SIZEgb
Replace SHARED_MEMORY_SIZE with the size that you want in GB.
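For example, at creation time the metadata key could be passed as follows. The instance name, location, and 8 GB size are placeholders, and the command is printed for review rather than executed:

```shell
#!/bin/sh
# Dry run: print a create command that sets shared memory through the
# container-custom-params metadata key. All values are placeholders.
CMD="gcloud notebooks instances create my-instance \
  --metadata=container-custom-params=--shm-size=8gb \
  --location=us-west1-a"
echo "$CMD"
```

Combine the metadata flag with whatever image and machine-type flags your instance otherwise needs.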
Helpful procedures
This section describes procedures that you might find helpful.
Use SSH to connect to your user-managed notebooks instance
Use ssh to connect to your instance by typing the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
gcloud compute ssh --project PROJECT_ID \
  --zone ZONE \
  INSTANCE_NAME -- -L 8080:localhost:8080
Replace the following:
- PROJECT_ID: your project ID
- ZONE: the Google Cloud zone where your instance is located
- INSTANCE_NAME: the name of your instance
You can also connect to your instance by opening your instance's Compute Engine detail page, and then clicking the SSH button.
Tip: If you can't use ssh to connect to your instance, you can use gcpdiag to troubleshoot this issue.
Re-register with the Inverting Proxy server
To re-register the user-managed notebooks instance with the internal Inverting Proxy server, you can stop and start the VM from the User-managed notebooks page, or you can use ssh to connect to your user-managed notebooks instance and enter:
cd /opt/deeplearning/bin
sudo ./attempt-register-vm-on-proxy.sh
Verify the Docker service status
To verify the Docker service status, you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service docker status
Verify that the Inverting Proxy agent is running
To verify whether the notebook Inverting Proxy agent is running, use ssh to connect to your user-managed notebooks instance and enter:
# Confirm that the Inverting Proxy agent Docker container is running (proxy-agent)
sudo docker ps
# Verify that State.Status is running and State.Running is true.
sudo docker inspect proxy-agent
# Grab logs
sudo docker logs proxy-agent
Verify the Jupyter service status and collect logs
To verify the Jupyter service status, you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service jupyter status
To collect Jupyter service logs:
sudo journalctl -u jupyter.service --no-pager
Verify that the Jupyter internal API is active
The Jupyter API should always run on port 8080. You can verify this by inspecting the instance's syslogs for an entry similar to:
Jupyter Server ... running at: http://localhost:8080
To verify that the Jupyter internal API is active, you can also use ssh to connect to your user-managed notebooks instance and enter:
curl http://127.0.0.1:8080/api/kernelspecs
You can also measure the time it takes for the API to respond in case the requests are taking too long:
time curl -V http://127.0.0.1:8080/api/status
time curl -V http://127.0.0.1:8080/api/kernels
time curl -V http://127.0.0.1:8080/api/connections
To run these commands in your Vertex AI Workbench instance, open JupyterLab, and create a new terminal.
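To turn such timing checks into a pass/fail probe, a small helper can compare elapsed time against a budget. A sketch (the 5-second budget and the use of true as the probed command are illustrative; on the instance you would pass the curl probes above):

```shell
#!/bin/sh
# Sketch: succeed only if a command finishes within a budget in seconds.
runs_within() {
  budget=$1; shift
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  [ $((end - start)) -le "$budget" ]
}

if runs_within 5 true; then
  echo "probe within budget"
fi
```

Whole-second resolution is enough here, since the symptom being diagnosed is requests taking many seconds or timing out entirely.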
Restart the Docker service
To restart the Docker service, you can stop and start the VM from the User-managed notebooks page, or you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service docker restart
Restart the Inverting Proxy agent
To restart the Inverting Proxy agent, you can stop and start the VM from the User-managed notebooks page, or you can use ssh to connect to your user-managed notebooks instance and enter:
sudo docker restart proxy-agent
Restart the Jupyter service
To restart the Jupyter service, you can stop and start the VM from the User-managed notebooks page, or you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service jupyter restart
Restart the Notebooks Collection Agent
The Notebooks Collection Agent service runs a Python process in the backgroundthat verifies the status of the Vertex AI Workbench instance's core services.
To restart the Notebooks Collection Agent service, you can stop and start the VM from the Google Cloud console, or you can use ssh to connect to your Vertex AI Workbench instance and enter:
sudo systemctl stop notebooks-collection-agent.service
followed by:
sudo systemctl start notebooks-collection-agent.service
To run these commands in your Vertex AI Workbench instance, open JupyterLab, and create a new terminal.
Modify the Notebooks Collection Agent script
Note: Don't modify the Notebooks Collection Agent script unless it is necessary to troubleshoot or resolve an issue with the instance. To access and edit the script, open a terminal in your instance or use ssh to connect to your Vertex AI Workbench instance, and enter:
nano /opt/deeplearning/bin/notebooks_collection_agent.py
After editing the file, remember to save it.
Then, you must restart the Notebooks Collection Agent service.
Verify the instance can resolve the required DNS domains
To verify that the instance can resolve the required DNS domains, you can use ssh to connect to your user-managed notebooks instance and enter:
host notebooks.googleapis.com
host *.notebooks.cloud.google.com
host *.notebooks.googleusercontent.com
host *.kernels.googleusercontent.com
or:
curl --silent --output /dev/null "https://notebooks.cloud.google.com"; echo $?
If the instance has Dataproc enabled, you can verify that the instance resolves *.kernels.googleusercontent.com by running:
curl --verbose -H "Authorization: Bearer $(gcloud auth print-access-token)" https://${PROJECT_NUMBER}-dot-${REGION}.kernels.googleusercontent.com/api/kernelspecs | jq .
To run these commands in your Vertex AI Workbench instance, open JupyterLab, and create a new terminal.
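A scripted variant of the resolution check, sketched below, uses getent (present on the Linux images these instances run, so no extra DNS utilities are needed); on the instance you would loop over the notebooks.* domains listed above:

```shell
#!/bin/sh
# Sketch: report whether a hostname resolves.
check_dns() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "$1 OK"
  else
    echo "$1 FAILED"
  fi
}

# localhost stands in for the notebooks.* domains so that the sketch
# also works offline.
check_dns localhost
```

Any FAILED line points at a DNS or networking configuration problem on the instance.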
Make a copy of the user data on an instance
To store a copy of an instance's user data in Cloud Storage, complete the following steps.
Note: You must have terminal access to your instance. Terminal access is manually set when you create an instance. The terminal access setting can't be changed after the instance is created.
Create a Cloud Storage bucket (optional)
In the same project where your instance is located, create a Cloud Storage bucket where you can store your user data. If you already have a Cloud Storage bucket, skip this step.
- Create a Cloud Storage bucket:
gcloud storage buckets create gs://BUCKET_NAME
Replace BUCKET_NAME with a bucket name that meets the bucket naming requirements.
Copy your user data
In your instance's JupyterLab interface, select File > New > Terminal to open a terminal window. For user-managed notebooks instances, you can instead connect to your instance's terminal by using SSH.
Use the gcloud CLI to copy your user data to a Cloud Storage bucket. The following example command copies all of the files from your instance's /home/jupyter/ directory to a directory in a Cloud Storage bucket.
gcloud storage cp /home/jupyter/* gs://BUCKET_NAMEPATH --recursive
Replace the following:
- BUCKET_NAME: the name of your Cloud Storage bucket
- PATH: the path to the directory where you want to copy your files, for example: /copy/jupyter/
Investigate an instance stuck in provisioning by using gcpdiag
Note: When using Vertex AI with Private Google Access to access Google Cloud APIs and web proxies for outbound access, the instances must be configured to bypass any web proxies or other network traffic inspection or filtering devices (for example, next-generation firewalls) for any hostnames in the domains listed in the Private Google Access documentation.
gcpdiag is an open source tool. It is not an officially supported Google Cloud product. You can use the gcpdiag tool to help you identify and fix Google Cloud project issues. For more information, see the gcpdiag project on GitHub.
The gcpdiag runbook investigates potential causes for a Vertex AI Workbench instance to get stuck in provisioning status, including the following areas:
- Status: Checks the instance's current status to ensure that it is stuck in provisioning and not stopped or active.
- Instance's Compute Engine VM boot disk image: Checks whether the instance was created with a custom container, an official workbench-instances image, Deep Learning VM Images, or unsupported images that might cause the instance to get stuck in provisioning status.
- Custom scripts: Checks whether the instance is using custom startup or post-startup scripts that change the default Jupyter port or break dependencies that might cause the instance to get stuck in provisioning status.
- Environment version: Checks whether the instance is using the latest environment version by checking its upgradability. Earlier versions might cause the instance to get stuck in provisioning status.
- Instance's Compute Engine VM performance: Checks the VM's current performance to ensure that it isn't impaired by high CPU usage, insufficient memory, or disk space issues that might disrupt normal operations.
- Instance's Compute Engine serial port or system logging: Checks whether the instance has serial port logs, which are analyzed to ensure that Jupyter is listening on 127.0.0.1:8080.
- Instance's Compute Engine SSH and terminal access: Checks whether the instance's Compute Engine VM is running so that the user can SSH and open a terminal to verify that space usage in /home/jupyter is lower than 85%. If no space is left, this might cause the instance to get stuck in provisioning status.
- External IP turned off: Checks whether external IP access is turned off. An incorrect networking configuration can cause the instance to get stuck in provisioning status.
Docker
You can run gcpdiag using a wrapper that starts gcpdiag in a Docker container. Docker or Podman must be installed.
- Copy and run the following command on your local workstation.
curl https://gcpdiag.dev/gcpdiag.sh >gcpdiag && chmod +x gcpdiag
- Execute the gcpdiag command:
./gcpdiag runbook vertex/workbench-instance-stuck-in-provisioning \
  --parameter project_id=PROJECT_ID \
  --parameter instance_name=INSTANCE_NAME \
  --parameter zone=ZONE
View the available parameters for this runbook.
Replace the following:
- PROJECT_ID: The ID of the project containing the resource.
- INSTANCE_NAME: The name of the target Vertex AI Workbench instance within your project.
- ZONE: The zone in which your target Vertex AI Workbench instance is located.
Useful flags:
- --universe-domain: If applicable, the Trusted Partner Sovereign Cloud domain hosting the resource
- --parameter or -p: Runbook parameters
For a list and description of all gcpdiag tool flags, see the gcpdiag usage instructions.
Permissions errors when using service account roles with Vertex AI
Issue
You get general permissions errors when you use service account roles with Vertex AI.
These errors can appear in Cloud Logging in either the product component logs or audit logs. They may also appear in any combination of the affected projects.
These issues can be caused by one or both of the following:
- Use of the Service Account Token Creator role when the Service Account User role should have been used, or the other way around. These roles grant different permissions on a service account and aren't interchangeable. To learn about the differences between the Service Account Token Creator and Service Account User roles, see Service account roles.
- You've granted a service account permissions across multiple projects, which isn't permitted by default.
Solution
To resolve the issue, try one or more of the following:
- Determine whether the Service Account Token Creator or Service Account User role is needed. To learn more, read the IAM documentation for the Vertex AI services that you are using, as well as any other product integrations that you are using.
- If you have granted a service account permissions across multiple projects, enable service accounts to be attached across projects by ensuring that iam.disableCrossProjectServiceAccountUsage isn't enforced. To ensure that iam.disableCrossProjectServiceAccountUsage isn't enforced, run the following command:
gcloud resource-manager org-policies disable-enforce \
  iam.disableCrossProjectServiceAccountUsage \
  --project=PROJECT_ID
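Before changing the constraint, you can inspect its current effective state. The command below is printed rather than executed so you can review it first (PROJECT_ID is a placeholder):

```shell
#!/bin/sh
# Dry run: print the command that shows whether the constraint is
# currently enforced for the project. PROJECT_ID is a placeholder.
CMD="gcloud resource-manager org-policies describe \
  iam.disableCrossProjectServiceAccountUsage \
  --project=PROJECT_ID --effective"
echo "$CMD"
```

If the effective policy shows the constraint as enforced, the disable-enforce command above lifts it for the project.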
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.