Troubleshooting Vertex AI Workbench
This page describes troubleshooting steps that you might find helpful if you run into problems when you use Vertex AI Workbench.
See also Troubleshooting Vertex AI for help using other components of Vertex AI.
Vertex AI Workbench instances
This section describes troubleshooting steps for Vertex AI Workbench instances.
Troubleshooting with AI Tools
This section discusses how to use AI tools for troubleshooting.
Troubleshooting with Cloud Assist Investigations
When connecting Vertex AI with other Google Cloud products, you might find Gemini Cloud Assist Investigations helpful in troubleshooting integration issues. It can also accelerate troubleshooting on the instance itself. Gemini Cloud Assist lets you draw insights from metrics and logs generated by the instance.
- Stop the instance and follow the View in Compute Engine link.
- Install the Ops Agent (recommended). This takes a few minutes.
- Add a Custom Metadata field named notebook-enable-debug and set it to true.
- Restart the instance and reproduce the issue.
- Enable and configure the Cloud Assist Investigations API.
- Create a new investigation and describe the issue in detail using a natural language prompt.
- As you type, a dialog appears that suggests resources to add to the investigation. Review this list, and be sure to add the instance as a resource as well as any other resources in this list of supported products.
- Start the investigation and review the results.
Troubleshoot diagnostic files with Gemini CLI
You can use the results of the Cloud Assist investigation to inform further AI-driven investigation of the diagnostic file from the instance.
- Run the diagnostic tool and specify a Cloud Storage bucket to upload the results:
sudo /opt/deeplearning/bin/diagnostic_tool.sh [--repair] [--bucket=$BUCKET]
- Download the diagnostic file to your workstation, then install and configure Gemini CLI.
- Start the application, then describe your issue. Include the hypothesis from the Cloud Assist investigation in the context. Ask the model to extend the investigation by reading the contents of the diagnostic file using natural language prompts.
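As a sketch, the download-and-inspect workflow might look like the following. The bucket name and archive name are placeholders; the actual archive name depends on when the diagnostic tool ran.

```shell
# Download the diagnostic archive that the diagnostic tool uploaded
# (bucket and file names below are placeholders).
gcloud storage cp gs://MY_BUCKET/diagnostic_instance_20240101.tar.gz .
tar -xzf diagnostic_instance_20240101.tar.gz

# Start Gemini CLI from the extracted directory so the model can read
# the files, then paste your Cloud Assist hypothesis as context.
cd diagnostic_instance_20240101
gemini
```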
Connecting to and opening JupyterLab
This section describes troubleshooting steps for connecting to and opening JupyterLab.
Nothing happens after clicking Open JupyterLab
Issue
When you click Open JupyterLab, nothing happens.
Solution
Verify that your browser doesn't block new tabs from opening automatically. JupyterLab opens in a new browser tab.
Can't access the terminal in a Vertex AI Workbench instance
Issue
If you're unable to access the terminal or can't find the terminal window in the launcher, it could be because your Vertex AI Workbench instance doesn't have terminal access enabled.
Solution
You must create a new Vertex AI Workbench instance with the Terminal access option enabled. This option can't be changed after instance creation.
502 error when opening JupyterLab
Issue
A 502 error might mean that your Vertex AI Workbench instance isn't ready yet.
Solution
Wait a few minutes, refresh the Google Cloud console browser tab, and try again.
Notebook is unresponsive
Issue
Your Vertex AI Workbench instance isn't running cells or appears to be frozen.
Solution
First, try restarting the kernel by clicking Kernel from the top menu and then Restart Kernel. If that doesn't work, you can try the following:
- Refresh the JupyterLab browser page. Unsaved cell output doesn't persist, so you must run those cells again to regenerate the output.
- Reset your instance.
Unable to connect with Vertex AI Workbench instance using SSH
Issue
You're unable to connect to your instance by using SSH through a terminal window.
Vertex AI Workbench instances use OS Login to enable SSH access. When you create an instance, Vertex AI Workbench enables OS Login by default by setting the metadata key enable-oslogin to TRUE. If you're unable to use SSH to connect to your instance, this metadata key might need to be set to TRUE.
Solution
Connecting to a Vertex AI Workbench instance by using the Google Cloud console isn't supported. If you're unable to connect to your instance by using SSH through a terminal window, see the following:
To set the metadata key enable-oslogin to TRUE, use the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
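As a sketch, the gcloud invocation might look like the following. The instance name and zone are placeholders; check `gcloud workbench instances update --help` for the exact metadata flag on your SDK version.

```shell
# Set enable-oslogin to TRUE on an existing instance
# (my-instance and us-central1-a are placeholders).
gcloud workbench instances update my-instance \
  --location=us-central1-a \
  --metadata=enable-oslogin=TRUE
```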
GPU quota has been exceeded
Issue
You're unable to create a Vertex AI Workbench instance with GPUs.
Solution
Determine the number of GPUs available in your project by checking the quotas page. If GPUs aren't listed on the quotas page, or you require additional GPU quota, you can request a quota increase for Compute Engine GPUs. See Request a higher quota limit.
Creating Vertex AI Workbench instances
This section describes how to troubleshoot issues related to creating Vertex AI Workbench instances.
Instance stays in pending state indefinitely or is stuck in provisioning status
Issue
After creating a Vertex AI Workbench instance, it stays in the pending state indefinitely. An error like the following might appear in the serial logs:
Could not resolve host: notebooks.googleapis.com
If your instance is stuck in provisioning status, this could be because you have an invalid private networking configuration for your instance.
Solution
Tip: You can use gcpdiag to investigate an instance stuck in provisioning status. Follow the steps in the Instance logs show connection or timeout errors section.
Unable to create an instance within a Shared VPC network
Issue
Attempting to create an instance within a Shared VPC network results inan error message like the following:
Required 'compute.subnetworks.use' permission for 'projects/network-administration/regions/us-central1/subnetworks/v'
Solution
The issue is that the Notebooks Service Account is attempting to create the instance without the correct permissions.
To ensure that the Notebooks Service Account can create a Vertex AI Workbench instance within a Shared VPC network, ask your administrator to grant the Notebooks Service Account the Compute Network User (roles/compute.networkUser) IAM role on the host project. Important: You must grant this role to the Notebooks Service Account, not to your user account. Failure to grant the role to the correct principal might result in permission errors. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to ensure that the Notebooks Service Account can create a Vertex AI Workbench instance within a Shared VPC network. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to ensure that the Notebooks Service Account can create a Vertex AI Workbench instance within a Shared VPC network:
- To use subnetworks:
compute.subnetworks.use
Your administrator might also be able to give the Notebooks Service Account these permissions with custom roles or other predefined roles.
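As a sketch, the grant on the host project might look like the following. The project identifiers are placeholders, and the member address assumes the usual service agent format for the Notebooks Service Account.

```shell
# Grant the Notebooks Service Account the Compute Network User role
# on the Shared VPC host project (all identifiers are placeholders).
gcloud projects add-iam-policy-binding HOST_PROJECT_ID \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-notebooks.iam.gserviceaccount.com" \
  --role="roles/compute.networkUser"
```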
Can't create a Vertex AI Workbench instance with a custom container
Issue
There isn't an option to use a custom container when creating a Vertex AI Workbench instance in the Google Cloud console.
Solution
Adding a custom container to a Vertex AI Workbench instance isn't supported, and you can't add a custom container by using the Google Cloud console.
Adding a conda environment is recommended instead of using a custom container.
You can add a custom container to a Vertex AI Workbench instance by using the Notebooks API, but this capability isn't supported.
Mount shared storage button isn't there
Issue
The Mount shared storage button isn't in the File Browser tab of the JupyterLab interface.
Solution
The storage.buckets.list permission is required for the Mount shared storage button to appear in the JupyterLab interface of your Vertex AI Workbench instance. Ask your administrator to grant your Vertex AI Workbench instance's service account the storage.buckets.list permission on the project.
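One way to grant just that permission (a sketch, not the only approach) is a small custom role bound to the instance's service account:

```shell
# Create a custom role containing only storage.buckets.list, then bind it
# to the instance's service account (all identifiers are placeholders).
gcloud iam roles create bucketLister \
  --project=PROJECT_ID \
  --title="Bucket Lister" \
  --permissions=storage.buckets.list
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
  --role="projects/PROJECT_ID/roles/bucketLister"
```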
599 error when using Dataproc
Issue
Attempting to create a Dataproc-enabled instance results in an error message like the following:
HTTP 599: Unknown (Error from Gateway: [Timeout while connecting] Exception while attempting to connect to Gateway server url. Ensure gateway url is valid and the Gateway instance is running.)
Solution
In your Cloud DNS configuration, add a Cloud DNS entry for the *.googleusercontent.com domain.
Unable to install third-party JupyterLab extension
Issue
Attempting to install a third-party JupyterLab extension results in an Error: 500 message.
Solution
Third-party JupyterLab extensions aren't supported in Vertex AI Workbench instances.
Unable to edit underlying virtual machine
Issue
When you try to edit the underlying virtual machine (VM) of a Vertex AI Workbench instance, you might get an error message similar to the following:
Current principal doesn't have permission to mutate this resource.
Solution
This error occurs because you can't edit the underlying VM of an instance by using the Google Cloud console or the Compute Engine API.
To edit a Vertex AI Workbench instance's underlying VM, use the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
pip packages aren't available after adding conda environment
Issue
Your pip packages aren't available after you add a conda-based kernel.
Solution
To resolve the issue, see Add a conda environment and try the following:
- Check that you used the DL_ANACONDA_ENV_HOME variable and that it contains the name of your environment.
- Check that pip is located in a path similar to /opt/conda/envs/ENVIRONMENT/bin/pip. You can run the which pip command to get the path.
Unable to access or copy data of an instance with single user access
Issue
The data on an instance with single user access is inaccessible.
For Vertex AI Workbench instances that are set up with single user access, only the specified single user (the owner) can access the data on the instance.
Solution
To access or copy the data when you aren't the owner of the instance, open a support case.
Unexpected shutdown
Issue
Your Vertex AI Workbench instance shuts down unexpectedly.
Solution
If your instance shuts down unexpectedly, this could be because idle shutdown was initiated.
If you enabled idle shutdown, your instance shuts down when there is no kernel activity for the specified time period. For example, running a cell or new output printing to a notebook is activity that resets the idle timeout timer. CPU usage doesn't reset the idle timeout timer.
Instance logs show connection or timeout errors
Issue
Your Vertex AI Workbench instance's logs show connection or timeout errors.
Solution
If you notice connection or timeout errors in the instance's logs, make sure that the Jupyter server is running on port 8080. Follow the steps in the Verify that the Jupyter internal API is active section.
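From a terminal or SSH session on the instance, a quick check might look like the following. The `jupyter` service name is an assumption; adjust it if your image names the service differently.

```shell
# Check whether anything answers on the Jupyter port. Any HTTP status code
# (even 401/403) means the server is listening; "connection refused"
# suggests the service isn't running.
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8080/api/status

# Inspect the Jupyter systemd service (service name is an assumption).
sudo systemctl status jupyter
```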
If you have turned off External IP and you are using a private VPC network, make sure you have also followed the network configuration options documentation. Consider the following:
You must enable Private Google Access on the chosen subnetwork in the same region where your instance is located in the VPC host project. For more information on configuring Private Google Access, see the Private Google Access documentation.
If you're using Cloud DNS, the instance must be able to resolve the required Cloud DNS domains specified in the network configuration options documentation. To verify this, follow the steps in the Verify the instance can resolve the required DNS domains section.
Instance logs show 'Unable to contact Jupyter API' 'ReadTimeoutError'
Issue
Your Vertex AI Workbench instance logs show an error such as:
notebooks_collection_agent. Unable to contact Jupyter API:
HTTPConnectionPool(host=\'127.0.0.1\', port=8080):
Max retries exceeded ReadTimeoutError(\"HTTPConnectionPool(host=\'127.0.0.1\', port=8080
Solution
Follow the steps in the Instance logs show connection or timeout errors section. You can also try modifying the Notebooks Collection Agent script to change HTTP_TIMEOUT_SESSION to a larger value, for example 60, to help verify whether the request failed because the call took too long to respond or because the requested URL can't be reached.
docker0 address conflicts with VPC addressing
Issue
By default, the docker0 interface is created with an IP address of 172.17.0.1/16. This might conflict with the IP addressing in your VPC network such that the instance is unable to connect to other endpoints with addresses in the 172.17.0.0/16 range.
Solution
You can force the docker0 interface to be created with an IP address that doesn't conflict with your VPC network by using the following post-startup script and setting the post-startup script behavior to run_once.
#!/bin/bash
# Wait for Docker to be fully started
while ! systemctl is-active docker; do
  sleep 1
done

# Stop the Docker service
systemctl stop docker

# Modify /etc/docker/daemon.json
cat <<EOF >/etc/docker/daemon.json
{
  "bip": "CUSTOM_DOCKER_IP/16"
}
EOF

# Restart the Docker service
systemctl start docker
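Before changing the Docker configuration, you can confirm whether your VPC addressing actually overlaps docker0's default range. A minimal sketch using Python's standard ipaddress module (the VPC subnets shown are made-up examples):

```python
import ipaddress

# docker0's default range, derived from its default bip of 172.17.0.1/16.
docker0 = ipaddress.ip_network("172.17.0.0/16")

# Replace with the subnets your VPC actually uses (example values).
vpc_subnets = ["10.128.0.0/20", "172.17.32.0/24"]

for cidr in vpc_subnets:
    subnet = ipaddress.ip_network(cidr)
    status = "CONFLICT" if docker0.overlaps(subnet) else "ok"
    print(f"{cidr}: {status}")
```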
Specified reservations don't exist
Issue
The operation to create the instance results in a Specified reservations do not exist error message. The operation's output might be similar to the following:
{
  "name": "projects/PROJECT/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.notebooks.v2.OperationMetadata",
    "createTime": "2025-01-01T01:00:01.000000000Z",
    "endTime": "2025-01-01T01:00:01.000000000Z",
    "target": "projects/PROJECT/locations/LOCATION/instances/INSTANCE_NAME",
    "verb": "create",
    "requestedCancellation": false,
    "apiVersion": "v2",
    "endpoint": "CreateInstance"
  },
  "done": true,
  "error": {
    "code": 3,
    "message": "Invalid value for field 'resource.reservationAffinity': '{ \"consumeReservationType\": \"SPECIFIC_ALLOCATION\", \"key\": \"compute.googleapis.com/reservation-name...'. Specified reservations [projects/PROJECT/zones/ZONE/futureReservations/RESERVATION_NAME] do not exist.",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.RequestInfo",
        "requestId": "REQUEST_ID"
      }
    ]
  }
}
Solution
Some Compute Engine machine types require additional parameters at creation, such as local SSDs or a minimum CPU platform. The instance specification must include these additional fields.
- Vertex AI Workbench instances use automatic minimum CPU platform by default. If your reservation sets a specific platform, you need to set min_cpu_platform accordingly when creating Vertex AI Workbench instances.
- Vertex AI Workbench instances always set the number of local SSDs to the default values according to the machine type. For example, a2-ultragpu-1g always has 1 local SSD, while a2-highgpu-1g always has 0 local SSDs. When creating reservations to be used for Vertex AI Workbench instances, you need to leave the local SSD count at its default value.
Managed notebooks
Vertex AI Workbench managed notebooks is deprecated. On April 14, 2025, support for managed notebooks ended and the ability to create managed notebooks instances was removed. Existing instances will continue to function until March 30, 2026, but patches, updates, and upgrades won't be available. To continue using Vertex AI Workbench, we recommend that you migrate your managed notebooks instances to Vertex AI Workbench instances.
This section describes troubleshooting steps for managed notebooks.
Connecting to and opening JupyterLab
This section describes troubleshooting issues with connecting to and opening JupyterLab.
Nothing happens after clicking Open JupyterLab
Issue
When you click Open JupyterLab, nothing happens.
Solution
Verify that your browser doesn't block new tabs from opening automatically. JupyterLab opens in a new browser tab.
Unable to connect with managed notebooks instance using SSH
Issue
There isn't an option to connect with managed notebooks instances by using SSH.
Solution
SSH access to managed notebooks instances isn't available.
Can't access the terminal in a managed notebooks instance
Issue
If you're unable to access the terminal or can't find the terminal window in the launcher, it could be because your managed notebooks instance doesn't have terminal access enabled.
Solution
You must create a new managed notebooks instance with the Terminal access option enabled. This option can't be changed after instance creation.
502 error when opening JupyterLab
Issue
A 502 error might mean that your managed notebooks instance isn't ready yet.
Solution
Wait a few minutes, refresh the Google Cloud console browser tab, and try again.
Opening a notebook results in a 524 (A Timeout Occurred) error
Issue
A 524 error is usually an indication that the Inverting Proxy agent isn't connecting to the Inverting Proxy server, or that requests are taking too long on the backend server side (Jupyter). Common causes of this error include networking issues, the Inverting Proxy agent not running, or the Jupyter service not running.
Solution
Verify that your managed notebooks instance is started.
Notebook is unresponsive
Issue
Your managed notebooks instance isn't running cells or appears to be frozen.
Solution
First, try restarting the kernel by clicking Kernel from the top menu and then Restart Kernel. If that doesn't work, you can try the following:
- Refresh the JupyterLab browser page. Unsaved cell output doesn't persist, so you must run those cells again to regenerate the output.
- Reset your instance.
Migrating to Vertex AI Workbench instances
This section describes methods for diagnosing and resolving issues with migrating from a managed notebooks instance to a Vertex AI Workbench instance.
Can't find a kernel that was in the managed notebooks instance
Issue
A kernel that was in your managed notebooks instance doesn't appear in the Vertex AI Workbench instance that you migrated to.
Custom containers appear as kernels in managed notebooks. The Vertex AI Workbench migration tool doesn't support custom container migration.
Solution
To resolve this issue, add a conda environment to your Vertex AI Workbench instance.
Different version of framework in migrated instance
Issue
A framework that was in your managed notebooks instance was a different version than the one in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances provide a default set of framework versions. The migration tool doesn't add framework versions from your original managed notebooks instance. See default migration tool behaviors.
Solution
To add a specific version of a framework, add a conda environment to your Vertex AI Workbench instance.
GPUs aren't migrated to the new Vertex AI Workbench instance
Issue
GPUs that were in your managed notebooks instance aren't in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances support a default set of GPUs. If the GPUs in your original managed notebooks instance aren't available, your instance is migrated without any GPUs.
Solution
After migration, you can add GPUs to your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
Migrated instance's machine type is different
Issue
The machine type of your managed notebooks instance is different from the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances don't support all machine types. If the machine type in your original managed notebooks instance isn't available, your instance is migrated to the e2-standard-4 machine type.
Solution
After migration, you can change the machine type of your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
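A gcloud sketch for changing the machine type might look like the following. The instance name and zone are placeholders, the instance typically must be stopped first, and the exact flags can differ by SDK version, so check `gcloud workbench instances update --help`.

```shell
# Stop the instance, change its machine type, then start it again
# (my-instance and us-central1-a are placeholders).
gcloud workbench instances stop my-instance --location=us-central1-a
gcloud workbench instances update my-instance \
  --location=us-central1-a \
  --machine-type=e2-standard-8
gcloud workbench instances start my-instance --location=us-central1-a
```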
GPU quota has been exceeded
Issue
You are unable to create a managed notebooks instance with GPUs.
Solution
Determine the number of GPUs available in your project by checking the quotas page. If GPUs aren't listed on the quotas page, or you require additional GPU quota, you can request a quota increase. See Request a higher quota limit.
Using container images
This section describes troubleshooting issues with using container images.
Container image doesn't appear as a kernel in JupyterLab
Issue
Container images that don't have a valid kernelspec don't successfully load askernels in JupyterLab.
Solution
Make sure that your container meets our requirements. For more information, see the custom container requirements.
Notebook disconnects on long-running job
Issue
If you see the following error message when running a job in a notebook, it might be caused by the request taking too long to load, or by high CPU or memory utilization, which can make the Jupyter service unresponsive.
{"log":"2021/06/29 18:10:33 failure fetching a VM ID: compute: Received 500 `internal error`\n","stream":"stderr","time":"2021-06-29T18:10:33.383650241Z"}
{"log":"2021/06/29 18:38:26 Websocket failure: failed to read a websocket message from the server: read tcp [::1]:40168-\u003e[::1]:8080: use of closed network connection\n","stream":"stderr","time":"2021-06-29T18:38:26.057622824Z"}
Solution
This issue is caused by running a long-running job within a notebook. To run a job that might take a long time to complete, it's recommended to use the executor.
Using the executor
This section describes troubleshooting issues with using the executor.
Package installations not available to the executor
Note: Make sure that your Virtual Private Cloud allows access to PyPI and VPC Service Controls, and that your organization policy doesn't block access.
Issue
The executor runs your notebook code in a separate environment from the kernelwhere you run your notebook file's code. Because of this, some of the packagesyou installed might not be available in the executor's environment.
Solution
To resolve this issue, see Ensure package installations are available to the executor.
401 or 403 errors when running the notebook code using the executor
Issue
A 401 or 403 error when you run the executor can mean that the executor isn't able to access resources.
Solution
See the following for possible causes:
The executor runs your notebook code in a tenant project separate from your managed notebooks instance's project. Therefore, when you access resources through code run by the executor, the executor might not connect to the correct Google Cloud project by default. To resolve this issue, use explicit project selection.
By default, your managed notebooks instance can have access to resources that exist in the same project, and therefore, when you run your notebook file's code manually, these resources don't need additional authentication. However, because the executor runs in a separate tenant project, it does not have the same default access. To resolve this issue, authenticate access using service accounts.
The executor can't use end-user credentials to authenticate access to resources, for example, the gcloud auth login command. To resolve this issue, authenticate access using service accounts.
exited with a non-zero status of 127 error when using the executor
Issue
An exited with a non-zero status of 127 error, or "command not found" error, can happen when you use the executor to run code on a custom container that doesn't have the nbexecutor extension installed.
Solution
To ensure that your custom container has the nbexecutor extension, you can create a derivative container image from a Deep Learning Containers image. Deep Learning Containers images include the nbexecutor extension.
Invalid service networking configuration error message
Issue
This error might look like the following:
Invalid Service Networking configuration. Couldn't find free blocks in allocated IP ranges. Please use a valid range using: /24 mask or below (/23, /22, etc).
This means that no free blocks were found in the allocated IP ranges of your network.
Solution
Use a subnet mask of /24 or lower. Create a bigger allocated IP address range and attach this range by modifying the private service connection for servicenetworking-googleapis-com.
For more information, seeSet up a network.
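The steps above can be sketched with gcloud as follows; the network and range names are placeholders, and the prefix length should match the size your workloads need.

```shell
# Reserve a bigger allocated range for private services access
# (my-vpc and my-allocated-range are placeholders).
gcloud compute addresses create my-allocated-range \
  --global \
  --purpose=VPC_PEERING \
  --prefix-length=16 \
  --network=my-vpc

# Attach the range to the servicenetworking private connection.
gcloud services vpc-peerings update \
  --service=servicenetworking.googleapis.com \
  --network=my-vpc \
  --ranges=my-allocated-range \
  --force
```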
Unable to install third-party JupyterLab extension
Issue
Attempting to install a third-party JupyterLab extension results in an Error: 500 message.
Solution
Third-party JupyterLab extensions aren't supported in managed notebooks instances.
Unable to access or copy data of an instance with single user access
Issue
The data on an instance with single user access is inaccessible.
Solution
For managed notebooks instances that are set up with single user access, only the specified single user (the owner) can access the data on the instance.
To access or copy the data when you aren't the owner of the instance, open a support case.
Unexpected shutdown
Issue
Your Vertex AI Workbench instance shuts down unexpectedly.
Solution
If your instance shuts down unexpectedly, this could be because idle shutdown was initiated.
If you enabled idle shutdown, your instance shuts down when there is no kernel activity for the specified time period. For example, running a cell or new output printing to a notebook is activity that resets the idle timeout timer. CPU usage doesn't reset the idle timeout timer.
Restore instance
Issue
Restoring a managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub.
Recover data from an instance
Issue
Recovering data from a managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub.
Creating managed notebooks instances
Creating a managed notebooks instance isn't supported. For more information, see Deprecations.
Starting an instance results in a resource availability error
Issue
You're unable to start an instance because of a resource availability error.
This error can look like the following:
The zone ZONE_NAME doesn't have enough resources available to fulfill the request. '(resource type:compute)'.
Resource errors occur when you try to start an instance in a zone that can't accommodate your request due to the current unavailability of Compute Engine resources, such as GPUs or CPUs.
Resource errors only apply to the resources you specified in your request at the time you sent the request, not to all resources in the zone. Resource errors aren't related to your Compute Engine quota. Resource errors are temporary and can change frequently based on fluctuating demand.
Solution
To proceed, try the following:
- Change the machine type of your instance.
- Migrate your files and data to an instance in a different zone.
- Attempt the request again later.
- Reduce the amount of resources that you're requesting. For example, start a different instance with fewer GPUs, disks, vCPUs, or memory.
No route to host on outbound connections from managed notebooks
Issue
Typically, the only routes you can see in the Google Cloud console are those known to your own VPC as well as the ranges reserved when you complete the VPC Network Peering configuration.
Managed notebooks instances reside in a Google-managed network and run a modified version of Jupyter in a Docker networking namespace within the instance.
The Docker network interface and Linux bridge on this instance may select a local IP that conflicts with IP ranges being exported over the peering by your VPC. These are typically in the 172.16.0.0/16 and 192.168.10.0/24 ranges, respectively.
In these circumstances, outbound connections from the instance to these ranges will fail with a complaint that is some variation of No route to host, despite VPC routes being correctly shared.
Solution
Invoke ifconfig in a terminal session and ensure that no IP addresses on any virtual interfaces in the instance conflict with IP ranges that your VPC is exporting to the peering connection.
User-managed notebooks
Vertex AI Workbench user-managed notebooks is deprecated. On April 14, 2025, support for user-managed notebooks ended and the ability to create user-managed notebooks instances was removed. Existing instances will continue to function until March 30, 2026, but patches, updates, and upgrades won't be available. To continue using Vertex AI Workbench, we recommend that you migrate your user-managed notebooks instances to Vertex AI Workbench instances.
This section describes troubleshooting steps for user-managed notebooks.
Connecting to and opening JupyterLab
This section describes troubleshooting issues with connecting to and opening JupyterLab.
Nothing happens after clicking Open JupyterLab
Issue
When you click Open JupyterLab, nothing happens.
Solution
Verify that your browser doesn't block new tabs from opening automatically. JupyterLab opens in a new browser tab.
No Inverting Proxy server access to JupyterLab
Issue
You are unable to access JupyterLab.
Vertex AI Workbench uses a Google internal Inverting Proxy server to provide access to JupyterLab. User-managed notebooks instance settings, network configuration, and other factors can prevent access to JupyterLab.
Solution
Unable to connect with user-managed notebooks instance using SSH
Issue
You're unable to connect to your instance by using SSH through a terminal window.
User-managed notebooks instances use OS Login to enable SSH access. When you create an instance, Vertex AI Workbench enables OS Login by default by setting the metadata key enable-oslogin to TRUE. If you're unable to use SSH to connect to your instance, this metadata key might need to be set to TRUE.
Solution
To enable SSH access for user-managed notebooks for users, complete the steps for configuring OS Login roles on user accounts.
Opening a user-managed notebooks instance results in a 403 (Forbidden) error
Issue
A 403 (Forbidden) error when opening a user-managed notebooks instance often means that there is an access issue.
Solution
To troubleshoot access issues, consider the three ways that access can be granted to a user-managed notebooks instance:
- Single user
- Service account
- Project editors
The access mode is configured during user-managed notebooks instance creation, and it is defined in the notebook metadata:
- Single user: proxy-mode=mail, proxy-user-mail=user@domain.com
- Service account: proxy-mode=service_account
- Project editors: proxy-mode=project_editors
If you can't access a notebook when you click Open JupyterLab, try the following:
Verify that the user accessing the instance has the iam.serviceAccounts.actAs permission for the instance's service account. The service account is either the Compute Engine default service account or a service account that is specified when the instance is created. If your instance uses single user access with a specified service account as the single user, see No JupyterLab access, single user mode enabled.
The following example shows how to specify a service account when you create an instance:
gcloud notebooks instances create nb-1 \
  --vm-image-family=tf-latest-cpu \
  --metadata=proxy-mode=mail,proxy-user-mail=user@domain.com \
  --service-account=your_service_account@project_id.iam.gserviceaccount.com \
  --location=us-west1-a
When you click Open JupyterLab to open a notebook, the notebook opens in a new browser tab. If you are signed in to more than one Google Account, the new tab opens with your default Google Account. If you didn't create your user-managed notebooks instance with your default Google Account, the new browser tab will show a 403 (Forbidden) error.
No JupyterLab access, single user mode enabled
Issue
You are unable to access JupyterLab.
Solution
If a user is unable to access JupyterLab and the instance's access to JupyterLab is set to Single user only, try the following:
- On the User-managed notebooks page of the Google Cloud console, click the name of your instance to open the Notebook details page.
- Next to View VM details, click View in Compute Engine.
- On the VM details page, click Edit.
- In the Metadata section, verify that the proxy-mode metadata entry is set to mail.
- Verify that the proxy-user-mail metadata entry is set to a valid user email address, not a service account.
- Click Save.
- On the User-managed notebooks page of the Google Cloud console, initialize the updated metadata by stopping your instance and starting it back up again.
Opening a notebook results in a 504 (Gateway Timeout) error
Issue
This error indicates an internal proxy timeout or a backend server (Jupyter) timeout. It can occur when:
- The request never reached the internal Inverting Proxy server.
- The backend (Jupyter) returned a 504 error.
Solution
Open a Google support case.
Opening a notebook results in a 524 (A Timeout Occurred) error
Issue
The internal Inverting Proxy server hasn't received a response from the Inverting Proxy agent for the request within the timeout period. The Inverting Proxy agent runs inside your user-managed notebooks instance as a Docker container. A 524 error usually indicates that the Inverting Proxy agent isn't connecting to the Inverting Proxy server, or that requests are taking too long on the backend server side (Jupyter). A typical cause for this error is on the user side (for example, a networking issue, or the Inverting Proxy agent service isn't running).
Solution
If you can't access a notebook, verify that your user-managed notebooks instance is started, and then try the following:
Option 1: Run the diagnostic tool to automatically check and repair user-managed notebooks core services, verify available storage, and generate useful log files. To run the tool in your instance, perform the following steps:
Make sure that your instance is on version M58 or newer.
Run the following command:
sudo /opt/deeplearning/bin/diagnostic_tool.sh [--repair] [--bucket=$BUCKET]
The --repair and --bucket flags are optional. The --repair flag attempts to fix common core service errors, and the --bucket flag lets you specify a Cloud Storage bucket to store the created log files.
The output of this command displays useful status messages for user-managed notebooks core services and exports log files of its findings.
Option 2: Use the following steps to check specific user-managed notebooks requirements individually.
Verify that the user-managed notebooks instance disk isn't out of space.
Run the following command:
df -h -T /home/jupyter
If the Use% is more than 85%, you need to manually delete files from /home/jupyter. As a first step, you can empty the trash with the following command:
sudo rm -rf /home/jupyter/.local/share/Trash/*
Verify that the Inverting Proxy agent is running. If the agent is started, try restarting it.
Make sure the Jupyter service is running. If it is, try restarting it.
Verify memory utilization in the user-managed notebooks instance.
Run the following command:
free -t -h
If the used memory is more than 85% of the total, consider changing the machine type. You can install the Cloud Monitoring agent to monitor whether there is high memory usage in your user-managed notebooks instance. See pricing information.
Verify that you are using Deep Learning VM version M55 or later. To learn more about upgrading, see Upgrade the environment of a user-managed notebooks instance.
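The disk and memory checks from Option 2 can be combined into a single sketch. This isn't part of the official tooling; the 85% threshold and the /home/jupyter path follow the guidance above, and the script checks / so that it also runs outside an instance:

```shell
#!/bin/sh
# Sketch: flag disk or memory usage above the 85% threshold used in this
# section. Check /home/jupyter instead of / when run on an instance.
THRESHOLD=85

disk_pct() {
  # POSIX df: field 5 of the second line is Use%; strip the % sign.
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

mem_pct() {
  # procps free: on the Mem: line, field 2 is total and field 3 is used.
  free | awk '/^Mem:/ { printf "%d", $3 * 100 / $2 }'
}

d=$(disk_pct /)
m=$(mem_pct)
echo "disk ${d}% used, memory ${m}% used"
if [ "$d" -gt "$THRESHOLD" ]; then
  echo "disk: consider emptying /home/jupyter/.local/share/Trash"
fi
if [ "$m" -gt "$THRESHOLD" ]; then
  echo "memory: consider changing the machine type"
fi
```

The field positions assume GNU coreutils df and procps free, which the Deep Learning VM images ship with.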
Opening a notebook results in a 598 (Network read timeout) error
Issue
The Inverting Proxy server hasn't heard from the Inverting Proxy agent at all for more than 10 minutes. This is a strong indication of an Inverting Proxy agent issue.
Solution
If you can't access a notebook, try the following:
Verify that your user-managed notebooks instance is started.
Verify that the Inverting Proxy agent is running. If the agent is started, try restarting it.
Make sure the Jupyter service is running. If it is, try restarting it.
Verify that you are using Deep Learning VM version M55 or later. To learn more about upgrading, see Upgrade the environment of a user-managed notebooks instance.
Notebook is unresponsive
Issue
Your user-managed notebooks instance isn't running cells or appears to be frozen.
Solution
First, try restarting the kernel by clicking Kernel from the top menu and then Restart Kernel. If that doesn't work, you can try the following:
- Refresh the JupyterLab browser page. Any unsaved cell output doesn't persist, so you must run those cells again to regenerate the output.
- From a terminal session in the notebook, run the command top to see if there are processes consuming the CPU.
- From the terminal, check the amount of free disk space using the command df, or check the available RAM using the command free.
- Shut your instance down by selecting it from the User-managed notebooks page and clicking Stop. After it has stopped completely, select it and click Start.
Migrating to Vertex AI Workbench instances
This section describes methods for diagnosing and resolving issues with migrating from a user-managed notebooks instance to a Vertex AI Workbench instance.
Can't find R, Beam, or other kernels that were in the user-managed notebooks instance
Issue
A kernel that was in your user-managed notebooks instance doesn't appear in the Vertex AI Workbench instance that you migrated to.
Some kernels, such as the R and Beam kernels, aren't available in Vertex AI Workbench instances by default. Migration of those kernels isn't supported.
Solution
To resolve this issue, add a conda environment to your Vertex AI Workbench instance.
Can't set up a Dataproc Hub instance in the Vertex AI Workbench instance
Issue
Dataproc Hub isn't supported in Vertex AI Workbench instances.
Solution
Continue to use Dataproc Hub in user-managed notebooks instances.
Different version of framework in migrated instance
Issue
A framework that was in your user-managed notebooks instance was a different version than the one in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances provide a default set of framework versions. The migration tool doesn't add framework versions from your original user-managed notebooks instance. See default migration tool behaviors.
Solution
To add a specific version of a framework, add a conda environment to your Vertex AI Workbench instance.
GPUs aren't migrated to the new Vertex AI Workbench instance
Issue
GPUs that were in your user-managed notebooks instance aren't in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances support a default set of GPUs. If the GPUs in your original user-managed notebooks instance aren't available, your instance is migrated without any GPUs.
Solution
After migration, you can add GPUs to your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
Migrated instance's machine type is different
Issue
The machine type of your user-managed notebooks instance is different from the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances don't support all machine types. If the machine type in your original user-managed notebooks instance isn't available, your instance is migrated to the e2-standard-4 machine type.
Solution
After migration, you can change the machine type of your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
Working with files
This section describes troubleshooting issues with files for user-managed notebooks instances.
File downloading disabled but user can still download files
Issue
For Dataproc Hub user-managed notebooks instances, disabling file downloading from the JupyterLab user interface isn't supported. User-managed notebooks instances that use the Dataproc Hub framework permit file downloading even if you don't select Enable file downloading from JupyterLab UI when you create the instance.
Solution
Dataproc Hub user-managed notebooks instances don't support restricting file downloads.
Downloaded files are truncated or don't complete downloading
Issue
When you download files from your user-managed notebooks instance, a timeout setting on the proxy-forwarding agent limits the connection time for the download to complete. If the download takes too long, your downloaded file might be truncated or might fail to download.
Solution
To download the file, copy your file to Cloud Storage, and then download the file from Cloud Storage.
Consider migrating your files and data to a new user-managed notebooks instance.
After restarting VM, local files can't be referenced from notebook terminal
Issue
Sometimes after restarting a user-managed notebooks instance, local files can't be referenced from within a notebook terminal.
Solution
This is a known issue. To reference your local files from within a notebook terminal, first re-establish your current working directory using the following command:
cd PWD
In this command, replace PWD with your current working directory. For example, if your current working directory was /home/jupyter/, use the command cd /home/jupyter/.
After re-establishing your current working directory, your local files can be referenced from within the notebook terminal.
Creating user-managed notebooks instances
Creating a user-managed notebooks instance isn't supported. For more information, see Deprecations.
Starting an instance results in a resource availability error
Issue
You're unable to start an instance because of a resource availability error.
This error can look like the following:
The zone ZONE_NAME doesn't have enough resources available to fulfill the request. '(resource type:compute)'.
Resource errors occur when you try to start an instance in a zone that can't accommodate your request due to the current unavailability of Compute Engine resources, such as GPUs or CPUs.
Resource errors apply only to the resources that you specified in your request at the time you sent the request, not to all resources in the zone. Resource errors aren't related to your Compute Engine quota. Resource errors are temporary and can change frequently based on fluctuating demand.
Solution
To proceed, you can try the following:
- Change the machine type of your instance.
- Migrate your files and data to an instance in a different zone.
- Attempt the request again later.
- Reduce the amount of resources that you're requesting. For example, start a different instance with fewer GPUs, disks, vCPUs, or memory.
Upgrading user-managed notebooks instances
This section describes troubleshooting issues with upgrading user-managed notebooks instances.
Unable to upgrade because unable to get instance disk information
Issue
Upgrade isn't supported for single-disk user-managed notebooks instances.
Solution
You might want to migrate your user data to a new user-managed notebooks instance.
Unable to upgrade because instance isn't UEFI compatible
Issue
Vertex AI Workbench depends on UEFI compatibility to complete an upgrade.
User-managed notebooks instances created from some older images are not UEFI compatible and therefore can't be upgraded.
Solution
To verify that your instance is UEFI compatible, type the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
gcloud compute instances describe INSTANCE_NAME \
  --zone=ZONE | grep type
Replace the following:
- INSTANCE_NAME: the name of your instance
- ZONE: the zone where your instance is located
To verify that the image that you used to create your instance is UEFI compatible, use the following command:
gcloud compute images describe VM_IMAGE_FAMILY \
  --project deeplearning-platform-release | grep type
Replace VM_IMAGE_FAMILY with the image family name that you used to create your instance.
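To interpret the output of the commands above: the describe results list UEFI_COMPATIBLE under guestOsFeatures when the instance or image is UEFI compatible (an assumption based on current Compute Engine describe output; verify against your gcloud version). A sketch, with a canned snippet standing in for live gcloud output:

```shell
#!/bin/sh
# Sketch: decide UEFI compatibility from describe output read on stdin.
is_uefi_compatible() {
  grep -q "UEFI_COMPATIBLE"
}

# Canned example of the guestOsFeatures section of describe output.
sample='guestOsFeatures:
- type: UEFI_COMPATIBLE
- type: VIRTIO_SCSI_MULTIQUEUE'

if printf '%s\n' "$sample" | is_uefi_compatible; then
  echo "UEFI compatible"
else
  echo "not UEFI compatible"
fi
```

On a live system you would pipe the gcloud describe output into is_uefi_compatible instead of the canned sample.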
If you determine that either your instance or image isn't UEFI compatible, you can attempt to migrate your user data to a new user-managed notebooks instance. To do so, complete the following steps:
- Verify that the image that you want to use to create your new instance is UEFI compatible. To do so, type the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
gcloud compute images describe VM_IMAGE_FAMILY \
  --project deeplearning-platform-release --format=json | grep type
Replace VM_IMAGE_FAMILY with the image family name that you want to use to create your instance.
- Migrate your user data to a new user-managed notebooks instance.
User-managed notebooks instance isn't accessible after upgrade
Issue
If the user-managed notebooks instance isn't accessible after an upgrade, there might have been a failure during the replacement of the boot disk image.
User-managed notebooks instances that can be upgraded are dual-disk, with one boot disk and one data disk. The upgrade process upgrades the boot disk to a new image while preserving your data on the data disk.
Solution
Complete the following steps to attach a new valid image to the boot disk.
To store values that you'll use to complete this procedure, type the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
export INSTANCE_NAME=MY_INSTANCE_NAME
export PROJECT_ID=MY_PROJECT_ID
export ZONE=MY_ZONE
Replace the following:
- MY_INSTANCE_NAME: the name of your instance
- MY_PROJECT_ID: your project ID
- MY_ZONE: the zone where your instance is located
Use the following command to stop the instance:
gcloud compute instances stop $INSTANCE_NAME \
  --project=$PROJECT_ID --zone=$ZONE
Detach the data disk from the instance.
gcloud compute instances detach-disk $INSTANCE_NAME --device-name=data \
  --project=$PROJECT_ID --zone=$ZONE
Delete the instance's VM.
gcloud compute instances delete $INSTANCE_NAME --keep-disks=all --quiet \
  --project=$PROJECT_ID --zone=$ZONE
Use the Notebooks API to delete the user-managed notebooksinstance.
gcloud notebooks instances delete $INSTANCE_NAME \
  --project=$PROJECT_ID --location=$ZONE
Create a user-managed notebooks instance using the same name as your previous instance.
gcloud notebooks instances create $INSTANCE_NAME \
  --vm-image-project="deeplearning-platform-release" \
  --vm-image-family=MY_VM_IMAGE_FAMILY \
  --instance-owners=MY_INSTANCE_OWNER \
  --machine-type=MY_MACHINE_TYPE \
  --service-account=MY_SERVICE_ACCOUNT \
  --accelerator-type=MY_ACCELERATOR_TYPE \
  --accelerator-core-count=MY_ACCELERATOR_CORE_COUNT \
  --install-gpu-driver \
  --project=$PROJECT_ID \
  --location=$ZONE
Replace the following:
- MY_VM_IMAGE_FAMILY: the image family name
- MY_INSTANCE_OWNER: your instance owner
- MY_MACHINE_TYPE: the machine type of your instance's VM
- MY_SERVICE_ACCOUNT: the service account to use with this instance, or use "default"
- MY_ACCELERATOR_TYPE: the accelerator type; for example, "NVIDIA_TESLA_T4"
- MY_ACCELERATOR_CORE_COUNT: the core count; for example, 1
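The recovery sequence above can be chained into one script. In the sketch below, the commands are printed rather than executed (a dry run) so you can review the order first; swap the run helper's body to execute for real. All values are placeholders.

```shell
#!/bin/sh
# Dry run of the boot-disk recovery sequence above; values are placeholders.
INSTANCE_NAME=MY_INSTANCE_NAME
PROJECT_ID=MY_PROJECT_ID
ZONE=MY_ZONE

# Print each command instead of executing it.
# To execute for real, change the body to: "$@"
run() { echo "+ $*"; }

run gcloud compute instances stop "$INSTANCE_NAME" \
  --project="$PROJECT_ID" --zone="$ZONE"
run gcloud compute instances detach-disk "$INSTANCE_NAME" --device-name=data \
  --project="$PROJECT_ID" --zone="$ZONE"
run gcloud compute instances delete "$INSTANCE_NAME" --keep-disks=all --quiet \
  --project="$PROJECT_ID" --zone="$ZONE"
run gcloud notebooks instances delete "$INSTANCE_NAME" \
  --project="$PROJECT_ID" --location="$ZONE"
```

The final create step is omitted here because its flags depend on your original instance's configuration, as listed above.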
Monitoring health status of user-managed notebooks instances
This section describes how to troubleshoot issues with monitoring health status errors.
docker-proxy-agent status failure
Follow these steps after a docker-proxy-agent status failure:
- Verify that the Inverting Proxy agent is running. If not, go to step 3.
docker-service status failure
Follow these steps after a docker-service status failure:
jupyter-service status failure
Follow these steps after a jupyter-service status failure:
jupyter-api status failure
Follow these steps after a jupyter-api status failure:
Boot disk utilization percent
The boot disk space status is unhealthy if the disk space is greater than 85% full.
If your boot disk space status is unhealthy, try the following:
- From a terminal session in the user-managed notebooks instance, or by using ssh to connect, check the amount of free disk space using the command df -H.
- Use the command find . -type f -size +100M to help you find large files that you might be able to delete, but don't delete them unless you are sure you can safely do so. If you aren't sure, you can get help from support.
- If the previous steps don't solve your problem, get support.
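As an alternative to the find command above, the sketch below ranks the ten largest files under a directory, biggest first, so you can review candidates before deleting anything:

```shell
#!/bin/sh
# Sketch: list the ten largest files under a directory, largest first
# (sizes in KB). Defaults to the current directory.
largest_files() {
  find "${1:-.}" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n 10
}

largest_files .
```

On an instance you would typically pass /home/jupyter as the argument.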
Data disk utilization percent
The data disk space status is unhealthy if the disk space is greater than 85% full.
If your data disk space status is unhealthy, try the following:
- From a terminal session in the user-managed notebooks instance, or by using ssh to connect, check the amount of free disk space using the command df -h -T /home/jupyter.
- Delete large files to increase the available disk space. Use the command find . -type f -size +100M to help you find large files.
- If the previous steps don't solve your problem, get support.
Unable to install third-party JupyterLab extension
Issue
Attempting to install a third-party JupyterLab extension results in an Error: 500 message.
Solution
Third-party JupyterLab extensions aren't supported in user-managed notebooks instances.
Restore instance
Issue
Restoring a user-managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub or make a snapshot of the disk.
Recover data from an instance
Issue
Recovering data from a user-managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub or make a snapshot of the disk.
Unable to increase shared memory
Issue
You can't increase shared memory on an existing user-managed notebooks instance.
Solution
You can, however, specify a shared memory size when you create a user-managed notebooks instance by using the container-custom-params metadata key, with a value like the following:
--shm-size=SHARED_MEMORY_SIZEgb
Replace SHARED_MEMORY_SIZE with the size that you want in GB.
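For example, at creation time the metadata key could be passed as follows. The instance name, location, and 8 GB size are placeholders, and the command is printed for review rather than executed:

```shell
#!/bin/sh
# Dry run: print a create command that sets shared memory through the
# container-custom-params metadata key. All values are placeholders.
CMD="gcloud notebooks instances create my-instance \
  --metadata=container-custom-params=--shm-size=8gb \
  --location=us-west1-a"
echo "$CMD"
```

Combine the metadata flag with whatever image and machine-type flags your instance otherwise needs.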
Helpful procedures
This section describes procedures that you might find helpful.
Use SSH to connect to your user-managed notebooks instance
Use ssh to connect to your instance by typing the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
gcloud compute ssh --project PROJECT_ID \
  --zone ZONE \
  INSTANCE_NAME -- -L 8080:localhost:8080
Replace the following:
- PROJECT_ID: your project ID
- ZONE: the Google Cloud zone where your instance is located
- INSTANCE_NAME: the name of your instance
You can also connect to your instance by opening your instance's Compute Engine detail page, and then clicking the SSH button.
Tip: If you can't use ssh to connect to your instance, you can use gcpdiag to troubleshoot this issue.
Re-register with the Inverting Proxy server
To re-register the user-managed notebooks instance with the internal Inverting Proxy server, you can stop and start the VM from the User-managed notebooks page, or you can use ssh to connect to your user-managed notebooks instance and enter:
cd /opt/deeplearning/bin
sudo ./attempt-register-vm-on-proxy.sh
Verify the Docker service status
To verify the Docker service status, you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service docker status
Verify that the Inverting Proxy agent is running
To verify whether the notebook Inverting Proxy agent is running, use ssh to connect to your user-managed notebooks instance and enter:
# Confirm that the Inverting Proxy agent Docker container is running (proxy-agent)
sudo docker ps
# Verify that State.Status is running and State.Running is true.
sudo docker inspect proxy-agent
# Grab logs
sudo docker logs proxy-agent
Verify the Jupyter service status and collect logs
To verify the Jupyter service status, you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service jupyter status
To collect Jupyter service logs:
sudo journalctl -u jupyter.service --no-pager
Verify that the Jupyter internal API is active
The Jupyter API should always run on port 8080. You can verify this by inspecting the instance's syslogs for an entry similar to:
Jupyter Server ... running at: http://localhost:8080
To verify that the Jupyter internal API is active, you can also use ssh to connect to your user-managed notebooks instance and enter:
curl http://127.0.0.1:8080/api/kernelspecs
You can also measure the time it takes for the API to respond in case the requests are taking too long:
time curl -V http://127.0.0.1:8080/api/status
time curl -V http://127.0.0.1:8080/api/kernels
time curl -V http://127.0.0.1:8080/api/connections
To run these commands in your Vertex AI Workbench instance, open JupyterLab, and create a new terminal.
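To turn such timing checks into a pass/fail probe, a small helper can compare elapsed time against a budget. A sketch (the 5-second budget and the use of true as the probed command are illustrative; on the instance you would pass the curl probes above):

```shell
#!/bin/sh
# Sketch: succeed only if a command finishes within a budget in seconds.
runs_within() {
  budget=$1; shift
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  [ $((end - start)) -le "$budget" ]
}

if runs_within 5 true; then
  echo "probe within budget"
fi
```

Whole-second resolution is enough here, since the symptom being diagnosed is requests taking many seconds or timing out entirely.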
Restart the Docker service
To restart the Docker service, you can stop and start the VM from the User-managed notebooks page, or you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service docker restart
Restart the Inverting Proxy agent
To restart the Inverting Proxy agent, you can stop and start the VM from the User-managed notebooks page, or you can use ssh to connect to your user-managed notebooks instance and enter:
sudo docker restart proxy-agent
Restart the Jupyter service
To restart the Jupyter service, you can stop and start the VM from the User-managed notebooks page, or you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service jupyter restart
Restart the Notebooks Collection Agent
The Notebooks Collection Agent service runs a Python process in the backgroundthat verifies the status of the Vertex AI Workbench instance's core services.
To restart the Notebooks Collection Agent service, you can stop and start the VM from the Google Cloud console, or you can use ssh to connect to your Vertex AI Workbench instance and enter:
sudo systemctl stop notebooks-collection-agent.service
followed by:
sudo systemctl start notebooks-collection-agent.service
To run these commands in your Vertex AI Workbench instance, open JupyterLab, and create a new terminal.
Modify the Notebooks Collection Agent script
Note: Don't modify the Notebooks Collection Agent script unless it is necessary to troubleshoot or resolve an issue with the instance. To access and edit the script, open a terminal in your instance or use ssh to connect to your Vertex AI Workbench instance, and enter:
nano /opt/deeplearning/bin/notebooks_collection_agent.py
After editing the file, remember to save it.
Then, you must restart the Notebooks Collection Agent service.
Verify the instance can resolve the required DNS domains
To verify that the instance can resolve the required DNS domains, you can use ssh to connect to your user-managed notebooks instance and enter:
host notebooks.googleapis.com
host *.notebooks.cloud.google.com
host *.notebooks.googleusercontent.com
host *.kernels.googleusercontent.com
or:
curl --silent --output /dev/null "https://notebooks.cloud.google.com"; echo $?
If the instance has Dataproc enabled, you can verify that the instance resolves *.kernels.googleusercontent.com by running:
curl --verbose -H "Authorization: Bearer $(gcloud auth print-access-token)" https://${PROJECT_NUMBER}-dot-${REGION}.kernels.googleusercontent.com/api/kernelspecs | jq .
To run these commands in your Vertex AI Workbench instance, open JupyterLab, and create a new terminal.
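A scripted variant of the resolution check, sketched below, uses getent (present on the Linux images these instances run, so no extra DNS utilities are needed); on the instance you would loop over the notebooks.* domains listed above:

```shell
#!/bin/sh
# Sketch: report whether a hostname resolves.
check_dns() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "$1 OK"
  else
    echo "$1 FAILED"
  fi
}

# localhost stands in for the notebooks.* domains so that the sketch
# also works offline.
check_dns localhost
```

Any FAILED line points at a DNS or networking configuration problem on the instance.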
Make a copy of the user data on an instance
To store a copy of an instance's user data in Cloud Storage, complete the following steps.
Note: You must have terminal access to your instance. Terminal access is manually set when you create an instance. The terminal access setting can't be changed after the instance is created.
Create a Cloud Storage bucket (optional)
In the same project where your instance is located, create a Cloud Storage bucket where you can store your user data. If you already have a Cloud Storage bucket, skip this step.
- Create a Cloud Storage bucket:
gcloud storage buckets create gs://BUCKET_NAME
Replace BUCKET_NAME with a bucket name that meets the bucket naming requirements.
Copy your user data
In your instance's JupyterLab interface, select File > New > Terminal to open a terminal window. For user-managed notebooks instances, you can instead connect to your instance's terminal by using SSH.
Use the gcloud CLI to copy your user data to a Cloud Storage bucket. The following example command copies all of the files from your instance's /home/jupyter/ directory to a directory in a Cloud Storage bucket.
gcloud storage cp /home/jupyter/* gs://BUCKET_NAMEPATH --recursive
Replace the following:
- BUCKET_NAME: the name of your Cloud Storage bucket
- PATH: the path to the directory where you want to copy your files, for example: /copy/jupyter/
Investigate an instance stuck in provisioning by using gcpdiag
Note: When using Vertex AI with Private Google Access to access Google Cloud APIs and web proxies for outbound access, the instances must be configured to bypass any web proxies or other network traffic inspection or filtering devices (for example, next-generation firewalls) for any hostnames in the domains listed in the Private Google Access documentation.
gcpdiag is an open source tool. It is not an officially supported Google Cloud product. You can use the gcpdiag tool to help you identify and fix Google Cloud project issues. For more information, see the gcpdiag project on GitHub.
The gcpdiag runbook investigates potential causes for a Vertex AI Workbench instance to get stuck in provisioning status, including the following areas:
- Status: Checks the instance's current status to ensure that it is stuck in provisioning and not stopped or active.
- Instance's Compute Engine VM boot disk image: Checks whether the instance was created with a custom container, an official workbench-instances image, Deep Learning VM Images, or unsupported images that might cause the instance to get stuck in provisioning status.
- Custom scripts: Checks whether the instance is using custom startup or post-startup scripts that change the default Jupyter port or break dependencies that might cause the instance to get stuck in provisioning status.
- Environment version: Checks whether the instance is using the latest environment version by checking its upgradability. Earlier versions might cause the instance to get stuck in provisioning status.
- Instance's Compute Engine VM performance: Checks the VM's current performance to ensure that it isn't impaired by high CPU usage, insufficient memory, or disk space issues that might disrupt normal operations.
- Instance's Compute Engine serial port or system logging: Checks whether the instance has serial port logs, which are analyzed to ensure that Jupyter is listening on 127.0.0.1:8080.
- Instance's Compute Engine SSH and terminal access: Checks whether the instance's Compute Engine VM is running so that the user can SSH and open a terminal to verify that space usage in /home/jupyter is lower than 85%. If no space is left, this might cause the instance to get stuck in provisioning status.
- External IP turned off: Checks whether external IP access is turned off. An incorrect networking configuration can cause the instance to get stuck in provisioning status.
Docker
You can run gcpdiag using a wrapper that starts gcpdiag in a Docker container. Docker or Podman must be installed.
- Copy and run the following command on your local workstation.
curl https://gcpdiag.dev/gcpdiag.sh >gcpdiag && chmod +x gcpdiag
- Execute the gcpdiag command:
./gcpdiag runbook vertex/workbench-instance-stuck-in-provisioning \
  --parameter project_id=PROJECT_ID \
  --parameter instance_name=INSTANCE_NAME \
  --parameter zone=ZONE
View the available parameters for this runbook.
Replace the following:
- PROJECT_ID: The ID of the project containing the resource.
- INSTANCE_NAME: The name of the target Vertex AI Workbench instance within your project.
- ZONE: The zone in which your target Vertex AI Workbench instance is located.
Useful flags:
- --universe-domain: If applicable, the Trusted Partner Sovereign Cloud domain hosting the resource
- --parameter or -p: Runbook parameters
For a list and description of all gcpdiag tool flags, see the gcpdiag usage instructions.
Permissions errors when using service account roles with Vertex AI
Issue
You get general permissions errors when you use service account roles with Vertex AI.
These errors can appear in Cloud Logging in either the product component logs or audit logs. They may also appear in any combination of the affected projects.
These issues can be caused by one or both of the following:
- Use of the Service Account Token Creator role when the Service Account User role should have been used, or the other way around. These roles grant different permissions on a service account and aren't interchangeable. To learn about the differences between the Service Account Token Creator and Service Account User roles, see Service account roles.
- You've granted a service account permissions across multiple projects, which isn't permitted by default.
Solution
To resolve the issue, try one or more of the following:
- Determine whether the Service Account Token Creator or Service Account User role is needed. To learn more, read the IAM documentation for the Vertex AI services that you are using, as well as any other product integrations that you are using.
- If you have granted a service account permissions across multiple projects, enable service accounts to be attached across projects by ensuring that iam.disableCrossProjectServiceAccountUsage isn't enforced. To ensure that iam.disableCrossProjectServiceAccountUsage isn't enforced, run the following command:
gcloud resource-manager org-policies disable-enforce \
  iam.disableCrossProjectServiceAccountUsage \
  --project=PROJECT_ID
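Before changing the constraint, you can inspect its current effective state. The command below is printed rather than executed so you can review it first (PROJECT_ID is a placeholder):

```shell
#!/bin/sh
# Dry run: print the command that shows whether the constraint is
# currently enforced for the project. PROJECT_ID is a placeholder.
CMD="gcloud resource-manager org-policies describe \
  iam.disableCrossProjectServiceAccountUsage \
  --project=PROJECT_ID --effective"
echo "$CMD"
```

If the effective policy shows the constraint as enforced, the disable-enforce command above lifts it for the project.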
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.