Running instances with GPU accelerators

This page describes how to use NVIDIA graphics processing unit (GPU) hardware accelerators on Container-Optimized OS virtual machine (VM) instances.

Overview

By using Compute Engine, you can create VM instances running Container-Optimized OS that have GPUs attached. You can only use two machine families when running GPUs on Compute Engine: accelerator-optimized and N1 general-purpose.

For accelerator-optimized machine types, each machine type has a specific model of NVIDIA GPUs attached.
- For A4 accelerator-optimized machine types, NVIDIA B200 GPUs are attached.
- For A3 Ultra accelerator-optimized machine types, NVIDIA H200 141GB GPUs are attached.
- For A3 accelerator-optimized machine types, NVIDIA H100 80GB GPUs are attached.
- For A2 accelerator-optimized machine types, NVIDIA A100 GPUs are attached. These are available in both A100 40GB and A100 80GB options.
- For G2 accelerator-optimized machine types, NVIDIA L4 GPUs are attached.
For N1 general-purpose machine types, you can attach NVIDIA T4, P4, P100, or V100 GPUs.

GPUs provide compute power to drive deep-learning tasks such as image recognition and natural language processing, as well as other compute-intensive tasks such as video transcoding and image processing.

Google Cloud provides a seamless experience for you to run your GPU workloads within containers on Container-Optimized OS VM instances so that you can benefit from other Container-Optimized OS features such as security and reliability.

To learn more about the use cases for GPUs, see Cloud GPUs.

To learn about using GPUs on Google Kubernetes Engine (GKE), see Running GPUs on GKE.

Requirements

Running GPUs on Container-Optimized OS VM instances has the following requirements:

- Container-Optimized OS x86 images: Only x86-based Container-Optimized OS images support running GPUs. Arm-based Container-Optimized OS images don't support the feature.
- Container-Optimized OS version: To run GPUs on Container-Optimized OS VM instances, the Container-Optimized OS release milestone must be an LTS milestone and the milestone number must be 85 or higher.
- GPU quota: You must have Compute Engine GPU quota in your chosen zone before you can create Container-Optimized OS VM instances with GPUs. To ensure that you have enough GPU quota in your project, see Quotas in the Google Cloud console.
  If you require additional GPU quota, you must request GPU quota in the Google Cloud console. If you have an established billing account, your project automatically receives GPU quota after you submit the quota request.
  Note: By default, free trial accounts don't receive GPU quota.
- NVIDIA GPU drivers: You must install NVIDIA GPU drivers yourself on your Container-Optimized OS VM instances. The Install NVIDIA GPU device drivers section explains how to install the drivers on Container-Optimized OS VM instances.
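The milestone requirement can be checked mechanically. The following is a minimal sketch, assuming image names of the form cos-MILESTONE-BUILD, like the cos-105-17412-448-12 example used later on this page. The function names are hypothetical, and the check covers only the milestone number, not whether the milestone is an LTS release or whether the image is x86-based.

```shell
#!/bin/sh
# Hypothetical helpers; they assume an image name shaped like
# cos-105-17412-448-12 (milestone in the second dash-separated field).

# Print the milestone field of an image name.
cos_milestone() {
  printf '%s\n' "$1" | cut -d '-' -f 2
}

# Succeed when the milestone is a plain number that is 85 or higher.
milestone_supports_gpus() {
  milestone=$(cos_milestone "$1")
  case "$milestone" in
    ''|*[!0-9]*) return 1 ;;   # not a number, so the name has another shape
  esac
  [ "$milestone" -ge 85 ]
}

if milestone_supports_gpus cos-105-17412-448-12; then
  echo "milestone $(cos_milestone cos-105-17412-448-12) meets the minimum of 85"
fi
```

Names that don't carry the milestone in the second field are rejected rather than guessed at, so confirm the milestone for such images in the release notes.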

Create a VM

The following sections explain how to run GPUs on Container-Optimized OS VMs.

First, you need a Container-Optimized OS VM instance with GPUs. The method used to create a VM depends on the GPU model selected.

- To create a Container-Optimized OS VM that has attached NVIDIA H100, A100, or L4 GPUs, see Create an accelerator-optimized VM.
- To create a Container-Optimized OS VM that has attached NVIDIA T4, P4, P100, or V100 GPUs, see Create an N1 VM that has attached GPUs.

You can also add GPUs to existing Container-Optimized OS VM instances.

When you create VMs, remember to choose images or image families from the cos-cloud image project.
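As a rough illustration of selecting an image from the cos-cloud project, the sketch below prints (rather than runs) a gcloud command for a G2 VM. The VM name, zone, machine type, and the cos-105-lts image family are assumptions chosen for the example, and print_create_command is a hypothetical helper; substitute a currently supported LTS family and follow the linked creation guides for the authoritative flags.

```shell
#!/bin/sh
# Print (rather than run) a gcloud command that creates a G2 VM from a
# Container-Optimized OS image. All default values below are assumptions.
print_create_command() {
  name="$1"
  zone="$2"
  family="${3:-cos-105-lts}"   # assumed LTS image family; substitute your own
  echo "gcloud compute instances create $name" \
    "--zone=$zone" \
    "--machine-type=g2-standard-4" \
    "--image-family=$family" \
    "--image-project=cos-cloud" \
    "--maintenance-policy=TERMINATE"
}

print_create_command my-cos-gpu-vm us-central1-a
```

Printing the command first makes it easy to review the image project and family before any resources are created.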

To check all GPUs attached to your current Container-Optimized OS VM instances, run the following command:

gcloud compute instances describe INSTANCE_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --format="value(guestAccelerators)"

Replace the following:

- INSTANCE_NAME: The name of the VM instance.
- PROJECT_ID: The ID of your project.
- ZONE: The zone for the VM instance.

Install NVIDIA GPU device drivers

After you create an instance with one or more GPUs, your system requires device drivers so that your applications can access the device. This guide shows the ways to install NVIDIA proprietary drivers on Container-Optimized OS VM instances.

Container-Optimized OS provides a built-in utility, cos-extensions, to simplify the NVIDIA driver installation process. By running the utility, users agree to accept the NVIDIA license agreement.

Identify GPU driver versions

Each version of a Container-Optimized OS image has a list of supported NVIDIA GPU driver versions for each GPU type, along with a default driver for each type. For a complete list of supported versions, see the release notes of the major Container-Optimized OS LTS milestones.

Note: When using different GPU types on the same Container-Optimized OS version, the GPU driver version associated with the same label may differ. For example, in the supported GPU driver version list for Container-Optimized OS version cos-105-17412-448-12, the NVIDIA L4 has a Default GPU driver version of 535.183.01, whereas the NVIDIA P100 has a Default GPU driver version of 470.256.02.

You can also check all the GPU driver versions supported by the GPU on your Container-Optimized OS VM instance by running the following command:

sudo cos-extensions list

Identify the required CUDA toolkit version

If your applications use CUDA, install NVIDIA's CUDA toolkit in your containers. Each version of CUDA requires a minimum GPU driver version or a later version. To check the minimum GPU driver version required for your version of CUDA, see CUDA Toolkit and Compatible Driver Versions. Ensure that the Container-Optimized OS version you are using has the correct GPU driver version for the version of CUDA you are using.
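The "minimum driver version or later" rule amounts to a version comparison. The following sketch shows one way to express it, assuming a sort that supports the -V (version sort) option, as GNU coreutils does. The version_ge helper is hypothetical, and the required value is only an illustrative placeholder; take the real minimum for your CUDA version from NVIDIA's compatibility table.

```shell
#!/bin/sh
# Succeed when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort); this is an assumption about the host.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed=535.183.01   # default L4 driver quoted above for cos-105-17412-448-12
required=525.60.13     # illustrative minimum; look up the value for your CUDA version

if version_ge "$installed" "$required"; then
  echo "driver $installed satisfies the minimum $required"
else
  echo "driver $installed is older than the minimum $required"
fi
```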

Install the driver

You can install GPU drivers by using shell commands, startup scripts, or cloud-init. All three methods use the sudo cos-extensions install gpu command to install the default GPU driver for your Container-Optimized OS LTS version.

Note: Before attempting to install the drivers, ensure that the gcr-online.target and docker.socket services are active. To run the installation scripts, the cos-extensions utility requires the https://www.googleapis.com/auth/devstorage.read_only scope for communicating with gcr.io (cos-cloud). Without this scope, downloading the driver and its dependencies fails. This scope is one of the default scopes, which are typically added when you create a VM.
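The service check in the preceding note can be scripted. This is a minimal sketch that asks systemctl about each unit separately; units_ready and the SYSTEMCTL override variable are hypothetical names introduced here so that the check can be exercised off-VM. It does not verify the access scope.

```shell
#!/bin/sh
# Check that the units named in the note are active before installing drivers.
# SYSTEMCTL is an assumed override hook; it defaults to the real systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

units_ready() {
  for unit in gcr-online.target docker.socket; do
    # Query one unit at a time so a single inactive unit is not masked.
    if ! "$SYSTEMCTL" is-active --quiet "$unit" 2>/dev/null; then
      echo "$unit is not active" >&2
      return 1
    fi
  done
}

if units_ready; then
  echo "prerequisite units are active"
else
  echo "wait for the units above before running cos-extensions install gpu"
fi
```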

Shell

After you connect to your Container-Optimized OS VM instances, you can run the following command manually to install drivers:

sudo cos-extensions install gpu

Note: You need to run the preceding command after every VM reboot to configure GPU drivers.

Startup scripts

You can also install GPU drivers through startup scripts. You can provide the startup script when you create VM instances or apply the script to running VM instances and then reboot the VMs. This lets you install drivers without connecting to the VMs. It also makes sure the GPU drivers are configured on every VM reboot.

The following is an example startup script to install drivers:

#! /bin/bash
sudo cos-extensions install gpu
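One way to apply such a script to an existing VM is through the startup-script metadata key. The sketch below writes the script to a local file and prints the corresponding gcloud command instead of running it. INSTANCE_NAME and ZONE are placeholders, and the file location is an arbitrary choice for the example.

```shell
#!/bin/sh
# Write the startup script to a file and print the gcloud command that would
# attach it. INSTANCE_NAME and ZONE are placeholders to replace.
script_file="${TMPDIR:-/tmp}/install-gpu.sh"

cat > "$script_file" <<'EOF'
#! /bin/bash
sudo cos-extensions install gpu
EOF

echo "gcloud compute instances add-metadata INSTANCE_NAME" \
  "--zone=ZONE" \
  "--metadata-from-file=startup-script=$script_file"
```

Because startup scripts run at boot, the script takes effect the next time the VM restarts.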

Cloud-init

Cloud-init is similar to startup scripts but more powerful. The following example shows how to install the GPU driver through cloud-init:

#cloud-config

runcmd:
  - cos-extensions install gpu

Using cloud-init lets you specify the dependencies so that your GPU applications will only run after the driver has been installed. See the End-to-end: Running a GPU application on Container-Optimized OS section for more details.

For more information about how to use cloud-init on Container-Optimized OS VM instances, see the creating and configuring instances page.
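Cloud-init configuration reaches a VM through the user-data metadata key. As a hedged sketch, the following writes the cloud-config shown earlier to a file and prints a gcloud command that would pass it at creation time. INSTANCE_NAME, ZONE, MACHINE_TYPE, and IMAGE_FAMILY are placeholders, and the file location is arbitrary.

```shell
#!/bin/sh
# Write a cloud-config file and print the gcloud command that would pass it
# as user-data. Upper-case names in the command are placeholders.
config_file="${TMPDIR:-/tmp}/gpu-cloud-config.yaml"

cat > "$config_file" <<'EOF'
#cloud-config

runcmd:
  - cos-extensions install gpu
EOF

echo "gcloud compute instances create INSTANCE_NAME" \
  "--zone=ZONE" \
  "--machine-type=MACHINE_TYPE" \
  "--image-family=IMAGE_FAMILY" \
  "--image-project=cos-cloud" \
  "--metadata-from-file=user-data=$config_file"
```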

In some cases, the default driver included with Container-Optimized OS doesn't meet the minimum driver requirements of your CUDA toolkit or your GPU model. See Required NVIDIA driver versions for the version requirements for specific types of GPUs.

To install a specific GPU driver version, run the following command:

sudo cos-extensions install gpu -- -version=DRIVER_VERSION

Replace DRIVER_VERSION with one of the following options:

- default: Installs the default driver designated by the Container-Optimized OS release. This version receives bug fixes and security updates.
- latest: Installs the latest driver available in the Container-Optimized OS release. Be aware that this might introduce compatibility changes due to potential major version updates across COS releases.
- The full version: Use this to pin to a specific version for workloads sensitive to driver changes. For example, specify version 535.183.01.
- NVIDIA driver branch: Installs the latest stable driver within a specific NVIDIA branch to stay current with security updates and bug fixes within that branch. For example, specify branch R535. This option is available starting from cos-gpu-installer:v2.2.1.

To see the available versions for each of those options, run the command in Identify GPU driver versions.
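The four accepted forms of DRIVER_VERSION can be told apart by shape. The sketch below classifies a value and prints the matching install command; driver_version_kind is a hypothetical helper, and its patterns are deliberately loose approximations based on the examples above (535.183.01 and R535), not the installer's own validation.

```shell
#!/bin/sh
# Classify a DRIVER_VERSION value into one of the documented forms.
# The function name and the exact patterns are assumptions for illustration.
driver_version_kind() {
  case "$1" in
    default|latest) echo "keyword" ;;
    R[0-9]*)        echo "branch" ;;
    [0-9]*.[0-9]*)  echo "full-version" ;;
    *)              echo "unknown"; return 1 ;;
  esac
}

for value in default latest 535.183.01 R535; do
  kind=$(driver_version_kind "$value") || continue
  echo "$value ($kind): sudo cos-extensions install gpu -- -version=$value"
done
```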

Pass parameters to the kernel modules

You can pass specific parameters to the NVIDIA kernel module upon installation by using the --module-arg flag. This flag is useful for enabling or disabling certain driver features. The flag can be used multiple times to pass several arguments.

For example, on a COS VM, you could use the following command to install the NVIDIA driver and load the nvidia.ko kernel module with the NVreg_EnableGpuFirmware=0 parameter.

sudo cos-extensions install gpu -- --module-arg nvidia.NVreg_EnableGpuFirmware=0
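Since the flag can be repeated, a command with several module arguments can be assembled programmatically. The following sketch prints such a command; install_command_with_module_args is a hypothetical helper, and nvidia.SECOND_PARAM=VALUE is a placeholder, not a real module parameter.

```shell
#!/bin/sh
# Print an install command with one --module-arg flag per argument.
# The function name is hypothetical; nvidia.SECOND_PARAM=VALUE is a placeholder.
install_command_with_module_args() (
  cmd="sudo cos-extensions install gpu --"
  for module_arg in "$@"; do
    cmd="$cmd --module-arg $module_arg"
  done
  printf '%s\n' "$cmd"
)

install_command_with_module_args \
  nvidia.NVreg_EnableGpuFirmware=0 \
  nvidia.SECOND_PARAM=VALUE
```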

Preload the driver

You can preload the GPU driver on your Container-Optimized OS instance even when no GPU device is attached. This is useful for preparing environments or testing configurations before attaching physical GPU hardware.

To preload the GPU driver, run the following command:

sudo cos-extensions install gpu -- -no-verify -target-gpu=GPU_DEVICE

This command is supported starting from cos-gpu-installer:v2.3.0. The following flags apply:

- -no-verify: Downloads and prepares the driver files but skips kernel module loading and installation verification.
- -target-gpu: Specifies the GPU device to ensure the correct driver is preloaded, preventing compatibility issues when the GPU device is later attached. Replace GPU_DEVICE with a specific GPU model (for example, NVIDIA_L4) listed in the Overview. If -target-gpu is not specified, the default GPU driver is preloaded.

Verify the installation

You can run the following commands on your Container-Optimized OS VM instances to manually verify the installation of the GPU drivers. The output shows GPU device information, such as device state and driver version.

# Make the driver installation path executable by re-mounting it.
sudo mount --bind /var/lib/nvidia /var/lib/nvidia
sudo mount -o remount,exec /var/lib/nvidia
/var/lib/nvidia/bin/nvidia-smi
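For scripted checks, nvidia-smi can emit just the fields of interest. The sketch below wraps that query in a function that fails cleanly when the binary is missing or not yet executable. report_gpus and the NVIDIA_SMI override variable are hypothetical names, and the sketch assumes the remount shown above has already been done.

```shell
#!/bin/sh
# Report GPU name and driver version, or fail if nvidia-smi is unavailable.
# NVIDIA_SMI is an assumed override hook; it defaults to the path used above.
NVIDIA_SMI="${NVIDIA_SMI:-/var/lib/nvidia/bin/nvidia-smi}"

report_gpus() {
  if [ ! -x "$NVIDIA_SMI" ]; then
    echo "$NVIDIA_SMI is missing or not executable; install the driver and remount with exec" >&2
    return 1
  fi
  # Query only the GPU name and driver version, one CSV line per GPU.
  "$NVIDIA_SMI" --query-gpu=name,driver_version --format=csv,noheader
}

report_gpus || echo "GPU driver verification failed"
```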

Configure containers to consume GPUs

After the GPU drivers are installed, you can configure containers to consume GPUs. The following example shows you how to run a CUDA application in a Docker container that consumes /dev/nvidia0:

docker run \
  --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 \
  --volume /var/lib/nvidia/bin:/usr/local/nvidia/bin \
  --device /dev/nvidia0:/dev/nvidia0 \
  --device /dev/nvidia-uvm:/dev/nvidia-uvm \
  --device /dev/nvidiactl:/dev/nvidiactl \
  gcr.io/google_containers/cuda-vector-add:v0.1
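On a VM with more than one GPU, each device node needs its own --device flag. The following sketch generates those flags by scanning for numbered nvidia device nodes plus the two control nodes used above. gpu_device_flags is a hypothetical helper, and the sketch assumes additional GPUs appear as /dev/nvidia1, /dev/nvidia2, and so on.

```shell
#!/bin/sh
# Emit one --device flag per NVIDIA device node found in a directory.
# gpu_device_flags is a hypothetical helper; the directory defaults to /dev.
gpu_device_flags() (
  dev_dir="${1:-/dev}"
  flags=""
  for dev in "$dev_dir"/nvidia[0-9]* "$dev_dir"/nvidia-uvm "$dev_dir"/nvidiactl; do
    [ -e "$dev" ] || continue        # skip unmatched globs and missing nodes
    flags="$flags --device $dev:$dev"
  done
  printf '%s\n' "${flags# }"
)

# Print the resulting command instead of running it.
echo "docker run $(gpu_device_flags)" \
  "--volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64" \
  "--volume /var/lib/nvidia/bin:/usr/local/nvidia/bin" \
  "gcr.io/google_containers/cuda-vector-add:v0.1"
```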

You can run your containers through cloud-init to specify the dependency between driver installation and your containers. See the End-to-end: Running a GPU application on Container-Optimized OS section for more details.

End-to-end: Running a GPU application on Container-Optimized OS

The following end-to-end example shows you how to use cloud-init to configure Container-Optimized OS VM instances that provision a GPU application container myapp:latest after the GPU driver has been installed:

#cloud-config

users:
- name: myuser
  uid: 2000

write_files:
  - path: /etc/systemd/system/install-gpu.service
    permissions: 0644
    owner: root
    content: |
      [Unit]
      Description=Install GPU drivers
      Wants=gcr-online.target docker.socket
      After=gcr-online.target docker.socket

      [Service]
      User=root
      Type=oneshot
      ExecStart=cos-extensions install gpu
      StandardOutput=journal+console
      StandardError=journal+console
  - path: /etc/systemd/system/myapp.service
    permissions: 0644
    owner: root
    content: |
      [Unit]
      Description=Run a myapp GPU application container
      Requires=install-gpu.service
      After=install-gpu.service

      [Service]
      User=root
      Type=oneshot
      RemainAfterExit=true
      ExecStart=/usr/bin/docker run --rm -u 2000 --name=myapp --device /dev/nvidia0:/dev/nvidia0 myapp:latest
      StandardOutput=journal+console
      StandardError=journal+console

runcmd:
  - systemctl daemon-reload
  - systemctl start install-gpu.service
  - systemctl start myapp.service

About the NVIDIA CUDA-X libraries

CUDA® is NVIDIA's parallel computing platform and programming model for GPUs. To use CUDA applications, the libraries must be present in the image you are using. You can do any of the following to add the NVIDIA CUDA-X libraries:

- Use an image with the NVIDIA CUDA-X libraries pre-installed. For example, you can use Google's Deep Learning Containers. These containers pre-install the key data science frameworks, the NVIDIA CUDA-X libraries, and tools. Alternatively, NVIDIA's CUDA image contains the NVIDIA CUDA-X libraries only.
- Build and use your own image. In this case, include /usr/local/cuda-XX.X/lib64, which contains the NVIDIA CUDA-X libraries, and /usr/local/nvidia/lib64, which contains the NVIDIA device drivers, in the LD_LIBRARY_PATH environment variable. For /usr/local/cuda-XX.X/lib64, the name of the directory depends on the version of the image you used. For example, the NVIDIA CUDA-X libraries and debug utilities in Docker containers can be at /usr/local/cuda-11.0/lib64 and /usr/local/nvidia/bin, respectively.
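For the build-your-own-image option, the library path can be composed from the CUDA version. This is a minimal sketch of what a container entrypoint script might do, using the 11.0 example from the text; cuda_library_path is a hypothetical helper, and the ordering of the two directories is an arbitrary choice.

```shell
#!/bin/sh
# Build an LD_LIBRARY_PATH value for a given CUDA toolkit version.
# The helper name is hypothetical; the directories follow the text above.
cuda_library_path() {
  echo "/usr/local/nvidia/lib64:/usr/local/cuda-$1/lib64"
}

# Prepend the GPU directories, keeping any value that was already set.
LD_LIBRARY_PATH="$(cuda_library_path 11.0)${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```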

Security

Just like other kernel modules on Container-Optimized OS, GPU drivers are cryptographically signed and verified by keys that are built into the Container-Optimized OS kernel. Unlike some other distros, Container-Optimized OS does not allow users to enroll their Machine Owner Key (MOK) and use the keys to sign custom kernel modules. This is to ensure the integrity of the Container-Optimized OS kernel and reduce the attack surface.

Restrictions

Container-Optimized OS version restrictions

Only Container-Optimized OS LTS release milestone 85 and later support the cos-extensions utility mentioned in the Install NVIDIA GPU device drivers section. For earlier Container-Optimized OS release milestones, use the cos-gpu-installer open source tool to manually install GPU drivers.

VM instances restrictions

VM instances with GPUs have specific restrictions that make them behave differently than other instance types. For more information, see the Compute Engine GPU restrictions page.

Quota and availability

GPUs are available in specific regions and zones. When you request GPU quota, consider the regions in which you intend to run your Container-Optimized OS VM instances.

For a complete list of applicable regions and zones, see GPUs on Compute Engine. You can also see GPUs available in your zone using the Google Cloud CLI.

gcloud compute accelerator-types list

Pricing

For GPU pricing information, see the Compute Engine pricing page.

Supportability

Each Container-Optimized OS release version has at least one supported NVIDIA GPU driver version. The Container-Optimized OS team qualifies the supported GPU drivers against the Container-Optimized OS version before release to make sure they are compatible. New versions of the NVIDIA GPU drivers may be made available from time to time. Some GPU driver versions won't qualify for Container-Optimized OS, and the qualification timeline is not guaranteed.

When the Container-Optimized OS team releases a new version on a release milestone, we try to support the latest GPU driver version on the corresponding driver branch. This is to fix CVEs discovered in GPU drivers as soon as possible.

If a Container-Optimized OS customer identifies an issue that's related to the NVIDIA GPU drivers, the customer must work directly with NVIDIA for support. If the issue is not driver specific, then users can open a request with Cloud Customer Care.

What's next

- Learn more about running containers on a Container-Optimized OS VM instance.
- Learn more about GPUs on Compute Engine.
- Learn more about requesting GPU quota.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.
