Amazon ECS task definitions for GPU workloads - Amazon Elastic Container Service

Amazon ECS task definitions for GPU workloads

Amazon ECS supports workloads that use GPUs when you create clusters with container instances that support GPUs. Amazon EC2 GPU-based container instances that use the p2, p3, p5, g3, g4, and g5 instance types provide access to NVIDIA GPUs. For more information, see Linux Accelerated Computing Instances in the Amazon EC2 Instance Types guide.

Amazon ECS provides a GPU-optimized AMI that comes with pre-configured NVIDIA kernel drivers and a Docker GPU runtime. For more information, see Amazon ECS-optimized Linux AMIs.

You can designate a number of GPUs in your task definition for task placement consideration at a container level. Amazon ECS schedules tasks to available container instances that support GPUs and pins physical GPUs to the proper containers for optimal performance.
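As a minimal sketch of what designating GPUs at the container level can look like, the following builds a task definition object in which one container reserves GPUs through a `resourceRequirements` entry of type `GPU`. The family name, container name, image, and memory value are hypothetical placeholders, and the interfaces cover only the fields used here rather than the full task definition schema.

```typescript
// Minimal shapes for the task definition fields used in this sketch.
interface ResourceRequirement {
  type: "GPU";
  value: string; // number of GPUs to reserve, expressed as a string
}

interface ContainerDefinition {
  name: string;
  image: string;
  memory: number;
  essential: boolean;
  resourceRequirements?: ResourceRequirement[];
}

interface TaskDefinition {
  family: string;
  requiresCompatibilities: string[];
  containerDefinitions: ContainerDefinition[];
}

// Build a task definition whose single container reserves `gpuCount` GPUs.
function gpuTaskDefinition(family: string, image: string, gpuCount: number): TaskDefinition {
  if (!Number.isInteger(gpuCount) || gpuCount < 1) {
    throw new RangeError("gpuCount must be a positive integer");
  }
  return {
    family,
    requiresCompatibilities: ["EC2"],
    containerDefinitions: [
      {
        name: "gpu-worker", // hypothetical container name
        image,
        memory: 4096, // hypothetical memory limit in MiB
        essential: true,
        resourceRequirements: [{ type: "GPU", value: String(gpuCount) }],
      },
    ],
  };
}

// Hypothetical family and image names, for illustration only.
const taskDef = gpuTaskDefinition("gpu-example", "example.com/my-gpu-image:latest", 2);
console.log(JSON.stringify(taskDef, null, 2));
```

The printed JSON could then be adapted and registered as a task definition; the GPU count is passed as a string because that is how resource requirement values are expressed.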

The following Amazon EC2 GPU-based instance types are supported. For more information, see Amazon EC2 P2 Instances, Amazon EC2 P3 Instances, Amazon EC2 P4d Instances, Amazon EC2 P5 Instances, Amazon EC2 G3 Instances, Amazon EC2 G4 Instances, Amazon EC2 G5 Instances, Amazon EC2 G6 Instances, and Amazon EC2 G6e Instances.

Instance type    GPUs  GPU memory (GiB)  vCPUs  Memory (GiB)
p3.2xlarge       1     16                8      61
p3.8xlarge       4     64                32     244
p3.16xlarge      8     128               64     488
p3dn.24xlarge    8     256               96     768
p4d.24xlarge     8     320               96     1152
p5.48xlarge      8     640               192    2048
g3s.xlarge       1     8                 4      30.5
g3.4xlarge       1     8                 16     122
g3.8xlarge       2     16                32     244
g3.16xlarge      4     32                64     488
g4dn.xlarge      1     16                4      16
g4dn.2xlarge     1     16                8      32
g4dn.4xlarge     1     16                16     64
g4dn.8xlarge     1     16                32     128
g4dn.12xlarge    4     64                48     192
g4dn.16xlarge    1     16                64     256
g5.xlarge        1     24                4      16
g5.2xlarge       1     24                8      32
g5.4xlarge       1     24                16     64
g5.8xlarge       1     24                32     128
g5.16xlarge      1     24                64     256
g5.12xlarge      4     96                48     192
g5.24xlarge      4     96                96     384
g5.48xlarge      8     192               192    768
g6.xlarge        1     24                4      16
g6.2xlarge       1     24                8      32
g6.4xlarge       1     24                16     64
g6.8xlarge       1     24                32     128
g6.16xlarge      1     24                64     256
g6.12xlarge      4     96                48     192
g6.24xlarge      4     96                96     384
g6.48xlarge      8     192               192    768
g6.metal         8     192               192    768
gr6.4xlarge      1     24                16     128
g6e.xlarge       1     48                4      32
g6e.2xlarge      1     48                8      64
g6e.4xlarge      1     48                16     128
g6e.8xlarge      1     48                32     256
g6e.16xlarge     1     48                64     512
g6e.12xlarge     4     192               48     384
g6e.24xlarge     4     192               96     768
g6e.48xlarge     8     384               192    1536
gr6.8xlarge      1     24                32     256
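One way to use the table when sizing a cluster is to filter it by the GPU count and GPU memory a task needs. The sketch below copies a handful of rows from the table above and picks the candidate with the fewest vCPUs. Treating "fewest vCPUs" as "smallest" is an assumption made for illustration; the table carries no pricing or availability data, so it is not a cost recommendation.

```typescript
interface GpuInstanceType {
  name: string;
  gpus: number;
  gpuMemoryGiB: number;
  vcpus: number;
  memoryGiB: number;
}

// A subset of the rows from the table of supported instance types.
const instanceTypes: GpuInstanceType[] = [
  { name: "g4dn.xlarge", gpus: 1, gpuMemoryGiB: 16, vcpus: 4, memoryGiB: 16 },
  { name: "g4dn.12xlarge", gpus: 4, gpuMemoryGiB: 64, vcpus: 48, memoryGiB: 192 },
  { name: "g5.xlarge", gpus: 1, gpuMemoryGiB: 24, vcpus: 4, memoryGiB: 16 },
  { name: "g5.12xlarge", gpus: 4, gpuMemoryGiB: 96, vcpus: 48, memoryGiB: 192 },
  { name: "g5.48xlarge", gpus: 8, gpuMemoryGiB: 192, vcpus: 192, memoryGiB: 768 },
  { name: "p3.8xlarge", gpus: 4, gpuMemoryGiB: 64, vcpus: 32, memoryGiB: 244 },
];

// Return the matching instance type with the fewest vCPUs (ties broken by
// host memory), or undefined when nothing in the list is large enough.
function smallestFit(minGpus: number, minGpuMemoryGiB: number): GpuInstanceType | undefined {
  return instanceTypes
    .filter((t) => t.gpus >= minGpus && t.gpuMemoryGiB >= minGpuMemoryGiB)
    .sort((a, b) => a.vcpus - b.vcpus || a.memoryGiB - b.memoryGiB)[0];
}

console.log(smallestFit(4, 96)?.name); // g5.12xlarge
console.log(smallestFit(1, 20)?.name); // g5.xlarge
```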

You can retrieve the Amazon Machine Image (AMI) ID for Amazon ECS-optimized AMIs by querying the AWS Systems Manager Parameter Store API. Using this parameter, you don't need to manually look up Amazon ECS-optimized AMI IDs. For more information about the Systems Manager Parameter Store API, see GetParameter. The user that you use must have the ssm:GetParameter IAM permission to retrieve the Amazon ECS-optimized AMI metadata.

aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended --region us-east-1
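The command returns a JSON response whose parameter value is itself a JSON string. The following sketch extracts the AMI ID from such a response. It assumes the value contains an `image_id` field, which is how the ECS-optimized AMI parameters are typically structured; the sample response and its AMI ID are made up for illustration, so check the output of your own call before relying on the field name.

```typescript
// Shapes for the fields of the get-parameters response used here.
interface SsmParameter {
  Name: string;
  Value: string; // JSON-encoded AMI metadata (assumed)
}

interface GetParametersResponse {
  Parameters: SsmParameter[];
  InvalidParameters?: string[];
}

// Extract the AMI ID from the raw JSON printed by `aws ssm get-parameters`.
function recommendedGpuAmiId(responseJson: string): string {
  const response = JSON.parse(responseJson) as GetParametersResponse;
  const parameter = response.Parameters[0];
  if (parameter === undefined) {
    throw new Error("The response contains no parameters");
  }
  // Assumption: the parameter value is a JSON document with an image_id field.
  const metadata = JSON.parse(parameter.Value) as { image_id?: unknown };
  if (typeof metadata.image_id !== "string") {
    throw new Error("The parameter value has no image_id field");
  }
  return metadata.image_id;
}

// Hypothetical sample response; the AMI ID is a placeholder, not a real image.
const sampleResponse = JSON.stringify({
  Parameters: [
    {
      Name: "/aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended",
      Value: JSON.stringify({ image_id: "ami-0123456789abcdef0" }),
    },
  ],
  InvalidParameters: [],
});

console.log(recommendedGpuAmiId(sampleResponse));
```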

Considerations

We recommend that you consider the following before you begin working with GPUs on Amazon ECS.

Share GPUs

When you want to share GPUs, you need to configure the following.

  1. Remove GPU resource requirements from your task definitions so that Amazon ECS does not reserve any GPUs that should be shared.

  2. Add the following user data to your instances when you want to share GPUs. This will make nvidia the default Docker container runtime on the container instance so that all Amazon ECS containers can use the GPUs. For more information, see Run commands when you launch an EC2 instance with user data input in the Amazon EC2 User Guide.

    // Assumes AWS CDK v2; adjust the import path for your CDK version.
    import * as ec2 from 'aws-cdk-lib/aws-ec2';

    const userData = ec2.UserData.forLinux();
    userData.addCommands(
      // Replace the Docker daemon configuration so that nvidia is the default runtime.
      'sudo rm /etc/sysconfig/docker',
      'echo DAEMON_MAXFILES=1048576 | sudo tee -a /etc/sysconfig/docker',
      'echo OPTIONS="--default-ulimit nofile=32768:65536 --default-runtime nvidia" | sudo tee -a /etc/sysconfig/docker',
      'echo DAEMON_PIDFILE_TIMEOUT=10 | sudo tee -a /etc/sysconfig/docker',
      'sudo systemctl restart docker',
    );

  3. Set the NVIDIA_VISIBLE_DEVICES environment variable on your container. You can do this by specifying the environment variable in your task definition. For information on the valid values, see GPU Enumeration on the NVIDIA documentation site.
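To illustrate steps 1 and 3 together, the sketch below builds container definitions that omit GPU resource requirements and instead set `NVIDIA_VISIBLE_DEVICES` in the container environment. The container names, image, and memory value are hypothetical; the values `all` and `0` are used as examples of the enumeration forms described in the NVIDIA documentation.

```typescript
interface EnvironmentVariable {
  name: string;
  value: string;
}

interface SharedGpuContainerDefinition {
  name: string;
  image: string;
  memory: number;
  essential: boolean;
  environment: EnvironmentVariable[];
}

// Build a container definition that shares GPUs: it has no
// resourceRequirements entry, so Amazon ECS reserves no GPU for it.
function sharedGpuContainer(
  name: string,
  image: string,
  visibleDevices: string = "all",
): SharedGpuContainerDefinition {
  return {
    name,
    image,
    memory: 2048, // hypothetical memory limit in MiB
    essential: true,
    environment: [{ name: "NVIDIA_VISIBLE_DEVICES", value: visibleDevices }],
  };
}

// Hypothetical containers: the first sees every GPU, the second only GPU 0.
const containers = [
  sharedGpuContainer("inference-a", "example.com/inference:latest"),
  sharedGpuContainer("inference-b", "example.com/inference:latest", "0"),
];

console.log(JSON.stringify(containers, null, 2));
```

Because nothing is reserved, Amazon ECS does not track GPU usage for these containers during placement, so dividing the GPUs sensibly between them is left to you.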

What to do if you need a P2 instance

If you need to use P2 instances, you can use one of the following options to continue using the instances.

You must modify the instance user data for both options. For more information, see Run commands when you launch an EC2 instance with user data input in the Amazon EC2 User Guide.

Use the last supported GPU-optimized AMI

You can use the 20230906 version of the GPU-optimized AMI, and add the following to the instance user data.

Replace cluster-name with the name of your cluster.

#!/bin/bash
echo "exclude=*nvidia* *cuda*" >> /etc/yum.conf
echo "ECS_CLUSTER=cluster-name" >> /etc/ecs/ecs.config
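If you generate instance user data programmatically, the cluster name substitution can be done with a small helper like the one sketched below. The function name is hypothetical, and the validation pattern reflects the usual cluster naming rules (letters, numbers, hyphens, and underscores, up to 255 characters), which also keeps unexpected characters out of the shell script.

```typescript
// Render the P2 user data shown above for a given cluster name.
function p2UserData(clusterName: string): string {
  // Reject anything outside the characters normally allowed in cluster names.
  if (!/^[A-Za-z0-9_-]{1,255}$/.test(clusterName)) {
    throw new Error(`Invalid cluster name: ${clusterName}`);
  }
  return [
    "#!/bin/bash",
    // Stop yum from replacing the NVIDIA and CUDA packages shipped with the AMI.
    'echo "exclude=*nvidia* *cuda*" >> /etc/yum.conf',
    // Register the instance with the given cluster.
    `echo "ECS_CLUSTER=${clusterName}" >> /etc/ecs/ecs.config`,
  ].join("\n") + "\n";
}

// "my-gpu-cluster" is a placeholder cluster name.
const userDataScript = p2UserData("my-gpu-cluster");
console.log(userDataScript);
```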

Use the latest GPU-optimized AMI, and update the user data

You can add the following to the instance user data. This uninstalls the NVIDIA 535/CUDA 12.2 drivers, installs the NVIDIA 470/CUDA 11.4 drivers, and pins that version.

#!/bin/bash
yum remove -y cuda-toolkit* nvidia-driver-latest-dkms*
tmpfile=$(mktemp)
cat >$tmpfile <<EOF
[amzn2-nvidia]
name=Amazon Linux 2 Nvidia repository
mirrorlist=\$awsproto://\$amazonlinux.\$awsregion.\$awsdomain/\$releasever/amzn2-nvidia/latest/\$basearch/mirror.list
priority=20
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
enabled=1
exclude=libglvnd-*
EOF

mv $tmpfile /etc/yum.repos.d/amzn2-nvidia-tmp.repo
yum install -y system-release-nvidia cuda-toolkit-11-4 nvidia-driver-latest-dkms-470.182.03
yum install -y libnvidia-container-1.4.0 libnvidia-container-tools-1.4.0 nvidia-container-runtime-hook-1.4.0 docker-runtime-nvidia-1
echo "exclude=*nvidia* *cuda*" >> /etc/yum.conf
nvidia-smi

Create your own P2 compatible GPU-optimized AMI

You can create your own custom Amazon ECS GPU-optimized AMI that is compatible with P2instances, and then launch P2 instances using the AMI.
