Description
We are using ephemeral arm64 builders and are intermittently having builds stall due to a lack of capacity, even with on-demand instances.
Every time this happens, the build permanently stalls and has to be manually cancelled.
From a bit of digging, it looks like adding `InsufficientInstanceCapacity` to the list of what's considered a "Scaling Error" should fix this.
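To make the proposal concrete, here is a minimal sketch of the kind of check involved. It is not the project's actual code: `FleetError`, `SCALING_ERROR_CODES`, and `isScalingError` are hypothetical names, the list contents are illustrative, and treating a failure as a scaling error only when every returned error is on the list is an assumption.

```typescript
// Hypothetical, reduced shape of one entry in the `Errors` array
// of a create-fleet response; only the field used here is modelled.
interface FleetError {
  ErrorCode?: string;
}

// Illustrative list only; the project's real list may differ.
const SCALING_ERROR_CODES: readonly string[] = [
  'UnfulfillableCapacity',
  'MaxSpotInstanceCountExceeded',
  'RequestLimitExceeded',
  'InsufficientInstanceCapacity', // the addition proposed in this issue
];

// Returns true when there is at least one error and every error
// carries a code from the list above (an assumed policy).
function isScalingError(errors: FleetError[]): boolean {
  return (
    errors.length > 0 &&
    errors.every(
      (e) => e.ErrorCode !== undefined && SCALING_ERROR_CODES.indexOf(e.ErrorCode) !== -1,
    )
  );
}

// The error reported in the log for this issue.
const errorsFromLog: FleetError[] = [{ ErrorCode: 'InsufficientInstanceCapacity' }];
console.log(isScalingError(errorsFromLog));
```

With the extra entry in the list, the error from the log would be classified as a scaling error, presumably letting the request be retried instead of being ignored.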
Redacted CloudWatch log:
2023-02-06T10:27:56.604+11:00 2023-02-05 23:27:56.494 WARN [runners:34ecb39a-ae35-5506-b158-efc0938129a8 index.js:120365 createRunner] No instances created by fleet request. Check configuration! Response:
2023-02-06T10:27:56.604+11:00 {
2023-02-06T10:27:56.604+11:00   FleetId: 'fleet-92368284-5b0d-44bc-0e18-af80f19be5e5',
2023-02-06T10:27:56.604+11:00   Errors: [
2023-02-06T10:27:56.604+11:00     {
2023-02-06T10:27:56.604+11:00       LaunchTemplateAndOverrides: {
2023-02-06T10:27:56.604+11:00         LaunchTemplateSpecification: {
2023-02-06T10:27:56.604+11:00           LaunchTemplateId: 'lt-REDACTED',
2023-02-06T10:27:56.604+11:00           Version: '4'
2023-02-06T10:27:56.604+11:00         },
2023-02-06T10:27:56.604+11:00         Overrides: {
2023-02-06T10:27:56.604+11:00           InstanceType: 'c6gd.8xlarge',
2023-02-06T10:27:56.604+11:00           SubnetId: 'subnet-REDACTED'
2023-02-06T10:27:56.604+11:00         }
2023-02-06T10:27:56.604+11:00       },
2023-02-06T10:27:56.604+11:00       Lifecycle: 'on-demand',
2023-02-06T10:27:56.604+11:00       ErrorCode: 'InsufficientInstanceCapacity',
2023-02-06T10:27:56.604+11:00       ErrorMessage: 'We currently do not have sufficient c6gd.8xlarge capacity in the Availability Zone you requested (REDACTED). Our system will be working on provisioning additional capacity. You can currently get c6gd.8xlarge capacity by not specifying an Availability Zone in your request or choosing us-west-2b, us-west-2c, us-west-2d.'
2023-02-06T10:27:56.604+11:00     }
2023-02-06T10:27:56.604+11:00   ],
2023-02-06T10:27:56.604+11:00   Instances: []
2023-02-06T10:27:56.604+11:00 }
2023-02-06T10:27:56.622+11:00 2023-02-05 23:27:56.621 WARN [runners:34ecb39a-ae35-5506-b158-efc0938129a8 index.js:120384 createRunner] Create fleet failed, error not recognized as scaling error.
2023-02-06T10:27:56.622+11:00 [
2023-02-06T10:27:56.622+11:00   {
2023-02-06T10:27:56.622+11:00     LaunchTemplateAndOverrides: {
2023-02-06T10:27:56.622+11:00       LaunchTemplateSpecification: {
2023-02-06T10:27:56.622+11:00         LaunchTemplateId: 'lt-REDACTED',
2023-02-06T10:27:56.622+11:00         Version: '4'
2023-02-06T10:27:56.622+11:00       },
2023-02-06T10:27:56.622+11:00       Overrides: {
2023-02-06T10:27:56.622+11:00         InstanceType: 'c6gd.8xlarge',
2023-02-06T10:27:56.622+11:00         SubnetId: 'subnet-REDACTED'
2023-02-06T10:27:56.622+11:00       }
2023-02-06T10:27:56.622+11:00     },
2023-02-06T10:27:56.622+11:00     Lifecycle: 'on-demand',
2023-02-06T10:27:56.622+11:00     ErrorCode: 'InsufficientInstanceCapacity',
2023-02-06T10:27:56.622+11:00     ErrorMessage: 'We currently do not have sufficient c6gd.8xlarge capacity in the Availability Zone you requested (REDACTED). Our system will be working on provisioning additional capacity. You can currently get c6gd.8xlarge capacity by not specifying an Availability Zone in your request or choosing REDACTED.'
2023-02-06T10:27:56.622+11:00   }
2023-02-06T10:27:56.622+11:00 ]
2023-02-06T10:27:56.622+11:00 2023-02-05 23:27:56.622 WARN [scale-runners:34ecb39a-ae35-5506-b158-efc0938129a8 index.js:120511 Runtime.handler] Ignoring error: Create fleet failed, no instance created. {"runnerType":"Org","runnerOwner":"gravitational","event":"workflow_job","id":"11123528452"}
2023-02-06T10:27:56.623+11:00 END RequestId: 34ecb39a-ae35-5506-b158-efc0938129a8
2023-02-06T10:27:56.623+11:00 REPORT RequestId: 34ecb39a-ae35-5506-b158-efc0938129a8 Duration: 2269.85 ms Billed Duration: 2270 ms Memory Size: 512 MB Max Memory Used: 219 MB