Commit 5b72a43

chore: improve CI reliability (#16169)
We have an effort underway to replace `dbmem` (#15109), and consequently we've begun running our full test-suite (with Postgres) on all supported OSs (Windows, MacOS, and Linux) since #15520.

Since this change, we've seen a marked decrease in the success rate of our builds on `main` (note how the Windows/MacOS failures account for the vast majority of failed builds):

![image](https://github.com/user-attachments/assets/a02c15b7-037d-428a-a600-2aed60553ac0)

We're still investigating why these OSs are so much less reliable. It's likely that the VMs on which the builds run differ from our Ubuntu runners in some characteristic such as disk I/O, network latency, or something else.

**In the meantime, we need to start trusting CI failures in `main` again, as the current failures are too noisy and vague for us to correct.**

We've also considered hosting our own runners where possible, so that we get OS-level observability and can rule out some of these possibilities.

See the [meeting notes](https://www.notion.so/coderhq/CI-Investigation-Call-Notes-17dd579be59280d8897cc9fe4bb46695?pvs=6&utm_content=17dd579b-e592-80d8-897c-c9fe4bb46695&utm_campaign=T1ZPT2FL0&n=slack&n=slack_link_unfurl) where we looked into this for more detail.

This PR introduces several changes:

1. Moves the full test-suite with Postgres on Windows/MacOS to the `nightly-gauntlet` workflow.
   Tradeoff: regressions on these OSs may be harder to trace back to a specific change, since we merge to `main` several times a day and a nightly failure could stem from any of those merges.
2. Runs only the CLI test-suite on Windows/MacOS for each PR and merge to `main`.
3. `test-go` still runs the full test-suite against all OSs (including the CLI tests), but will be removed soon once #15109 is completed, since it uses `dbmem`.
4. Changes `nightly-gauntlet` to run at 4AM UTC on weekdays (previously midnight UTC every day): we've seen several instances of the runner being stopped externally, and we're _guessing_ this may have something to do with the midnight UTC execution time, when other cron jobs may run.
5. Removes the existing `nightly-gauntlet` jobs (`go-race` and `go-timing`), since they haven't passed in a long time, indicating that nobody cares enough to fix them and that they don't provide diagnostic value; we can restore them later if necessary.

I've manually run both of the updated workflows successfully:

- `ci`: https://github.com/coder/coder/actions/runs/12825874176/job/35764724907
- `nightly-gauntlet`: https://github.com/coder/coder/actions/runs/12825539092

---------

Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Muhammad Atif Ali <atif@coder.com>
1 parent 738a7f6 commit 5b72a43


3 files changed: +121 additions, -74 deletions


.github/workflows/ci.yaml

Lines changed: 56 additions & 32 deletions
```diff
@@ -378,8 +378,62 @@ jobs:
         with:
           api-key: ${{ secrets.DATADOG_API_KEY }}

+  # We don't run the full test-suite for Windows & MacOS, so we just run the CLI tests on every PR.
+  # We run the test suite in test-go-pg, including CLI.
+  test-cli:
+    runs-on: ${{ matrix.os == 'macos-latest' && github.repository_owner == 'coder' && 'depot-macos-latest' || matrix.os == 'windows-2022' && github.repository_owner == 'coder' && 'windows-latest-16-cores' || matrix.os }}
+    needs: changes
+    if: needs.changes.outputs.go == 'true' || needs.changes.outputs.ci == 'true' || github.ref == 'refs/heads/main'
+    strategy:
+      matrix:
+        os:
+          - macos-latest
+          - windows-2022
+    steps:
+      - name: Harden Runner
+        uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.2
+        with:
+          egress-policy: audit
+
+      - name: Checkout
+        uses: actions/checkout@eef61447b9ff4aafe5dcd4e0bbf5d482be7e7871 # v4.2.1
+        with:
+          fetch-depth: 1
+
+      - name: Setup Go
+        uses: ./.github/actions/setup-go
+
+      - name: Setup Terraform
+        uses: ./.github/actions/setup-tf
+
+      # Sets up the ImDisk toolkit for Windows and creates a RAM disk on drive R:.
+      - name: Setup ImDisk
+        if: runner.os == 'Windows'
+        uses: ./.github/actions/setup-imdisk
+
+      - name: Test CLI
+        env:
+          TS_DEBUG_DISCO: "true"
+          LC_CTYPE: "en_US.UTF-8"
+          LC_ALL: "en_US.UTF-8"
+        shell: bash
+        run: |
+          # By default Go will use the number of logical CPUs, which
+          # is a fine default.
+          PARALLEL_FLAG=""
+
+          make test-cli
+
+      - name: Upload test stats to Datadog
+        timeout-minutes: 1
+        continue-on-error: true
+        uses: ./.github/actions/upload-datadog
+        if: success() || failure()
+        with:
+          api-key: ${{ secrets.DATADOG_API_KEY }}
+
   test-go-pg:
-    runs-on: ${{ matrix.os == 'ubuntu-latest' && github.repository_owner == 'coder' && 'depot-ubuntu-22.04-4' || matrix.os == 'macos-latest' && github.repository_owner == 'coder' && 'depot-macos-latest' || matrix.os == 'windows-2022' && github.repository_owner == 'coder' && 'windows-latest-16-cores' || matrix.os }}
+    runs-on: ${{ matrix.os == 'ubuntu-latest' && github.repository_owner == 'coder' && 'depot-ubuntu-22.04-4' || matrix.os }}
     needs: changes
     if: needs.changes.outputs.go == 'true' || needs.changes.outputs.ci == 'true' || github.ref == 'refs/heads/main'
     # This timeout must be greater than the timeout set by `go test` in
```
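The `runs-on` expression for `test-cli` relies on the `cond && 'value' || fallback` idiom: in GitHub Actions expressions, `&&` and `||` return one of their operands rather than a strict boolean, so the chain behaves like an if / else-if / else. The bash function below is a hypothetical re-implementation of that selection logic (the name `pick_runner` is illustrative and not part of the workflow), mapping a matrix OS and repository owner to a runner label:

```bash
#!/usr/bin/env bash
# Hypothetical emulation of the test-cli `runs-on` expression:
#   matrix.os == 'macos-latest' && owner == 'coder' && 'depot-macos-latest'
#   || matrix.os == 'windows-2022' && owner == 'coder' && 'windows-latest-16-cores'
#   || matrix.os
pick_runner() {
  local os="$1" owner="$2"
  if [[ "$os" == "macos-latest" && "$owner" == "coder" ]]; then
    echo "depot-macos-latest"
  elif [[ "$os" == "windows-2022" && "$owner" == "coder" ]]; then
    echo "windows-latest-16-cores"
  else
    # Forks (or any other OS) fall back to the GitHub-hosted label itself.
    echo "$os"
  fi
}

for os in macos-latest windows-2022; do
  echo "$os (coder) -> $(pick_runner "$os" coder)"
  echo "$os (fork)  -> $(pick_runner "$os" some-fork)"
done
```

The idiom only behaves like if/else because every label in the middle position is a non-empty (truthy) string; if one of them were empty, evaluation would silently fall through to the next alternative.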
```diff
@@ -391,8 +445,6 @@ jobs:
       matrix:
         os:
           - ubuntu-latest
-          - macos-latest
-          - windows-2022
     steps:
       - name: Harden Runner
         uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.2
```
```diff
@@ -423,39 +475,11 @@ jobs:
           LC_ALL: "en_US.UTF-8"
         shell: bash
         run: |
-          # if macOS, install google-chrome for scaletests
-          # As another concern, should we really have this kind of external dependency
-          # requirement on standard CI?
-          if [ "${{ matrix.os }}" == "macos-latest" ]; then
-            brew install google-chrome
-          fi
-
           # By default Go will use the number of logical CPUs, which
           # is a fine default.
           PARALLEL_FLAG=""

-          # macOS will output "The default interactive shell is now zsh"
-          # intermittently in CI...
-          if [ "${{ matrix.os }}" == "macos-latest" ]; then
-            touch ~/.bash_profile && echo "export BASH_SILENCE_DEPRECATION_WARNING=1" >> ~/.bash_profile
-          fi
-
-          if [ "${{ runner.os }}" == "Linux" ]; then
-            make test-postgres
-          elif [ "${{ runner.os }}" == "Windows" ]; then
-            # Create a temp dir on the R: ramdisk drive for Windows. The default
-            # C: drive is extremely slow: https://github.com/actions/runner-images/issues/8755
-            mkdir -p "R:/temp/embedded-pg"
-            go run scripts/embedded-pg/main.go -path "R:/temp/embedded-pg"
-            # Reduce test parallelism, mirroring what we do for race tests.
-            # We'd been encountering issues with timing related flakes, and
-            # this seems to help.
-            DB=ci gotestsum --format standard-quiet -- -v -short -count=1 -parallel 4 -p 4 ./...
-          else
-            go run scripts/embedded-pg/main.go
-            # Reduce test parallelism, like for Windows above.
-            DB=ci gotestsum --format standard-quiet -- -v -short -count=1 -parallel 4 -p 4 ./...
-          fi
+          make test-postgres

       - name: Upload test stats to Datadog
         timeout-minutes: 1
```

.github/workflows/nightly-gauntlet.yaml

Lines changed: 61 additions & 42 deletions
```diff
@@ -3,22 +3,27 @@
 name: nightly-gauntlet
 on:
   schedule:
-    # Every day at midnight
-    - cron: "0 0 * * *"
+    # Every day at 4AM
+    - cron: "0 4 * * 1-5"
   workflow_dispatch:

 permissions:
   contents: read

 jobs:
-  go-race:
-    # While GitHub's toaster runners are likelier to flake, we want consistency
-    # between this environment and the regular test environment for DataDog
-    # statistics and to only show real workflow threats.
-    runs-on: ${{ github.repository_owner == 'coder' && 'depot-ubuntu-22.04-8' || 'ubuntu-latest' }}
-    # This runner costs 0.016 USD per minute,
-    # so 0.016 * 240 = 3.84 USD per run.
-    timeout-minutes: 240
+  test-go-pg:
+    runs-on: ${{ matrix.os == 'macos-latest' && github.repository_owner == 'coder' && 'depot-macos-latest' || matrix.os == 'windows-2022' && github.repository_owner == 'coder' && 'windows-latest-16-cores' || matrix.os }}
+    if: github.ref == 'refs/heads/main'
+    # This timeout must be greater than the timeout set by `go test` in
+    # `make test-postgres` to ensure we receive a trace of running
+    # goroutines. Setting this to the timeout +5m should work quite well
+    # even if some of the preceding steps are slow.
+    timeout-minutes: 25
+    strategy:
+      matrix:
+        os:
+          - macos-latest
+          - windows-2022
     steps:
       - name: Harden Runner
         uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.2
```
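The new schedule, `0 4 * * 1-5`, differs from the old `0 0 * * *` in two ways: it fires at 04:00 instead of 00:00 (GitHub Actions evaluates cron schedules in UTC), and the day-of-week field `1-5` limits it to Monday through Friday. The sketch below splits the expression into its five fields and adds a hypothetical `fires_at` helper that only understands the shapes used here (a single minute and hour value plus a day-of-week range); it is an illustration, not a general cron parser:

```bash
#!/usr/bin/env bash
# Split the nightly-gauntlet schedule into the five standard cron fields.
# `read` splits on whitespace without pathname expansion, so `*` stays literal.
schedule="0 4 * * 1-5"
read -r minute hour day_of_month month day_of_week <<< "$schedule"

printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
  "$minute" "$hour" "$day_of_month" "$month" "$day_of_week"

# Hypothetical helper: succeeds if the schedule fires at the given UTC
# day-of-week (0 = Sunday ... 6 = Saturday), hour and minute.
# Only handles a single-value minute/hour and an "a-b" day-of-week range;
# day-of-month and month are "*" here, so they match every date.
fires_at() {
  local dow="$1" h="$2" m="$3"
  local lo="${day_of_week%-*}" hi="${day_of_week#*-}"
  (( m == minute && h == hour && dow >= lo && dow <= hi ))
}

fires_at 1 4 0 && echo "Monday 04:00 UTC: runs"
fires_at 6 4 0 || echo "Saturday 04:00 UTC: skipped"
fires_at 3 0 0 || echo "Wednesday 00:00 UTC: skipped"
```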
```diff
@@ -27,58 +32,72 @@ jobs:

       - name: Checkout
         uses: actions/checkout@eef61447b9ff4aafe5dcd4e0bbf5d482be7e7871 # v4.2.1
+        with:
+          fetch-depth: 1

       - name: Setup Go
         uses: ./.github/actions/setup-go

       - name: Setup Terraform
         uses: ./.github/actions/setup-tf

-      - name: Run Tests
-        run: |
-          # -race is likeliest to catch flaky tests
-          # due to correctness detection and its performance
-          # impact.
-          gotestsum --junitfile="gotests.xml" -- -timeout=240m -count=10 -race ./...
+      # Sets up the ImDisk toolkit for Windows and creates a RAM disk on drive R:.
+      - name: Setup ImDisk
+        if: runner.os == 'Windows'
+        uses: ./.github/actions/setup-imdisk

-      - name: Upload test results to DataDog
-        uses: ./.github/actions/upload-datadog
-        if: always()
-        with:
-          api-key: ${{ secrets.DATADOG_API_KEY }}
+      - name: Test with PostgreSQL Database
+        env:
+          POSTGRES_VERSION: "13"
+          TS_DEBUG_DISCO: "true"
+          LC_CTYPE: "en_US.UTF-8"
+          LC_ALL: "en_US.UTF-8"
+        shell: bash
+        run: |
+          # if macOS, install google-chrome for scaletests
+          # As another concern, should we really have this kind of external dependency
+          # requirement on standard CI?
+          if [ "${{ matrix.os }}" == "macos-latest" ]; then
+            brew install google-chrome
+          fi

-  go-timing:
-    # We run these tests with p=1 so we don't need a lot of compute.
-    runs-on: ${{ github.repository_owner == 'coder' && 'depot-ubuntu-22.04' || 'ubuntu-latest' }}
-    timeout-minutes: 10
-    steps:
-      - name: Harden Runner
-        uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.2
-        with:
-          egress-policy: audit
+          # By default Go will use the number of logical CPUs, which
+          # is a fine default.
+          PARALLEL_FLAG=""

-      - name: Checkout
-        uses: actions/checkout@eef61447b9ff4aafe5dcd4e0bbf5d482be7e7871 # v4.2.1
+          # macOS will output "The default interactive shell is now zsh"
+          # intermittently in CI...
+          if [ "${{ matrix.os }}" == "macos-latest" ]; then
+            touch ~/.bash_profile && echo "export BASH_SILENCE_DEPRECATION_WARNING=1" >> ~/.bash_profile
+          fi

-      - name: Setup Go
-        uses: ./.github/actions/setup-go
+          if [ "${{ runner.os }}" == "Windows" ]; then
+            # Create a temp dir on the R: ramdisk drive for Windows. The default
+            # C: drive is extremely slow: https://github.com/actions/runner-images/issues/8755
+            mkdir -p "R:/temp/embedded-pg"
+            go run scripts/embedded-pg/main.go -path "R:/temp/embedded-pg"
+          else
+            go run scripts/embedded-pg/main.go
+          fi

-      - name: Run Tests
-        run: |
-          gotestsum --junitfile="gotests.xml" -- --tags="timing" -p=1 -run='_Timing/' ./...
+          # Reduce test parallelism, mirroring what we do for race tests.
+          # We'd been encountering issues with timing related flakes, and
+          # this seems to help.
+          DB=ci gotestsum --format standard-quiet -- -v -short -count=1 -parallel 4 -p 4 ./...

-      - name: Upload test results to DataDog
+      - name: Upload test stats to Datadog
+        timeout-minutes: 1
+        continue-on-error: true
         uses: ./.github/actions/upload-datadog
-        if: always()
+        if: success() || failure()
         with:
           api-key: ${{ secrets.DATADOG_API_KEY }}

   notify-slack-on-failure:
     needs:
-      - go-race
-      - go-timing
+      - test-go-pg
     runs-on: ubuntu-latest
-    if: failure()
+    if: failure() && github.ref == 'refs/heads/main'

     steps:
       - name: Send Slack notification
```
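The Datadog upload step in `nightly-gauntlet` also switches from `if: always()` to `if: success() || failure()`, matching the new `test-cli` upload step in `ci.yaml`. The practical difference is cancellation: `always()` is true even when the run is cancelled, whereas `success()` and `failure()` are both false in that case, so the upload is skipped. The function below is a simplified, hypothetical model of that rule (it assumes the earlier steps ended in exactly one of `success`, `failure` or `cancelled`), not GitHub's actual expression evaluator:

```bash
#!/usr/bin/env bash
# Simplified model: decide whether a step with the given `if:` condition runs,
# given how the earlier steps in the job ended.
step_runs() {
  local condition="$1" prior="$2"
  case "$condition" in
    "always()")
      return 0 ;;
    "success() || failure()")
      # success(): nothing failed and nothing was cancelled.
      # failure(): an earlier step failed.
      # Neither holds for a cancelled run.
      [[ "$prior" == "success" || "$prior" == "failure" ]] ;;
    *)
      echo "unsupported condition: $condition" >&2
      return 2 ;;
  esac
}

for prior in success failure cancelled; do
  for cond in "always()" "success() || failure()"; do
    if step_runs "$cond" "$prior"; then verdict="runs"; else verdict="skipped"; fi
    printf '%-24s after %-9s -> %s\n' "$cond" "$prior" "$verdict"
  done
done
```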

Makefile

Lines changed: 4 additions & 0 deletions
```diff
@@ -807,6 +807,10 @@ test:
 	$(GIT_FLAGS) gotestsum --format standard-quiet -- -v -short -count=1 ./...
 .PHONY: test

+test-cli:
+	$(GIT_FLAGS) gotestsum --format standard-quiet -- -v -short -count=1 ./cli/...
+.PHONY: test-cli
+
 # sqlc-cloud-is-setup will fail if no SQLc auth token is set. Use this as a
 # dependency for any sqlc-cloud related targets.
 sqlc-cloud-is-setup:
```
