
chore: cache terraform providers between CI test runs #17373


Merged
hugodutka merged 4 commits into main from hugodutka/terraform-providers-ci-cache
Apr 28, 2025

Conversation

hugodutka (Contributor)

@hugodutka commented Apr 12, 2025 (edited)

Addresses coder/internal#322.

This PR starts caching Terraform providers used by TestProvision in provisioner/terraform/provision_test.go. The goal is to improve the reliability of this test by cutting down on the number of network calls to external services. It leverages the GitHub Actions cache, which on depot runners is persisted for 14 days by default.

Other than the aforementioned TestProvision, I couldn't find any other tests that depend on external Terraform providers.

@hugodutka force-pushed the hugodutka/terraform-providers-ci-cache branch 12 times, most recently from 9a5b5a1 to 8885000 on April 13, 2025 19:47
@github-actions bot added the stale label on Apr 21, 2025
@johnstcn removed the stale label on Apr 22, 2025
@hugodutka force-pushed the hugodutka/terraform-providers-ci-cache branch 11 times, most recently from 9d41243 to 2ab657c on April 22, 2025 12:05
@hugodutka force-pushed the hugodutka/terraform-providers-ci-cache branch from 2ab657c to 3a9e0b3 on April 22, 2025 12:17
@hugodutka marked this pull request as ready for review on April 22, 2025 13:13
file := templateFiles[fileName]
_, err := hasher.Write([]byte(fileName))
require.NoError(t, err)
_, err = hasher.Write([]byte(file))
Contributor:

You need to include some sort of delimiter, or dogfood: stuff will hash the same as dog: foodstuff.


for fileName, file := range templateFiles {
filePath := filepath.Join(filesDir, fileName)
if _, err := os.Stat(filePath); os.IsNotExist(err) {
Contributor:

Didn't we just check that the directory doesn't already exist? How could the file already exist if the directory doesn't?

Contributor (Author):

You're right, this check is redundant. I'll remove it.

// Each test gets a unique cache dir based on its name and template files.
// This ensures that tests can download providers in parallel and that they
// will redownload providers if the template files change.
hash := hashTemplateFilesAndTestName(t, t.Name(), templateFiles)
Contributor:

Providers are versioned, so I don't think each test or each version of each test needs its own unique mirror.

It results in multiple copies of the providers being downloaded on each system, and there doesn't seem to be any mechanism to clean up old versions of tests. The CI caches expire after 14 days, but people's personal development environments won't get cleaned up.

terraform providers mirror can be run multiple times on the same target directory and it will merge the results:

You can run terraform providers mirror again on an existing mirror directory to update it with new packages.

https://developer.hashicorp.com/terraform/cli/commands/providers/mirror

So, what do you think of doing two passes over the test cases: one serial pass that sets up the mirror with the totality of the providers we need, then a second, parallel pass that runs the tests?

Contributor (Author):

Doing a serial pass is pretty slow if you haven't cached the providers beforehand. I just tried it, and it takes 27 seconds to cache the providers serially in my Coder workspace. In contrast, doing it in parallel only takes 5 seconds.

The cache size across all tests is currently around 70 MB - probably not a big deal. That said, it could balloon if someone starts iterating on a test locally. To avoid that, maybe we should add cleanup logic instead of doing a serial pass?

We already know the paths to provider directories based on test hashes, so at the start of TestProvision, we could check which ones are needed and remove the rest.

Contributor:

Cleanup logic would also be fine here, so we don't blow up people's cache.

with:
path: ${{ inputs.cache-path }}
# The key doesn't need to include an OS name. The action already takes
# that into account: https://github.com/actions/cache/tree/5a3ec84eff668545956fd18022155c47e93e2684?tab=readme-ov-file#cache-version
Contributor:

I don't think we're entirely out of the woods here. While the different paths on Windows vs Linux might mean that those caches are separate, different architectures on the same OS don't seem like they'd have a different cache version.

Also, the paths might not be so different on macOS vs Linux, for example.

Contributor (Author):

Hmm, you're right about different architectures on the same OS; I hadn't thought about that. I'll generate unique cache key prefixes based on OS and architecture.

# job is running on the main branch.
# Without it, PRs would create one-use caches that would linger until
# expiration and we'd be charged for them. I evaluated a couple of options
# for limiting the cache to the main branch, and forking was the simplest.
Contributor:

Can't we handle this use case by using explicit restore and save steps? I.e., we restore in all cases, but only save on main?

https://github.com/actions/cache/blob/5a3ec84eff668545956fd18022155c47e93e2684/caching-strategies.md#saving-cache

Contributor (Author):

I considered it, but the GitHub Actions syntax makes it unwieldy. I wanted to package all the cache steps into a single action to avoid code duplication across the 5 workflows that need the cache. I tried doing it with restore and save, but found that I cannot add a defer-like save statement in a composite action. So I'd need to add 3 additional steps to each of the workflows: one to generate the cache keys, one restore, one save. That's a lot of code duplication; I believe the current approach is cleaner.

Contributor:

Forking the entire repo seems like a lot of code duplication! It's just better hidden, and means we have yet another fork to manage.

Contributor (Author):

That's a good point. I'll refactor it to use the more verbose restore and save approach.
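The restore/save split being described might look roughly like this (a hypothetical sketch in the workflow's YAML; the step names, ids, and inputs are illustrative, not the PR's actual composite action):

```yaml
- name: Restore Terraform providers cache
  id: tf-cache
  uses: actions/cache/restore@v4
  with:
    path: ${{ inputs.cache-path }}
    key: ${{ inputs.key-prefix }}-${{ steps.dates.outputs.year-month }}

# Only main uploads a cache; PRs get read-only access, so one-use
# PR caches never get created in the first place.
- name: Save Terraform providers cache
  if: github.ref == 'refs/heads/main'
  uses: actions/cache/save@v4
  with:
    path: ${{ inputs.cache-path }}
    key: ${{ steps.tf-cache.outputs.cache-primary-key }}
```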

# This prevents PRs from rebuilding the cache on the first day of the month.
restore-keys: |
${{ inputs.key-prefix }}-${{ steps.dates.outputs.year-month }}-
${{ github.ref != 'refs/heads/main' && format('{0}-{1}-', inputs.key-prefix, steps.dates.outputs.prev-year-month) || '' }}
Contributor:

I don't get why we need to have this special case for PRs. If we are preventing PRs from saving to the cache anyway, why do we need to restrict only PRs from restoring last month's cache if this month is not available?

Also, why are we even bothering to match on month? If today's cache isn't available, doesn't the cache action automatically match the most recent available cache based on the key-prefix alone?

Contributor (Author):

This approach lets us reset the cache completely at the start of each month. We upload an updated cache daily, and each update is based on the previous day's cache. Without the monthly cut-off on main, any file added to the cache would persist indefinitely. The monthly reset ensures that old or obsolete files do not linger in the cache forever.

Contributor:

If you implement cleanup in the test code, which I think is a good idea, then we'll only cache what we need daily, and a monthly reset won't be required.

Contributor (Author):

I'm hesitant to get rid of the monthly clean-up for two reasons:

  1. If other tests start using the cache and don't clean up properly after themselves, the cache will grow.
  2. If a Terraform provider that one of the tests depends on becomes permanently unavailable, we won't be able to detect it without a cache reset.

I don't think matching on the month complicates the code too much, so I'd be for keeping it.

@hugodutka force-pushed the hugodutka/terraform-providers-ci-cache branch 3 times, most recently from 2c226f2 to d56aaa7 on April 24, 2025 12:03
@hugodutka (Contributor, Author):

@spikecurtis I addressed your feedback:

  • started using a delimiter when calculating test hashes
  • skipped the os.Stat check
  • started cleaning up unused test caches
  • started using cache key prefixes that include the OS and the architecture
  • refactored caching to use separate restore and save steps

@hugodutka merged commit b47d54d into main on Apr 28, 2025
30 of 31 checks passed
@hugodutka deleted the hugodutka/terraform-providers-ci-cache branch on April 28, 2025 08:57
@github-actions bot locked and limited conversation to collaborators on Apr 28, 2025
Reviewers

@spikecurtis approved these changes

Assignees

@hugodutka

3 participants: @hugodutka, @spikecurtis, @johnstcn
