chore: retry TestAgent_Dial subtests#19387

Merged

deansheather merged 5 commits into main from dean/flake-agent-dial

Aug 18, 2025

Member

deansheather commented Aug 18, 2025

Adds a new wrapper function testutil.RunRetry that will run the provided function multiple times until the test succeeds or the limit is reached. To accomplish this without failing the parent test, we use a fake testing.TB implementation that swallows failures until the final attempt.

Updates the TestAgent_Dial subtests to use this new wrapper. I believe the failures are coming from dropped UDP packets due to high load on the CI runner.

Closes coder/internal#595

chore: retry TestAgent_Dial subtests

a7259af

Adds a new wrapper function testutil.RunRetry that will run the provided function multiple times until the test succeeds or the limit is reached. To accomplish this without failing the parent test, we use a fake testing.TB implementation that swallows failures until the final attempt.

Updates the TestAgent_Dial subtests to use this new wrapper. I believe the failures are coming from dropped UDP packets due to high load on the CI runner.

deansheather requested review from ethanndickson and mafredri

August 18, 2025 05:12

github-actions bot assigned deansheather

Aug 18, 2025

ethanndickson approved these changes

Aug 18, 2025

Member

ethanndickson left a comment

Your flake hypothesis sounds very much plausible, and this solution seems fine 👍 Just two minor comments.

testutil/t.go (outdated, resolved)

testutil/t.go (resolved)

PR comments

fc53b59

deansheather requested a review from ethanndickson

August 18, 2025 06:32

fixup! PR comments

a6acb60

ethanndickson approved these changes

Aug 18, 2025

mafredri reviewed

Aug 18, 2025

Member

mafredri left a comment

I think having the fakeT implementation will be a very useful addition to the testing package, thanks!

Although, I wonder if it's the right solution here. We could also fake the network stack instead, but at the same time we obviously lose a bit of realism. So just to be clear, and considering that, I'm fine with this solution.

I left a few suggestions, and I think we should definitely change the ctx passing in RunRetry, but the other part of that suggestion is optional/for your consideration.

testutil/retry.go (outdated, resolved)

testutil/retry.go (resolved)

testutil/retry.go (outdated, resolved)

testutil/retry.go Outdated

		t.mu.Lock()
		defer t.mu.Unlock()
		t.failed = true
		t.T.Log("WARN: t.Fail called in testutil.RunRetry closure")

Member

mafredri Aug 18, 2025

Suggestion: We could give a hint here, like: rewrite test with error+early return if needed.

Member Author

deansheather Aug 18, 2025

I'm not 100% sure what you mean by this comment. I refactored the handler a bit though, so let me know if you still want changes.

Member

mafredri Aug 18, 2025

I simply meant that the "WARN: XXX called in testutil.RunRetry closure" messages may be a bit confusing when followed by runtime.Goexit. So adding a little tip there on how the user could rewrite their retrying test may be beneficial. But feel free to ignore.

Member Author

deansheather Aug 18, 2025

Hmmm, in the stdlib testing package I don't think failing logs anything at all, so this is most certainly an improvement over that at least. Other than the added log message, this now matches the behavior of stdlib.