-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(testutil): add lazy timeout context with location-based reset #21120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
9896675 to
e6e4582
Compare
195362e to
c9238e2
Compare
It's common to create a context early in a test body, then do setup work unrelated to that context. By the time the context is actually used, it may have already timed out. This was detected as test failures in #21091. The new Context() function returns a context that resets its timeout when accessed from new lines in the test file. The timeout does not begin until the context is first used (lazy initialization). This is useful for integration tests that pass contexts through many subsystems, where each subsystem should get a fresh timeout window. Key behaviors: - Timer starts on first Done(), Deadline(), or Err() call - Value() does not trigger initialization (used for tracing/logging) - Each unique line in a _test.go file gets a fresh timeout window - Same-line access (e.g., in loops) does not reset - Expired contexts cannot be resurrected Limitations: - Wrapping with a child context (e.g., context.WithCancel) prevents resets since the child's methods don't call through to the parent - Storing the Done() channel prevents resets on subsequent accesses The original fixed-timeout behavior is available via ContextFixed().
c9238e2 to
166e3b0
Compare
|
We can't change the timeout for each individual call which is unfortunate. I know a lot of tests just use a single long context for the entire test, but some tests do have multiple contexts with different timeouts for different calls. Also, the behavior of resetting on every new call means we can't set test-wide timeouts anymore (except with the Maybe we need a different API to solve these things? Here's something off the top of my head: func TestBlah(t *testing.T) {
xctx := testutil.ContextProvider(t, testutil.WaitLong) // full test timeout, lazily creates a parent context when we hit the first call
// ...
err := MyFunction(xctx.New(testutil.WaitMedium), param) // makes a new "child" context from the parent
}
func testutil.ContextProvider(t *testing.T, testTimeout time.Duration) *testutil.ParentContext {
return &testutil.ParentContext{
testTimeout: testTimeout,
}
}
type testutil.ParentContext struct {
testTimeout time.Duration
// todo: synchronization fields...
ctx context.Context
}
func (c *testutil.ParentContext) New(timeout time.Duration) context.Context {
c.ensureInitialized() // ensures c.ctx is set
return context.WithTimeout(c.ctx, timeout)
}
// todo: Other methods as needed, e.g. NewWithCancel, NewWithDeadline^ Which would just be a very light wrapper around the functions in the context package. It's a shame that we have tests that initialize the context early followed by heavy non-context consuming startup cost. Does this mostly just stem from tests using coderdtest? |
deansheather
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment above oops
|
Thanks for taking a look and the discussion @deansheather, I didn't make it clear in the ticket but this is more of a design/proposal to spark discussion.
This is a valid concern. The reasoning is that in a massively parallel test suite, it's hard to reason about what length of timeout is actually valid, and rather move the thought process to "maximum time we want this test to sit idle". Do keep in mind that call-site is tracked, so you do get an approximate timeout for the test, even though it's not exactly 15 seconds, perhaps more like 25 seconds. Granted, this implementation is not a great way to shift the thinking from "max test time" to "max idle time".
This is an interesting alternative, and one we could probably enforce/lint. (It's very hard to lint the referenced bug-case without also causing false positives, which sparked this PR.) Taking some inspiration from you proposal, I could see an more ergonomic alternative that also functions as a drop-in. package testutil
var _ context.Context = (*TestContext)(nil)
func Context(...) *TestContext { ... } // always lazy start (or option?)
type TestContext struct{}
func (tc *TestContext) Cancel() { ... } // we were missing this previously
func (tc *TestContext) New(timeout) *TestContext { ... } // make timeout optional?
func (tc *TestContext) Reset() *TestContext { ... } // resets the timer
func (tc *TestContext) Join(ctx ...context.Context) *TestContext { ... } // sometimes you have two independent contexts, but can only pass one along
// more?This would still allow the most basic use to be |
|
My opinion is that this is a lot of complexity for a relatively narrow benefit.
This trades one thing that is hard to reason about for something that is much harder. Speaking about the original lazy context, when you make a call in the test function and pass the context, it is very difficult to know whether it will trigger a reset. If the call synchronously "uses" the context, then it is reset, but if it spawns a goroutine that "uses" the context, then it is not. Even if the resets are explicit, like in your second proposal, the question of what duration to use is still hard to reason about. Like, if your test has steps A, B, C, and D with resets in between, with the current, basic context you have to estimate the sum of the durations. With resets you have to estimate the max. Have we actually made life easier for test authors? Dubious, IMO. |
|
@spikecurtis thanks for taking a look and for the feedback. You raise valid concerns and I honestly don’t disagree with you on any point. It seems the only concept here that might hold some value is the delayed start, but is it worth the decrease in predictability? Probably not. I briefly considered flipping the logic, error of context is not used within 1/2 or 2/3 of the timeout. But now we’re just breaking valid use-cases. Maybe something takes a context but doesn’t use it and takes long enough to process you hit the error. Rejected. One more option comes to mind, swapping out the implementation via build tag allowing you to run a form of ”lint” on the test-suite, where we could enforce the error case. But again, having this code in the code base simply might not be worth catching the few weird flakes we run into. This was a fun thought experiment, thanks for participating! Unless you guys have any better ideas, I’m going to close this one out. |
It's common to create a context early in a test body, then do setup work
unrelated to that context. By the time the context is actually used, it
may have already timed out. This was detected as test failures in #21091.
The new
Contextfunction returns a context that resets its timeout whenaccessed from new lines in the test file. The timeout does not begin until
the context is first used (lazy initialization).
This is useful for integration tests that pass contexts through various
subsystems, where each subsystem should get a fresh timeout window.
Key behaviors:
Done,Deadline, orErrcallValuedoes not trigger initializationLimitations:
context.WithCancel) preventsreset since the childs methods shadow the parent
Donechannel circumvents resetsThe original fixed-timeout behavior is available via
ContextFixed.Arguably, it might suffice to simplify this implementation to only
implement the start timer on first use functionality. But given our
massively parallel test-suite, I'm biased towards enabling the reset as
well.