@spikecurtis commented Dec 9, 2025

fixes: coder/internal#1179

The problem in that flake is that dRPC doesn't consistently return context.Canceled when you make an RPC call and then cancel it: sometimes it returns EOF instead.

Without this PR, if any of the routines that use the agentapi connection gets an EOF, we tear down the whole connection and reconnect to coderd, even if we are in the middle of a graceful shutdown.

In the linked flake, writing stats failed with EOF, which caused us to reconnect and write the "SHUTTING DOWN" lifecycle state twice.


This stack of pull requests is managed by Graphite.

@spikecurtis requested a review from mafredri December 9, 2025 12:35
@spikecurtis marked this pull request as ready for review December 9, 2025 12:36
@spikecurtis force-pushed the spike/internal-1179-agent-shutdown-flake branch from 0f3955f to 33c6d54 December 9, 2025 12:37

@mafredri left a comment

Seems reasonable 👍🏻

@spikecurtis merged commit ce9e7ad into main Dec 9, 2025
31 checks passed
@spikecurtis deleted the spike/internal-1179-agent-shutdown-flake branch December 9, 2025 13:32
@github-actions bot locked and limited conversation to collaborators Dec 9, 2025
Development

Successfully merging this pull request may close these issues.

flake: TestAgent_Lifecycle/ShutdownTimeout

3 participants