Closed
coder/coder#21219
Description
CI Failure Details
- CI Run Link: https://github.com/coder/coder/actions/runs/20016436160
- Job: test-go-pg (macos-latest)
- Timestamp: 2025-12-08T04:25Z (same minute as Slack alert)
- Commit: 25400fedca9661de43031d6262cb47ee342da03a by Jake Howell
Failing Test
- Package: github.com/coder/coder/v2/coderd
- Test: TestTasks/UpdateInput/TaskStatusError
- Location: coderd/aitasks_test.go:819 (macOS)
Error Evidence
=== FAIL: coderd TestTasks/UpdateInput/TaskStatusError (2.59s)
aitasks_test.go:819:
Error: Received unexpected error:
PATCH http://127.0.0.1:54206/api/v2/workspacebuilds/9c22fa2d-a7ba-4f32-a94b-8edbb785588d/cancel?expect_status=: unexpected status code 400: Job has already completed!
Test: TestTasks/UpdateInput/TaskStatusError
More context (logs show the build transitioned and completed quickly, then cancel was attempted):
- Build created and reached succeeded, then cancel attempt returned 400 Job has already completed!
Root Cause Classification
- Flaky Test (timing-dependent)
- The test sets cancelTransition=true and then issues CancelWorkspaceBuild expecting to cancel the in-flight START transition. On macOS CI, the transition can complete before cancel executes, producing 400 "Job has already completed!".
- Not an infrastructure failure, not a data race (no race-detector report), and not a process crash.
Duplicate Search (coder/internal)
- Queried: "TestTasks/UpdateInput/TaskStatusError", "Job has already completed!", "aitasks_test.go", "TestTasks"
- Found a related but different closed issue: flake: TestTasks/Logs/UpstreamError (#1067)
- No existing issue for UpdateInput/TaskStatusError; this appears to be a new, distinct failure mode.
Precise Assignment Analysis (test blame)
- The failing subtest lives under TestTasks -> UpdateInput -> "TaskStatusError" block.
- History via commit diffs:
  - Added the UpdateInput test block (including cancelTransition logic) in 82f525baf36a2341bc92c2f6b6a27cc565d28a08 (feat(coderd): add task prompt modification endpoint), author: Danielle Maywood.
  - Latest modifications to this block in b255827a5269f767c2dba476c7189ef6157ff574 (chore: promote tasks to stable from experimental), also by Danielle Maywood.
- Based on last modification of the failing test lines, ownership points to Danielle Maywood.
Suggested Fix Direction
- Make the test resilient to rapid state transitions:
  - After creating the transition, poll the build status and attempt the cancel only if the status is running or pending; otherwise assert that the build completed and proceed.
  - Alternatively, accept 400 "Job has already completed!" as a valid outcome for the TaskStatusError scenario, or use expect_status to tolerate either canceled or completed.
  - Introduce a brief synchronization (e.g., check that the build has entered running) before calling CancelWorkspaceBuild to narrow the timing window.
  - Keep the test using testutil.WaitLong, but avoid relying on cancellation timing.
Reproduction Hints
- On macOS runner: go test ./coderd -run 'TestTasks/UpdateInput/TaskStatusError' -count=20
- This is timing-sensitive; reproduces intermittently when build completes before cancel.
Related Issues
- flake: TestTasks/Logs/UpstreamError #1067 (flake in the same file, different subtest family)
Quality Checklist
- Identified exact failing test and captured error output
- Verified not a matrix cancellation artifact (run_attempt=1; only the Windows job was canceled, due to the macOS failure)
- No race/panic/OOM signatures in logs
- Searched coder/internal for duplicates with multiple queries
- Assignment based on last modification of the failing test block