flake: TestTasks/UpdateInput/TaskStatusError #1178

@flake-investigator

CI Failure Details

Failing Test

  • Package: github.com/coder/coder/v2/coderd
  • Test: TestTasks/UpdateInput/TaskStatusError
  • Location: coderd/aitasks_test.go:819 (macOS)

Error Evidence

=== FAIL: coderd TestTasks/UpdateInput/TaskStatusError (2.59s)
    aitasks_test.go:819:
        Error:      Received unexpected error:
                    PATCH http://127.0.0.1:54206/api/v2/workspacebuilds/9c22fa2d-a7ba-4f32-a94b-8edbb785588d/cancel?expect_status=: unexpected status code 400: Job has already completed!
        Test:       TestTasks/UpdateInput/TaskStatusError

More context (logs show the build transitioned and completed quickly, then cancel was attempted):

  • Build created and reached succeeded, then cancel attempt returned 400 Job has already completed!

Root Cause Classification

  • Flaky Test (timing-dependent)
  • The test sets cancelTransition=true and then issues CancelWorkspaceBuild expecting to cancel the in-flight START transition. On macOS CI, the transition can complete before cancel executes, producing 400 "Job has already completed!".
  • Not infra, not a data race (no race detector output), not a process crash.

Duplicate Search (coder/internal)

  • Queried: "TestTasks/UpdateInput/TaskStatusError", "Job has already completed!", "aitasks_test.go", "TestTasks"
  • Found a related but different closed issue: flake: TestTasks/Logs/UpstreamError (#1067)
  • No existing issue for UpdateInput/TaskStatusError; this appears to be a new, different failure mode.

Precise Assignment Analysis (test blame)

  • The failing subtest lives under TestTasks -> UpdateInput -> "TaskStatusError" block.
  • History via commit diffs:
    • Added the UpdateInput test block (including cancelTransition logic) in 82f525baf36a2341bc92c2f6b6a27cc565d28a08 (feat(coderd): add task prompt modification endpoint) — author: Danielle Maywood.
    • Latest modifications to this block in b255827a5269f767c2dba476c7189ef6157ff574 (chore: promote tasks to stable from experimental) — also by Danielle Maywood.
  • Based on last modification of the failing test lines, ownership points to Danielle Maywood.

Suggested Fix Direction

  • Make the test resilient to rapid state transitions:
    • After creating the transition, poll the build status and only attempt cancel if status is running/pending; otherwise assert completed and proceed.
    • Alternatively, accept 400 "already completed" as a valid outcome for the TaskStatusError scenario or use expect_status to tolerate either canceled or completed.
    • Introduce a brief synchronization (e.g., check that build has entered running) before calling CancelWorkspaceBuild to reduce races.
  • Keep the test using testutil.WaitLong but avoid relying on cancellation timing.

Reproduction Hints

  • On macOS runner: go test ./coderd -run 'TestTasks/UpdateInput/TaskStatusError' -count=20
  • This is timing-sensitive; reproduces intermittently when build completes before cancel.

Related Issues

Quality Checklist

  • Identified exact failing test and captured error output
  • Verified not a matrix cancellation artifact (run_attempt=1; only the Windows job was canceled, due to the macOS failure)
  • No race/panic/OOM signatures in logs
  • Searched coder/internal for duplicates with multiple queries
  • Assignment based on last modification of the failing test block
