coder/coder #21059 (Closed)
Description
CI Failure Details
- CI Run: https://github.com/coder/coder/actions/runs/19860399421
- Failed Job: https://github.com/coder/coder/actions/runs/19860399421/job/56908756129 (test-go-race-pg)
- Branch: main
- Commit: 74d0c39cb3f182eadf8d92f260904608334a25ca (author: Mathias Fredriksson)
- Date: 2025-12-02 13:43:36Z
Root Cause Classification
- Type: Data Race (race detector)
- Affected package: agent/agentcontainers
- Failing test: TestAPI/Error/DuringUpWithContainerID
Race Detector Evidence (from logs)
```
WARNING: DATA RACE
Write at 0x00c000a2cc60 by goroutine 819:
  runtime.mapdelete()
      /opt/hostedtoolcache/go/1.24.10/x64/src/runtime/map_swiss.go:144
  github.com/coder/coder/v2/agent/agentcontainers_test.(*fakeSubAgentClient).Delete()
      /home/runner/work/coder/coder/agent/agentcontainers/api_test.go:320 +0x3a7
  github.com/coder/coder/v2/agent/agentcontainers.(*API).maybeInjectSubAgentIntoContainerLocked()
      /home/runner/work/coder/coder/agent/agentcontainers/api.go:1935 +0x4b4e
  github.com/coder/coder/v2/agent/agentcontainers.(*API).processUpdatedContainersLocked()
      /home/runner/work/coder/coder/agent/agentcontainers/api.go:1037 +0x2b2f
  github.com/coder/coder/v2/agent/agentcontainers.(*API).updateContainers()
      /home/runner/work/coder/coder/agent/agentcontainers/api.go:881 +0x415
  github.com/coder/coder/v2/agent/agentcontainers.(*API).updaterLoop()
      /home/runner/work/coder/coder/agent/agentcontainers/api.go:702 +0xd5e

Previous write at 0x00c000a2cc60 by goroutine 447:
  runtime.mapdelete()
      /opt/hostedtoolcache/go/1.24.10/x64/src/runtime/map_swiss.go:144
  github.com/coder/coder/v2/agent/agentcontainers_test.(*fakeSubAgentClient).Delete()
      /home/runner/work/coder/coder/agent/agentcontainers/api_test.go:320 +0x3a7
  github.com/coder/coder/v2/agent/agentcontainers.(*API).Close()
      /home/runner/work/coder/coder/agent/agentcontainers/api.go:2023 +0xcf7
  github.com/coder/coder/v2/agent/agentcontainers_test.TestAPI.func10.2.1()
      /home/runner/work/coder/coder/agent/agentcontainers/api_test.go:2139 +0x51

... (additional races in fakeSubAgentClient.Delete and slice access omitted for brevity)

testing.go:1490: race detected during execution of test
```
Also observed after the race detection:
```
panic: runtime error: invalid memory address or nil pointer dereference
at agent/agentcontainers/api_test.go:2182 in TestAPI.func10.2
```
Error Analysis
- The data race occurs between two callers of fakeSubAgentClient.Delete:
  - the API updater loop path (updaterLoop -> updateContainers -> processUpdatedContainersLocked -> maybeInjectSubAgentIntoContainerLocked), and
  - API.Close, called from the subtest's cleanup logic (TestAPI.func10.2.1, api_test.go:2139).
- Both paths delete from the same map in fakeSubAgentClient (api_test.go:320) without synchronization, and the omitted reports show the same problem for a slice in the fake. Test teardown therefore mutates the fake while the updater loop is still processing containers.
- The race detector fails the test, and a nil pointer dereference panic (api_test.go:2182) follows, cascading into further failures in the same package.
Ownership / Assignment
Primary assignment is based on git blame of the failing test function, per guidelines.
Relevant test function and lines:
- File: agent/agentcontainers/api_test.go
- Subtest: TestAPI/Error/DuringUpWithContainerID, approximately lines 2077–2185
Suggested blame commands:
```shell
grep -n "DuringUpWithContainerID" agent/agentcontainers/api_test.go
# -> subtest starts near line 2077
git blame -L 2077,2185 agent/agentcontainers/api_test.go
```
Context:
- Recent commit touching this area (its message references lifecycle script error handling and continuing with agent injection):
  - 74d0c39cb3f1 (Mathias Fredriksson) "fix(agent/agentcontainer): allow lifecycle script error on devcontainer up": likely introduced or modified this subtest and the related API behavior.
Based on this, assigning to the maintainer who most recently authored the failing test logic.
Related Issues
- #675 (closed): data race in agent/agentcontainers: processUpdatedContainersLocked. Same package, different code path.
- Other agent/agentcontainers flakes (not data races):
  - #1041: flake: TestAPI/Recreate/Devcontainer_CLI_error
  - #1034: flake: TestAPI/FileWatcher
  - #769: flake: TestAPI/NoUpdaterLoopLogspam
Reproduction
- Local: from the repo root, run the package tests under the race detector: `go test -race ./agent/agentcontainers -run TestAPI/Error/DuringUpWithContainerID -count=50`
- CI: observe failures in test-go-race-pg on main.
Next Steps
- Protect the map and slice mutated by fakeSubAgentClient.Delete with a mutex, or redesign the fake so that API.Close and the updater loop cannot mutate them concurrently.
- Audit the interaction between API.Close and updaterLoop to avoid races when shutting down while an update or injection is in flight.
- Consider channels or atomic flags so that deletion cannot run during injection, and vice versa.