
flake: Data race in agent/agentcontainers - TestAPI/Error/DuringUpWithContainerID #1169

@flake-investigator

Description


CI Failure Details

Root Cause Classification

  • Type: Data Race (race detector)
  • Affected package: agent/agentcontainers
  • Failing test: TestAPI/Error/DuringUpWithContainerID

Race Detector Evidence (from logs)

WARNING: DATA RACE
Write at 0x00c000a2cc60 by goroutine 819:
  runtime.mapdelete()
    /opt/hostedtoolcache/go/1.24.10/x64/src/runtime/map_swiss.go:144
  github.com/coder/coder/v2/agent/agentcontainers_test.(*fakeSubAgentClient).Delete()
    /home/runner/work/coder/coder/agent/agentcontainers/api_test.go:320 +0x3a7
  github.com/coder/coder/v2/agent/agentcontainers.(*API).maybeInjectSubAgentIntoContainerLocked()
    /home/runner/work/coder/coder/agent/agentcontainers/api.go:1935 +0x4b4e
  github.com/coder/coder/v2/agent/agentcontainers.(*API).processUpdatedContainersLocked()
    /home/runner/work/coder/coder/agent/agentcontainers/api.go:1037 +0x2b2f
  github.com/coder/coder/v2/agent/agentcontainers.(*API).updateContainers()
    /home/runner/work/coder/coder/agent/agentcontainers/api.go:881 +0x415
  github.com/coder/coder/v2/agent/agentcontainers.(*API).updaterLoop()
    /home/runner/work/coder/coder/agent/agentcontainers/api.go:702 +0xd5e

Previous write at 0x00c000a2cc60 by goroutine 447:
  runtime.mapdelete()
    /opt/hostedtoolcache/go/1.24.10/x64/src/runtime/map_swiss.go:144
  github.com/coder/coder/v2/agent/agentcontainers_test.(*fakeSubAgentClient).Delete()
    /home/runner/work/coder/coder/agent/agentcontainers/api_test.go:320 +0x3a7
  github.com/coder/coder/v2/agent/agentcontainers.(*API).Close()
    /home/runner/work/coder/coder/agent/agentcontainers/api.go:2023 +0xcf7
  github.com/coder/coder/v2/agent/agentcontainers_test.TestAPI.func10.2.1()
    /home/runner/work/coder/coder/agent/agentcontainers/api_test.go:2139 +0x51

... (additional races in fakeSubAgentClient.Delete and slice access omitted for brevity)

testing.go:1490: race detected during execution of test

Also observed after the race detection:

panic: runtime error: invalid memory address or nil pointer dereference
  at agent/agentcontainers/api_test.go:2182 in TestAPI.func10.2

Error Analysis

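  • The data race occurs between:
    • the API updater loop path (updaterLoop -> updateContainers -> processUpdatedContainersLocked -> maybeInjectSubAgentIntoContainerLocked), and
    • API.Close, invoked from test cleanup (TestAPI.func10.2.1 at api_test.go:2139).
  • Both paths call fakeSubAgentClient.Delete, which mutates the fake's shared state (a map and a slice) without synchronization. The logged race is a write/write conflict on the map (runtime.mapdelete at api_test.go:320) between goroutine 819 (updater loop) and goroutine 447 (Close), so test teardown is running while updater processing is still in flight.
  • The race detector then fails the test (testing.go:1490), a nil pointer dereference panic follows at api_test.go:2182, and further failures cascade through the same package.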
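One way to remove the write/write conflict on the fake's map is to serialize every access to its shared state behind a single mutex. The sketch below assumes a simplified, hypothetical `fakeClient` type (with hypothetical `newFakeClient` and `Deleted` helpers) that keeps a map of live sub-agents and a slice of deleted IDs. The real `fakeSubAgentClient` in api_test.go has different fields and signatures, so treat this as an illustration of the pattern, not a patch.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeClient is a HYPOTHETICAL, simplified stand-in for the test's
// fakeSubAgentClient; the real type's fields and method signatures differ.
type fakeClient struct {
	mu      sync.Mutex          // guards agents and deleted
	agents  map[string]struct{} // live sub-agents keyed by ID
	deleted []string            // IDs removed, in call order
}

func newFakeClient(ids ...string) *fakeClient {
	c := &fakeClient{agents: make(map[string]struct{})}
	for _, id := range ids {
		c.agents[id] = struct{}{}
	}
	return c
}

// Delete removes id from the map and records it. Holding mu for the whole
// body makes the map delete and the slice append atomic with respect to
// other callers, such as an updater goroutine and a concurrent Close.
func (c *fakeClient) Delete(id string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.agents[id]; !ok {
		return fmt.Errorf("sub-agent %q not found", id)
	}
	delete(c.agents, id)
	c.deleted = append(c.deleted, id)
	return nil
}

// Deleted returns a copy of the recorded IDs so callers never read the
// slice while another goroutine is appending to it.
func (c *fakeClient) Deleted() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.deleted...)
}

func main() {
	c := newFakeClient("a", "b")
	var wg sync.WaitGroup
	// Two goroutines call Delete concurrently, mimicking the updater loop
	// and Close both reaching the fake at the same time.
	for _, id := range []string{"a", "b"} {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			_ = c.Delete(id)
		}(id)
	}
	wg.Wait()
	fmt.Println(len(c.Deleted())) // prints 2
}
```

Running this with `go run -race` should report nothing, because both goroutines serialize on `mu`. Removing the Lock/Unlock pair is likely to produce the same kind of `runtime.mapdelete` write/write report seen in the logs.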

Ownership / Assignment

Primary assignment uses test function blame per guidelines.

Relevant test function and lines:

  • File: agent/agentcontainers/api_test.go
  • Subtest: TestAPI/Error/DuringUpWithContainerID, approximately lines 2077–2185

Suggested blame commands:

grep -n "DuringUpWithContainerID" agent/agentcontainers/api_test.go
# -> shows start near line ~2077

git blame -L 2077,2185 agent/agentcontainers/api_test.go

Context:

  • Recent commit touching this area (message references lifecycle script error handling and continuing with agent injection):
    • 74d0c39cb3f1 (Mathias Fredriksson) "fix(agent/agentcontainer): allow lifecycle script error on devcontainer up" – likely introduced/modified this subtest and related API behavior.

Based on this, assigning to the maintainer who most recently authored the failing test logic.

Related Issues

Reproduction

  • Local: from the repo root, run the package tests with the race detector enabled:
    go test -race ./agent/agentcontainers -run TestAPI/Error/DuringUpWithContainerID -count=50
  • CI: observe failures in test-go-race-pg on main.

Next Steps

  • Protect shared state in fakeSubAgentClient.Delete (map/slice) with synchronization (mutex) or redesign to avoid concurrent mutation during API.Close and updater loop.
  • Audit API.Close and updaterLoop interactions to avoid races when shutting down while update/injection is in-flight.
  • Consider using channels/atomic flags to gate deletion during injection and vice versa.
