@kacpersaw commented on Dec 8, 2025

Summary

This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load.

Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up).

New Configuration Options

  • --aibridge-max-concurrency (CODER_AIBRIDGE_MAX_CONCURRENCY): Maximum number of concurrent AI Bridge requests. Set to 0 to disable (unlimited). Default: 0
  • --aibridge-rate-limit (CODER_AIBRIDGE_RATE_LIMIT): Maximum number of AI Bridge requests per second. Set to 0 to disable rate limiting. Default: 0

Behavior

When limits are exceeded:

  • Concurrency limit: Returns HTTP 503 Service Unavailable with message "AI Bridge is currently at capacity. Please try again later."
  • Rate limit: Returns HTTP 429 Too Many Requests with Retry-After header.

Both protections are optional and disabled by default (0 values).

Implementation

The overload protection is implemented as reusable middleware in coderd/httpmw/ratelimit.go:

  1. RateLimitByAuthToken: Per-user rate limiting that uses APITokenFromRequest to extract the authentication token, with a fallback to the X-Api-Key header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes a Retry-After header for backpressure signaling.
  2. ConcurrencyLimit: Uses an atomic counter to track in-flight requests and reject when at capacity.
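A minimal sketch of the counter-based concurrency limiter described in point 2 (assuming the httpmw package layout; the body is illustrative, not the exact implementation):

package httpmw

import (
	"net/http"
	"sync/atomic"
)

// ConcurrencyLimit rejects requests with 503 once more than maxConcurrent
// requests are in flight. A maxConcurrent of 0 disables the limit.
func ConcurrencyLimit(maxConcurrent int64) func(http.Handler) http.Handler {
	var current atomic.Int64
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if maxConcurrent <= 0 {
				next.ServeHTTP(w, r)
				return
			}
			c := current.Add(1)    // new value after incrementing
			defer current.Add(-1)  // always release the slot, even on panic
			if c > maxConcurrent { // strictly greater: exactly maxConcurrent requests may run
				http.Error(w, "AI Bridge is currently at capacity. Please try again later.", http.StatusServiceUnavailable)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}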

The middleware is applied in enterprise/coderd/aibridge.go via r.Group in the following order:

  1. Concurrency check (faster rejection for load shedding)
  2. Rate limit check
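A hypothetical sketch of that wiring (the constructor signatures, option variable names, and handler name here are assumptions for illustration):

// maxConcurrency and rateLimit would come from the new deployment options.
r.Group(func(r chi.Router) {
	r.Use(
		httpmw.ConcurrencyLimit(maxConcurrency), // cheap in-flight check first, so excess load is shed early
		httpmw.RateLimitByAuthToken(rateLimit),  // then per-caller rate limiting, with Retry-After on 429
	)
	r.Handle("/*", bridgeHandler) // placeholder for the AI Bridge proxy handler
})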

Note: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. Ideally only actual interceptions should count, but this would require changes in the aibridge library.
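For completeness, a minimal sketch of the per-token rate limiter from point 1 above. The keying and the 429/Retry-After behavior follow the description; the internals (golang.org/x/time/rate, an in-memory limiter map) are assumptions, and a real implementation would also evict stale entries:

package httpmw

import (
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

// RateLimitByAuthToken allows up to requestsPerSecond requests per caller.
// Callers are keyed by API token (with an X-Api-Key fallback for providers
// such as Anthropic), or by remote address when no token is present.
// A limit of 0 disables rate limiting.
func RateLimitByAuthToken(requestsPerSecond int) func(http.Handler) http.Handler {
	var (
		mu       sync.Mutex
		limiters = map[string]*rate.Limiter{}
	)
	limiterFor := func(key string) *rate.Limiter {
		mu.Lock()
		defer mu.Unlock()
		l, ok := limiters[key]
		if !ok {
			l = rate.NewLimiter(rate.Limit(requestsPerSecond), requestsPerSecond)
			limiters[key] = l
		}
		return l
	}
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if requestsPerSecond <= 0 {
				next.ServeHTTP(w, r)
				return
			}
			key := APITokenFromRequest(r) // standard Coder auth methods
			if key == "" {
				key = r.Header.Get("X-Api-Key") // Anthropic-style clients
			}
			if key == "" {
				key = r.RemoteAddr // unauthenticated: fall back to IP
			}
			if !limiterFor(key).Allow() {
				w.Header().Set("Retry-After", "1") // hint to retry after ~1 second
				http.Error(w, "You've been rate limited. Please try again later.", http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

In this sketch the burst size simply equals the per-second limit; the actual middleware may tune that differently.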

Testing

Added comprehensive tests for:

  • Rate limiting by auth token (Bearer token, X-Api-Key, no token fallback to IP)
  • Different tokens not rate limited against each other
  • Disabled when limit is zero
  • Retry-After header is set on 429 responses
  • Concurrency limiting (allows within limit, rejects over limit, disabled when zero)

@kacpersaw changed the title from "feat(aibridged): add overload protection with rate limiting and concurrency control" to "feat(aibridge): add overload protection with rate limiting and concurrency control" on Dec 8, 2025
@kacpersaw changed the title from "feat(aibridge): add overload protection with rate limiting and concurrency control" to "feat(coderd): add overload protection with rate limiting and concurrency control" on Dec 8, 2025
@dannykopping left a comment

I'm not a fan of IP-based rate-limiting. It's brittle and lacks the specificity we need.

Limits need to apply per user. We could do that by using the API key in the request.

Also, a nit, your description says:

Fixes coder/internal#1153

But it only addresses the first two points AFAICS.

@kacpersaw marked this pull request as draft on December 9, 2025 at 08:26
@pawbana commented on Dec 9, 2025

Please update the PR description before merging (e.g., remove --aibridge-rate-window).

@dannykopping left a comment

Overall LGTM. Tested it against my deployment and it works as expected.

Can you please add an integration test (one per mechanism) just so we can be assured that this works with the whole API stack?

@dannykopping commented:

@kacpersaw I also think it's worth documenting how/why this differs from the normal API rate-limiting.

@kacpersaw (author) commented:

@dannykopping Regarding your question about > vs >= on line 132:

Using > is correct for "max concurrent" semantics. Here's why:

c := current.Add(1)  // Returns NEW value after incrementing
defer current.Add(-1)

if c > maxConcurrent {  // Reject if over limit

With maxConcurrent = 2:

  • Request 1: c = 1, check 1 > 2 → false → allowed (1 concurrent)
  • Request 2: c = 2, check 2 > 2 → false → allowed (2 concurrent)
  • Request 3: c = 3, check 3 > 2 → true → rejected

If we used >=:

  • Request 1: c = 1, check 1 >= 2 → false → allowed
  • Request 2: c = 2, check 2 >= 2 → true → rejected ✗ (we want 2 concurrent allowed!)

The test TestConcurrencyLimit/LimitsConcurrentRequests validates this behavior: it sets maxConcurrency = 2, starts 2 requests that block in their handlers, then verifies that the 3rd request gets a 503.
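In condensed form, that test has roughly this shape (using the illustrative ConcurrencyLimit signature from the sketch above; the real test also applies a testutil.WaitShort timeout, omitted here):

package httpmw

import (
	"net/http"
	"net/http/httptest"
	"sync"
	"testing"
)

func TestConcurrencyLimit_LimitsConcurrentRequests(t *testing.T) {
	release := make(chan struct{})
	var inHandler sync.WaitGroup
	inHandler.Add(2)

	handler := ConcurrencyLimit(2)(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		inHandler.Done()
		<-release // hold the slot until the test has issued the extra request
		w.WriteHeader(http.StatusOK)
	}))
	srv := httptest.NewServer(handler)
	defer srv.Close()

	// Occupy both slots with requests that block inside the handler.
	for i := 0; i < 2; i++ {
		go func() {
			resp, err := http.Get(srv.URL)
			if err == nil {
				resp.Body.Close()
			}
		}()
	}
	inHandler.Wait() // both requests are now in flight

	// The third concurrent request must be shed with a 503.
	resp, err := http.Get(srv.URL)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusServiceUnavailable {
		t.Fatalf("expected 503, got %d", resp.StatusCode)
	}
	close(release) // let the two blocked requests finish
}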

@dannykopping self-requested a review on December 9, 2025 at 12:29
@kacpersaw marked this pull request as ready for review on December 9, 2025 at 13:20
@kacpersaw (author) commented:

Regarding the ExtractAuthToken duplication comment:

We can't import enterprise/aibridged from coderd/httpmw due to the AGPL licensing restriction (AGPL code cannot import enterprise packages, as enforced by scripts/check_enterprise_imports.sh).

The current approach uses APITokenFromRequest from coderd/httpmw/apikey.go (which handles Bearer tokens, cookies, and standard Coder auth methods) with a small fallback for X-Api-Key header for AI provider compatibility (Anthropic uses this header).

The X-Api-Key fallback is only 3 lines, so the duplication is minimal and unavoidable given the package boundaries.
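Concretely, the duplicated fragment under discussion is roughly this shape (a sketch; the helper name is hypothetical, and it assumes APITokenFromRequest returns an empty string when no token is found):

package httpmw

import "net/http"

// authTokenKey is a hypothetical name for the rate-limit key extraction.
func authTokenKey(r *http.Request) string {
	if token := APITokenFromRequest(r); token != "" {
		return token // Bearer token, cookie, or other standard Coder auth methods
	}
	// AI providers such as Anthropic send the key in X-Api-Key instead.
	return r.Header.Get("X-Api-Key")
}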

@dannykopping replied, quoting the above:

Regarding the ExtractAuthToken duplication comment:

We can't import enterprise/aibridged from coderd/httpmw due to the AGPL licensing restriction (AGPL code cannot import enterprise packages, as enforced by scripts/check_enterprise_imports.sh).

The current approach uses APITokenFromRequest from coderd/httpmw/apikey.go (which handles Bearer tokens, cookies, and standard Coder auth methods) with a small fallback for X-Api-Key header for AI provider compatibility (Anthropic uses this header).

The X-Api-Key fallback is only 3 lines, so the duplication is minimal and unavoidable given the package boundaries.

We should extract that logic out, then. The duplication is not the problem; it's the lack of a single source of truth.

@dannykopping left a comment

LGTM once ExtractAPIKeyFromHeader is named/moved appropriately.

Commits

This adds configurable overload protection to the AI Bridge daemon to prevent
the server from being overwhelmed during periods of high load.

New configuration options:
- CODER_AIBRIDGE_MAX_CONCURRENCY: Maximum number of concurrent AI Bridge requests (0 to disable)
- CODER_AIBRIDGE_RATE_LIMIT: Maximum number of requests per rate window (0 to disable)
- CODER_AIBRIDGE_RATE_WINDOW: Duration of the rate limiting window (default: 1m)

When limits are exceeded:
- Concurrency limit: Returns HTTP 503 Service Unavailable
- Rate limit: Returns HTTP 429 Too Many Requests

The overload protection is implemented as middleware in coderd/httpmw/ratelimit.go
and applied to the aibridged HTTP handler:
- RateLimitByIP: Rate limiting keyed by IP address for endpoints without API key context
- ConcurrencyLimit: Limits concurrent requests using an atomic counter

Both protections are optional and disabled by default (0 values).

Fixes coder/internal#1153

Address review feedback from @dannykopping:

1. Changed RateLimitByIP to RateLimitByAuthToken which extracts the
   authentication token from the Authorization header (Bearer token) or
   X-Api-Key header. Falls back to IP-based rate limiting if no token
   is present.

2. Added Retry-After header to 429 responses for better backpressure
   signaling.

3. Added comprehensive tests for the new auth token-based rate limiting
   including tests for Bearer token, X-Api-Key, different tokens, and
   fallback to IP.

Changes:
1. Remove --aibridge-rate-window config option, hardcode to 1 minute for simpler
   configuration (rate limit is now 'requests per minute').

2. Use APITokenFromRequest instead of custom extractAuthToken function, with
   fallback to X-Api-Key header for AI provider compatibility (Anthropic).

3. Simplify rate limit error message to 'You've been rate limited. Please try
   again later.' (Retry-After header provides timing info).

4. Add TODO comment noting that rate limiting currently applies to all AI Bridge
   requests including pass-through. Ideally only interceptions should count,
   but this requires changes in the aibridge library.

5. Consolidate rate limit tests with table-driven approach and use time.Hour
   instead of time.Second for more reliable test behavior.

1. Move rate limiters to r.Group middleware instead of manually wrapping
   in HandleFunc (aibridge.go).

2. Improve concurrency test synchronization by using sync.WaitGroup as a
   barrier to ensure all requests are in the handler before testing rejection.
   Also use close(releaseHandler) instead of sending to each goroutine.

Per @pawbana's feedback, per-second is easier to reason about than per-minute.

1. Add 'per replica' to description for both max concurrency and rate limit
   to clarify these limits are replica-local.

2. Use consistent '(unlimited)' wording for disabled state in rate limit
   description.

3. Remove TODO comment about pass-through requests - they should be rate
   limited since they still consume resources.

4. Add timeout to concurrency test using testutil.WaitShort.

Add detailed documentation explaining the key differences:
- Extracts token from headers vs context (for internal auth handling)
- No bypass header support for Owners
- No per-endpoint keying (limit applies across all endpoints)
- Includes Retry-After header for backpressure

Creates a new non-enterprise coderd/aibridge package for AI Bridge
utilities. This properly encapsulates the auth token extraction function
that's specific to AI provider authentication patterns (Bearer token and
X-Api-Key headers).

- Add coderd/aibridge/aibridge.go with ExtractAuthToken function
- Update coderd/httpmw/ratelimit.go to use aibridge.ExtractAuthToken
- Update enterprise/aibridged to use aibridge.ExtractAuthToken
- Remove ExtractAPIKeyFromHeader from coderd/httpmw/apikey.go

Adds two integration tests that verify the overload protection middleware
works correctly with the full API stack:

- TestAIBridgeRateLimiting: Verifies requests are rate limited (429) after
  exceeding the configured limit, with Retry-After header set.
- TestAIBridgeConcurrencyLimiting: Verifies concurrent requests beyond the
  limit are rejected (503) while allowing requests up to the limit.
@kacpersaw force-pushed the kacpersaw/aibridge-overload-protection branch from 7aed14b to 1317128 on December 11, 2025 at 10:51
@pawbana self-requested a review on December 11, 2025 at 11:06
@kacpersaw merged commit 6f86f67 into main on Dec 11, 2025 (54 of 56 checks passed)
@kacpersaw deleted the kacpersaw/aibridge-overload-protection branch on December 11, 2025 at 15:38
The github-actions bot locked and limited the conversation to collaborators on Dec 11, 2025