feat(coderd): add overload protection with rate limiting and concurrency control #21161
Conversation
dannykopping
left a comment
I'm not a fan of IP-based rate-limiting. It's brittle and lacks the specificity we need.
Limits need to apply per user. We could do that by using the API key in the request.
Also, a nit, your description says:
Fixes coder/internal#1153
But it only addresses the first two points AFAICS.
Please update PR description before merging (eg. remove
dannykopping
left a comment
Overall LGTM. Tested it against my deployment and it works as expected.
Can you please add an integration test (one per mechanism) just so we can be assured that this works with the whole API stack?
@kacpersaw I also think it's worth documenting how/why this differs from the normal API rate-limiting.
@dannykopping Regarding your question about using

```go
c := current.Add(1) // Returns NEW value after incrementing
defer current.Add(-1)
if c > maxConcurrent { // Reject if over limit
```

With … If we used … The test …
Regarding the … We can't import … The current approach uses … The X-Api-Key fallback is only 3 lines, so the duplication is minimal and unavoidable given the package boundaries.
We should extract that logic out then. The duplication is not the problem; it's the lack of a single source of truth.
dannykopping
left a comment
LGTM once ExtractAPIKeyFromHeader is named/moved appropriately.
feat(coderd): add overload protection with rate limiting and concurrency control

This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load.

New configuration options:
- CODER_AIBRIDGE_MAX_CONCURRENCY: Maximum number of concurrent AI Bridge requests (0 to disable)
- CODER_AIBRIDGE_RATE_LIMIT: Maximum number of requests per rate window (0 to disable)
- CODER_AIBRIDGE_RATE_WINDOW: Duration of the rate limiting window (default: 1m)

When limits are exceeded:
- Concurrency limit: Returns HTTP 503 Service Unavailable
- Rate limit: Returns HTTP 429 Too Many Requests

The overload protection is implemented as middleware in coderd/httpmw/ratelimit.go and applied to the aibridged HTTP handler:
- RateLimitByIP: Rate limiting keyed by IP address for endpoints without API key context
- ConcurrencyLimit: Limits concurrent requests using an atomic counter

Both protections are optional and disabled by default (0 values).

Fixes coder/internal#1153
Address review feedback from @dannykopping:
1. Changed RateLimitByIP to RateLimitByAuthToken, which extracts the authentication token from the Authorization header (Bearer token) or the X-Api-Key header. Falls back to IP-based rate limiting if no token is present.
2. Added a Retry-After header to 429 responses for better backpressure signaling.
3. Added comprehensive tests for the new auth token-based rate limiting, including tests for Bearer token, X-Api-Key, different tokens, and fallback to IP.
Changes:
1. Remove the --aibridge-rate-window config option and hardcode the window to 1 minute for simpler configuration (the rate limit is now "requests per minute").
2. Use APITokenFromRequest instead of a custom extractAuthToken function, with fallback to the X-Api-Key header for AI provider compatibility (Anthropic).
3. Simplify the rate limit error message to "You've been rate limited. Please try again later." (the Retry-After header provides timing info).
4. Add a TODO comment noting that rate limiting currently applies to all AI Bridge requests, including pass-through. Ideally only interceptions should count, but this requires changes in the aibridge library.
5. Consolidate rate limit tests with a table-driven approach and use time.Hour instead of time.Second for more reliable test behavior.
1. Move the rate limiters into r.Group middleware instead of manually wrapping in HandleFunc (aibridge.go).
2. Improve concurrency test synchronization by using a sync.WaitGroup as a barrier to ensure all requests are in the handler before testing rejection. Also use close(releaseHandler) instead of sending to each goroutine.
Per @pawbana's feedback, per-second is easier to reason about than per-minute.
1. Add "per replica" to the descriptions of both max concurrency and rate limit to clarify that these limits are replica-local.
2. Use consistent "(unlimited)" wording for the disabled state in the rate limit description.
3. Remove the TODO comment about pass-through requests; they should be rate limited since they still consume resources.
4. Add a timeout to the concurrency test using testutil.WaitShort.
Add detailed documentation explaining the key differences:
- Extracts the token from headers rather than the request context (for internal auth handling)
- No bypass header support for Owners
- No per-endpoint keying (the limit applies across all endpoints)
- Includes a Retry-After header for backpressure
Creates a new non-enterprise coderd/aibridge package for AI Bridge utilities. This properly encapsulates the auth token extraction function that's specific to AI provider authentication patterns (Bearer token and X-Api-Key headers).
- Add coderd/aibridge/aibridge.go with the ExtractAuthToken function
- Update coderd/httpmw/ratelimit.go to use aibridge.ExtractAuthToken
- Update enterprise/aibridged to use aibridge.ExtractAuthToken
- Remove ExtractAPIKeyFromHeader from coderd/httpmw/apikey.go
Adds two integration tests that verify the overload protection middleware works correctly with the full API stack:
- TestAIBridgeRateLimiting: Verifies requests are rate limited (429) after exceeding the configured limit, with the Retry-After header set.
- TestAIBridgeConcurrencyLimiting: Verifies concurrent requests beyond the limit are rejected (503) while requests up to the limit are allowed.
Summary
This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load.
Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up).
New Configuration Options
- `--aibridge-max-concurrency` / `CODER_AIBRIDGE_MAX_CONCURRENCY` (default: `0`, disabled)
- `--aibridge-rate-limit` / `CODER_AIBRIDGE_RATE_LIMIT` (default: `0`, disabled)

Behavior
When limits are exceeded:
- Concurrency limit: `503 Service Unavailable` with the message "AI Bridge is currently at capacity. Please try again later."
- Rate limit: `429 Too Many Requests` with a `Retry-After` header.

Both protections are optional and disabled by default (0 values).
Implementation
The overload protection is implemented as reusable middleware in `coderd/httpmw/ratelimit.go`:

- `RateLimitByAuthToken`: Per-user rate limiting that uses `APITokenFromRequest` to extract the authentication token, with fallback to the `X-Api-Key` header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes a `Retry-After` header for backpressure signaling.
- `ConcurrencyLimit`: Uses an atomic counter to track in-flight requests and reject when at capacity.

The middleware is applied in `enterprise/coderd/aibridge.go` via `r.Group` in the following order:

Note: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. Ideally only actual interceptions should count, but this would require changes in the aibridge library.
Testing
Added comprehensive tests for: