Commit 6f86f67

feat(coderd): add overload protection with rate limiting and concurrency control (#21161)
## Summary

This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load.

Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up).

## New Configuration Options

| Option | Environment Variable | Description | Default |
|--------|---------------------|-------------|---------|
| `--aibridge-max-concurrency` | `CODER_AIBRIDGE_MAX_CONCURRENCY` | Maximum number of concurrent AI Bridge requests. Set to 0 to disable (unlimited). | `0` |
| `--aibridge-rate-limit` | `CODER_AIBRIDGE_RATE_LIMIT` | Maximum number of AI Bridge requests per second. Set to 0 to disable rate limiting. | `0` |

## Behavior

When limits are exceeded:

- **Concurrency limit**: Returns HTTP `503 Service Unavailable` with message "AI Bridge is currently at capacity. Please try again later."
- **Rate limit**: Returns HTTP `429 Too Many Requests` with `Retry-After` header.

Both protections are optional and disabled by default (0 values).

## Implementation

The overload protection is implemented as reusable middleware in `coderd/httpmw/ratelimit.go`:

1. **`RateLimitByAuthToken`**: Per-user rate limiting that uses `APITokenFromRequest` to extract the authentication token, with fallback to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes `Retry-After` header for backpressure signaling.
2. **`ConcurrencyLimit`**: Uses an atomic counter to track in-flight requests and reject when at capacity.

The middleware is applied in `enterprise/coderd/aibridge.go` via `r.Group` in the following order:

1. Concurrency check (faster rejection for load shedding)
2. Rate limit check

**Note**: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. Ideally only actual interceptions should count, but this would require changes in the aibridge library.

## Testing

Added comprehensive tests for:

- Rate limiting by auth token (Bearer token, X-Api-Key, no token fallback to IP)
- Different tokens not rate limited against each other
- Disabled when limit is zero
- Retry-After header is set on 429 responses
- Concurrency limiting (allows within limit, rejects over limit, disabled when zero)
1 parent 8ead6f7 commit 6f86f67

17 files changed: +567 -36 lines changed

cli/testdata/coder_server_--help.golden

Lines changed: 8 additions & 0 deletions
@@ -125,12 +125,20 @@ AI BRIDGE OPTIONS:
           requests (requires the "oauth2" and "mcp-server-http" experiments to
           be enabled).

+      --aibridge-max-concurrency int, $CODER_AIBRIDGE_MAX_CONCURRENCY (default: 0)
+          Maximum number of concurrent AI Bridge requests per replica. Set to 0
+          to disable (unlimited).
+
       --aibridge-openai-base-url string, $CODER_AIBRIDGE_OPENAI_BASE_URL (default: https://api.openai.com/v1/)
           The base URL of the OpenAI API.

       --aibridge-openai-key string, $CODER_AIBRIDGE_OPENAI_KEY
           The key to authenticate against the OpenAI API.

+      --aibridge-rate-limit int, $CODER_AIBRIDGE_RATE_LIMIT (default: 0)
+          Maximum number of AI Bridge requests per second per replica. Set to 0
+          to disable (unlimited).
+
 CLIENT OPTIONS:
   These options change the behavior of how clients interact with the Coder.
   Clients include the Coder CLI, Coder Desktop, IDE extensions, and the web UI.

cli/testdata/server-config.yaml.golden

Lines changed: 8 additions & 0 deletions
@@ -748,6 +748,14 @@ aibridge:
   # (token, prompt, tool use).
   # (default: 60d, type: duration)
   retention: 1440h0m0s
+  # Maximum number of concurrent AI Bridge requests per replica. Set to 0 to disable
+  # (unlimited).
+  # (default: 0, type: int)
+  maxConcurrency: 0
+  # Maximum number of AI Bridge requests per second per replica. Set to 0 to disable
+  # (unlimited).
+  # (default: 0, type: int)
+  rateLimit: 0
 # Configure data retention policies for various database tables. Retention
 # policies automatically purge old data to reduce database size and improve
 # performance. Setting a retention duration to 0 disables automatic purging for

coderd/aibridge/aibridge.go

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+// Package aibridge provides utilities for the AI Bridge feature.
+package aibridge
+
+import (
+	"net/http"
+	"strings"
+)
+
+// ExtractAuthToken extracts an authorization token from HTTP headers.
+// It checks the Authorization header (Bearer token) and X-Api-Key header,
+// which represent the different ways clients authenticate against AI providers.
+// If neither are present, an empty string is returned.
+func ExtractAuthToken(header http.Header) string {
+	if auth := strings.TrimSpace(header.Get("Authorization")); auth != "" {
+		fields := strings.Fields(auth)
+		if len(fields) == 2 && strings.EqualFold(fields[0], "Bearer") {
+			return fields[1]
+		}
+	}
+	if apiKey := strings.TrimSpace(header.Get("X-Api-Key")); apiKey != "" {
+		return apiKey
+	}
+	return ""
+}
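
The precedence rules in `ExtractAuthToken` are easiest to see with a few inputs. The sketch below copies the function body into a lowercase `extractAuthToken` so that it compiles on its own; the header values are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// extractAuthToken mirrors ExtractAuthToken from the diff above so that this
// example is self-contained.
func extractAuthToken(header http.Header) string {
	if auth := strings.TrimSpace(header.Get("Authorization")); auth != "" {
		fields := strings.Fields(auth)
		if len(fields) == 2 && strings.EqualFold(fields[0], "Bearer") {
			return fields[1]
		}
	}
	if apiKey := strings.TrimSpace(header.Get("X-Api-Key")); apiKey != "" {
		return apiKey
	}
	return ""
}

// headers builds an http.Header from alternating key/value pairs.
func headers(kv ...string) http.Header {
	h := http.Header{}
	for i := 0; i+1 < len(kv); i += 2 {
		h.Set(kv[i], kv[i+1])
	}
	return h
}

func main() {
	// The scheme is matched case-insensitively and extra spaces are ignored.
	fmt.Printf("%q\n", extractAuthToken(headers("Authorization", "bearer   token-a")))
	// A Bearer token takes precedence over X-Api-Key.
	fmt.Printf("%q\n", extractAuthToken(headers("Authorization", "Bearer token-a", "X-Api-Key", "key-b")))
	// A non-Bearer Authorization header is skipped, so X-Api-Key is used.
	fmt.Printf("%q\n", extractAuthToken(headers("Authorization", "Basic dXNlcg==", "X-Api-Key", "key-b")))
	// With neither header present, the result is empty.
	fmt.Printf("%q\n", extractAuthToken(headers()))
}
```

An empty result is what causes `RateLimitByAuthToken` to fall back to keying by IP address.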

coderd/apidoc/docs.go

Lines changed: 6 additions & 0 deletions
Some generated files are not rendered by default.

coderd/apidoc/swagger.json

Lines changed: 6 additions & 0 deletions
Some generated files are not rendered by default.

coderd/httpmw/ratelimit.go

Lines changed: 71 additions & 0 deletions
@@ -4,11 +4,13 @@ import (
 	"fmt"
 	"net/http"
 	"strconv"
+	"sync/atomic"
 	"time"

 	"github.com/go-chi/httprate"
 	"golang.org/x/xerrors"

+	"github.com/coder/coder/v2/coderd/aibridge"
 	"github.com/coder/coder/v2/coderd/database"
 	"github.com/coder/coder/v2/coderd/httpapi"
 	"github.com/coder/coder/v2/coderd/rbac"
@@ -70,3 +72,72 @@ func RateLimit(count int, window time.Duration) func(http.Handler) http.Handler
 		}),
 	)
 }
+
+// RateLimitByAuthToken returns a handler that limits requests based on the
+// authentication token in the request.
+//
+// This differs from [RateLimit] in several ways:
+// - It extracts the token directly from request headers (Authorization Bearer
+// or X-Api-Key) rather than from the request context, making it suitable for
+// endpoints that handle authentication internally (like AI Bridge) rather than
+// via [ExtractAPIKeyMW] middleware.
+// - It does not support the bypass header for Owners.
+// - It does not key by endpoint, so the limit applies across all endpoints using
+// this middleware.
+// - It includes a Retry-After header in 429 responses for backpressure signaling.
+//
+// If no token is found in the headers, it falls back to rate limiting by IP address.
+func RateLimitByAuthToken(count int, window time.Duration) func(http.Handler) http.Handler {
+	if count <= 0 {
+		return func(handler http.Handler) http.Handler {
+			return handler
+		}
+	}
+
+	return httprate.Limit(
+		count,
+		window,
+		httprate.WithKeyFuncs(func(r *http.Request) (string, error) {
+			// Try to extract auth token for per-user rate limiting using
+			// AI provider authentication headers (Authorization Bearer or X-Api-Key).
+			if token := aibridge.ExtractAuthToken(r.Header); token != "" {
+				return token, nil
+			}
+			// Fall back to IP-based rate limiting if no token present.
+			return httprate.KeyByIP(r)
+		}),
+		httprate.WithLimitHandler(func(w http.ResponseWriter, r *http.Request) {
+			// Add Retry-After header for backpressure signaling.
+			w.Header().Set("Retry-After", fmt.Sprintf("%d", int(window.Seconds())))
+			httpapi.Write(r.Context(), w, http.StatusTooManyRequests, codersdk.Response{
+				Message: "You've been rate limited. Please try again later.",
+			})
+		}),
+	)
+}
+
+// ConcurrencyLimit returns a handler that limits the number of concurrent
+// requests. When the limit is exceeded, it returns HTTP 503 Service Unavailable.
+func ConcurrencyLimit(maxConcurrent int64, resourceName string) func(http.Handler) http.Handler {
+	if maxConcurrent <= 0 {
+		return func(handler http.Handler) http.Handler {
+			return handler
+		}
+	}
+
+	var current atomic.Int64
+	return func(next http.Handler) http.Handler {
+		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			c := current.Add(1)
+			defer current.Add(-1)
+
+			if c > maxConcurrent {
+				httpapi.Write(r.Context(), w, http.StatusServiceUnavailable, codersdk.Response{
+					Message: fmt.Sprintf("%s is currently at capacity. Please try again later.", resourceName),
+				})
+				return
+			}
+			next.ServeHTTP(w, r)
+		})
+	}
+}
