From 944e49e6b903bed1991642a7323e1fe3e0a90b1b Mon Sep 17 00:00:00 2001
From: Spike Curtis
Date: Thu, 30 Oct 2025 11:36:32 +0000
Subject: [PATCH 1/2] Revert "docs: add description of dynamic parameters test (#20488)"

This reverts commit e720afa9d0d4721753abe096ecd16621ae6c53a6.
---
 .../validated-architectures/10k-users.md | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/docs/admin/infrastructure/validated-architectures/10k-users.md b/docs/admin/infrastructure/validated-architectures/10k-users.md
index 486ac8192c991..e6413711188f7 100644
--- a/docs/admin/infrastructure/validated-architectures/10k-users.md
+++ b/docs/admin/infrastructure/validated-architectures/10k-users.md
@@ -40,22 +40,6 @@ Test procedure:
 
 After, we examine the Coderd, Workspace Proxy, and Database metrics to look for issues.
 
-### Dynamic Parameters
-
-1000 connections simulating changing parameters while configuring a new workspace.
-
-Test procedure:
-
-1. Create a template with complex parameter logic and multiple template versions.
-1. Partition the connections among the template versions (forces Coder to process multiple template files)
-1. Simultaneously connect to the dynamic-parameters API websocket endpoint for the template version
-1. Wait for the initial parameter update.
-1. Send a new parameter value that has cascading effects among other parameters.
-1. Wait for the next update.
-
-After, we examine the latency in the initial connection and update, as well as Coderd and Database metrics to look for
-issues.
-
 ### API Request Traffic
 
 To be determined.

From bdd2c9be94e4add611312dccfdfacc9bd27f4dcd Mon Sep 17 00:00:00 2001
From: Spike Curtis
Date: Thu, 30 Oct 2025 11:37:17 +0000
Subject: [PATCH 2/2] Revert "docs: create WIP 10k scale doc (#20213)"

This reverts commit ccf0b348726370bcf42adab28957d153b1c9eb4a.
---
 .../validated-architectures/10k-users.md | 108 ------------------
 .../validated-architectures/index.md     |   2 -
 docs/manifest.json                       |   5 -
 3 files changed, 115 deletions(-)
 delete mode 100644 docs/admin/infrastructure/validated-architectures/10k-users.md

diff --git a/docs/admin/infrastructure/validated-architectures/10k-users.md b/docs/admin/infrastructure/validated-architectures/10k-users.md
deleted file mode 100644
index e6413711188f7..0000000000000
--- a/docs/admin/infrastructure/validated-architectures/10k-users.md
+++ /dev/null
@@ -1,108 +0,0 @@
-# Reference Architecture: up to 10,000 users
-
-> [!CAUTION]
-> This page is a work in progress.
->
-> We are actively testing different load profiles for this user target and will be updating
-> recommendations. Use these recommendations as a starting point, but monitor your cluster resource
-> utilization and adjust.
-
-The 10,000 users architecture targets large-scale enterprises with development
-teams in multiple geographic regions.
-
-**Geographic Distribution**: For these tests we deploy on 3 cloud-managed Kubernetes clusters in
-the following regions:
-
-1. USA - Primary - Coderd collocated with the PostgreSQL database deployment.
-2. Europe - Workspace Proxies
-3. Asia - Workspace Proxies
-
-**High Availability**: Typically, such scale requires a fully-managed HA
-PostgreSQL service, and all Coder observability features enabled for operational
-purposes.
-
-**Observability**: Deploy monitoring solutions to gather Prometheus metrics and
-visualize them with Grafana to gain detailed insights into infrastructure and
-application behavior. This allows operators to respond quickly to incidents and
-continuously improve the reliability and performance of the platform.
-
-## Testing Methodology
-
-### Workspace Network Traffic
-
-6000 concurrent workspaces (2000 per region), each sending 10 kB/s application traffic.
-
-Test procedure:
-
-1. Create workspaces. This happens simultaneously in each region with 200 provisioners (and thus 600 concurrent builds).
-2. Wait 5 minutes to establish baselines for metrics.
-3. Generate 10 kB/s traffic to each workspace (originating within the same region & cluster).
-
-After, we examine the Coderd, Workspace Proxy, and Database metrics to look for issues.
-
-### API Request Traffic
-
-To be determined.
-
-## Hardware recommendations
-
-### Coderd
-
-These are deployed in the Primary region only.
-
-| vCPU Limit     | Memory Limit | Replicas | GCP Node Pool Machine Type |
-|----------------|--------------|----------|----------------------------|
-| 4 vCPU (4000m) | 12 GiB       | 10       | `c2d-standard-16`          |
-
-### Provisioners
-
-These are deployed in each of the 3 regions.
-
-| vCPU Limit      | Memory Limit | Replicas | GCP Node Pool Machine Type |
-|-----------------|--------------|----------|----------------------------|
-| 0.1 vCPU (100m) | 1 GiB        | 200      | `c2d-standard-16`          |
-
-**Footnotes**:
-
-- Each provisioner handles a single concurrent build, so this configuration implies 200 concurrent
-  workspace builds per region.
-- Provisioners are run as a separate Kubernetes Deployment from Coderd, although they may
-  share the same node pool.
-- Separate provisioners into different namespaces in favor of zero-trust or
-  multi-cloud deployments.
-
-### Workspace Proxies
-
-These are deployed in the non-Primary regions only.
-
-| vCPU Limit     | Memory Limit | Replicas | GCP Node Pool Machine Type |
-|----------------|--------------|----------|----------------------------|
-| 4 vCPU (4000m) | 12 GiB       | 10       | `c2d-standard-16`          |
-
-**Footnotes**:
-
-- Our testing implies this is somewhat overspecced for the loads we have tried. We are in process of revising these numbers.
-
-### Workspaces
-
-These numbers are for each of the 3 regions. We recommend that you use a separate node pool for user Workspaces.
-
-| Users       | Node capacity        | Replicas                      | GCP              | AWS          | Azure             |
-|-------------|----------------------|-------------------------------|------------------|--------------|-------------------|
-| Up to 3,000 | 8 vCPU, 32 GB memory | 256 nodes, 12 workspaces each | `t2d-standard-8` | `m5.2xlarge` | `Standard_D8s_v3` |
-
-**Footnotes**:
-
-- Assumed that a workspace user needs 2 GB memory to perform
-- Maximum number of Kubernetes workspace pods per node: 256
-- As workspace nodes can be distributed between regions, on-premises networks
-  and cloud areas, consider different namespaces in favor of zero-trust or
-  multi-cloud deployments.
-
-### Database nodes
-
-We conducted our test using the `db-custom-16-61440` tier on Google Cloud SQL.
-
-**Footnotes**:
-
-- This database tier was only just able to keep up with 600 concurrent builds in our tests.
diff --git a/docs/admin/infrastructure/validated-architectures/index.md b/docs/admin/infrastructure/validated-architectures/index.md
index 59602f22bc47a..6bd18f7f3c132 100644
--- a/docs/admin/infrastructure/validated-architectures/index.md
+++ b/docs/admin/infrastructure/validated-architectures/index.md
@@ -220,8 +220,6 @@ For sizing recommendations, see the below reference architectures:
 
 - [Up to 3,000 users](3k-users.md)
 
-- DRAFT: [Up to 10,000 users](10k-users.md)
-
 ### AWS Instance Types
 
 For production AWS deployments, we recommend using non-burstable instance types,
diff --git a/docs/manifest.json b/docs/manifest.json
index 57711406c87d7..8ef8e3e5fa326 100644
--- a/docs/manifest.json
+++ b/docs/manifest.json
@@ -396,11 +396,6 @@
             "title": "Up to 3,000 Users",
             "description": "Enterprise-scale architecture recommendations for Coder deployments that support up to 3,000 users",
             "path": "./admin/infrastructure/validated-architectures/3k-users.md"
-          },
-          {
-            "title": "Up to 10,000 Users",
-            "description": "Enterprise-scale architecture recommendations for Coder deployments that support up to 10,000 users",
-            "path": "./admin/infrastructure/validated-architectures/10k-users.md"
           }
         ]
       },