Skip to content

Commit 6abb889

Browse files
authored
perf: optimize GetDeploymentWorkspaceAgentStats by eliminating 2nd select (#21112)
Tracking issue here: coder/internal#1009 To summarize, the current version of this query selects from `workspace_agent_stats` twice. The expensive portion of this query is the bitmap heap scan we have to do for each of these selects. We can easily cut the cost of this query by 40-50% by cutting this down to a single select, and using those rows for both sets of calculations. Eliminating the heap scan itself would require a follow up PR to introduce a new index. Blink helped with the rewrite of the query. The current plan looks like this: ``` Nested Loop (cost=6101.64..6101.69 rows=1 width=64) (actual time=11.782..11.787 rows=1 loops=1) -> Aggregate (cost=2996.17..2996.19 rows=1 width=32) (actual time=3.356..3.357 rows=1 loops=1) -> Bitmap Heap Scan on workspace_agent_stats (cost=54.80..2992.86 rows=440 width=24) (actu al time=0.346..2.927 rows=818 loops=1) Recheck Cond: (created_at > (now() - '00:15:00'::interval)) Filter: (connection_median_latency_ms > '0'::double precision) Rows Removed by Filter: 1070 Heap Blocks: exact=486 -> Bitmap Index Scan on idx_agent_stats_created_at (cost=0.00..54.69 rows=1368 width =0) (actual time=0.241..0.241 rows=1888 loops=1) Index Cond: (created_at > (now() - '00:15:00'::interval)) -> Aggregate (cost=3105.47..3105.49 rows=1 width=32) (actual time=8.418..8.420 rows=1 loops=1) -> Subquery Scan on a (cost=3060.95..3105.39 rows=7 width=32) (actual time=7.851..8.394 ro ws=63 loops=1) Filter: (a.rn = 1) -> WindowAgg (cost=3060.95..3088.29 rows=1368 width=209) (actual time=7.850..8.382 r ows=63 loops=1) Run Condition: (row_number() OVER (?) <= 1) -> Sort (cost=3060.93..3064.35 rows=1368 width=56) (actual time=7.836..8.036 r ows=1888 loops=1) Sort Key: workspace_agent_stats_1.agent_id, workspace_agent_stats_1.create d_at DESC Sort Method: quicksort Memory: 181kB -> Bitmap Heap Scan on workspace_agent_stats workspace_agent_stats_1 (co st=55.03..2989.67 rows=1368 width=56) (actual time=0.388..2.096 rows=1888 loops=1) Recheck Cond: (created_at > (now() - '00:15:00'::interval)) Heap Blocks: exact=486 -> Bitmap Index Scan on idx_agent_stats_created_at (cost=0.00..54. 69 rows=1368 width=0) (actual time=0.295..0.295 rows=1888 loops=1) Index Cond: (created_at > (now() - '00:15:00'::interval)) Planning Time: 2.350 ms Execution Time: 13.152 ms (24 rows) ``` The new plan looks like this ``` Aggregate (cost=2966.96..2966.98 rows=1 width=64) (actual time=3.812..3.814 rows=1 loops=1) -> WindowAgg (cost=2891.96..2916.94 rows=1250 width=88) (actual time=2.696..3.412 rows=1890 loop s=1) -> Sort (cost=2891.94..2895.06 rows=1250 width=80) (actual time=2.686..2.780 rows=1890 loo ps=1) Sort Key: workspace_agent_stats.agent_id, workspace_agent_stats.created_at DESC Sort Method: quicksort Memory: 226kB -> Bitmap Heap Scan on workspace_agent_stats (cost=50.11..2827.64 rows=1250 width=80 ) (actual time=0.218..1.551 rows=1890 loops=1) Recheck Cond: (created_at > (now() - '00:15:00'::interval)) Heap Blocks: exact=474 -> Bitmap Index Scan on idx_agent_stats_created_at (cost=0.00..49.80 rows=1250 width=0) (actual time=0.146..0.147 rows=1890 loops=1) Index Cond: (created_at > (now() - '00:15:00'::interval)) Planning Time: 0.534 ms Execution Time: 3.969 ms (12 rows) ``` If we compare the results of the query they're similar enough that any differences can be attributed to slightly different timestamps for `now()` in the version of the query I am using to generate results for comparison: ``` workspace_rx_bytes | workspace_tx_bytes | workspace_connection_latency_50 | workspace_connection_latency_95 | session_count_vscode | session_count_ssh | session_count_jetbrains | session_count_reconnecting_pty --------------------+--------------------+---------------------------------+---------------------------------+----------------------+-------------------+-------------------------+-------------------------------- 15263563 | 74555854 | 47.933 | 250.5522 | 239 | 59 | 3 | 3 (1 row) workspace_rx_bytes | workspace_tx_bytes | workspace_connection_latency_50 | workspace_connection_latency_95 | session_count_vscode | session_count_ssh | session_count_jetbrains | session_count_reconnecting_pty --------------------+--------------------+---------------------------------+---------------------------------+----------------------+-------------------+-------------------------+-------------------------------- 15295819 | 74598410 | 47.933 | 250.5522 | 239 | 59 | 3 | 3 ``` --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>
1 parent fca9343 commit 6abb889

File tree

2 files changed

+50
-40
lines changed

2 files changed

+50
-40
lines changed

coderd/database/queries.sql.go

Lines changed: 25 additions & 20 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

coderd/database/queries/workspaceagentstats.sql

Lines changed: 25 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -99,27 +99,32 @@ WHERE
9999
);
100100

101101
-- name: GetDeploymentWorkspaceAgentStats :one
102-
WITH agent_stats AS (
103-
SELECT
104-
coalesce(SUM(rx_bytes), 0)::bigint AS workspace_rx_bytes,
105-
coalesce(SUM(tx_bytes), 0)::bigint AS workspace_tx_bytes,
106-
coalesce((PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY connection_median_latency_ms)), -1)::FLOAT AS workspace_connection_latency_50,
107-
coalesce((PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY connection_median_latency_ms)), -1)::FLOAT AS workspace_connection_latency_95
108-
FROM workspace_agent_stats
109-
-- The greater than 0 is to support legacy agents that don't report connection_median_latency_ms.
110-
WHERE workspace_agent_stats.created_at > $1 AND connection_median_latency_ms > 0
111-
), latest_agent_stats AS (
112-
SELECT
113-
coalesce(SUM(session_count_vscode), 0)::bigint AS session_count_vscode,
114-
coalesce(SUM(session_count_ssh), 0)::bigint AS session_count_ssh,
115-
coalesce(SUM(session_count_jetbrains), 0)::bigint AS session_count_jetbrains,
116-
coalesce(SUM(session_count_reconnecting_pty), 0)::bigint AS session_count_reconnecting_pty
117-
FROM (
118-
SELECT *, ROW_NUMBER() OVER(PARTITION BY agent_id ORDER BY created_at DESC) AS rn
119-
FROM workspace_agent_stats WHERE created_at > $1
120-
) AS a WHERE a.rn = 1
102+
WITH stats AS (
103+
SELECT
104+
agent_id,
105+
created_at,
106+
rx_bytes,
107+
tx_bytes,
108+
connection_median_latency_ms,
109+
session_count_vscode,
110+
session_count_ssh,
111+
session_count_jetbrains,
112+
session_count_reconnecting_pty,
113+
ROW_NUMBER() OVER (PARTITION BY agent_id ORDER BY created_at DESC) AS rn
114+
FROM workspace_agent_stats
115+
WHERE created_at > $1
121116
)
122-
SELECT * FROM agent_stats, latest_agent_stats;
117+
SELECT
118+
coalesce(SUM(rx_bytes), 0)::bigint AS workspace_rx_bytes,
119+
coalesce(SUM(tx_bytes), 0)::bigint AS workspace_tx_bytes,
120+
-- The greater than 0 is to support legacy agents that don't report connection_median_latency_ms.
121+
coalesce((PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY connection_median_latency_ms) FILTER (WHERE connection_median_latency_ms > 0)), -1)::FLOAT AS workspace_connection_latency_50,
122+
coalesce((PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY connection_median_latency_ms) FILTER (WHERE connection_median_latency_ms > 0)), -1)::FLOAT AS workspace_connection_latency_95,
123+
coalesce(SUM(session_count_vscode) FILTER (WHERE rn = 1), 0)::bigint AS session_count_vscode,
124+
coalesce(SUM(session_count_ssh) FILTER (WHERE rn = 1), 0)::bigint AS session_count_ssh,
125+
coalesce(SUM(session_count_jetbrains) FILTER (WHERE rn = 1), 0)::bigint AS session_count_jetbrains,
126+
coalesce(SUM(session_count_reconnecting_pty) FILTER (WHERE rn = 1), 0)::bigint AS session_count_reconnecting_pty
127+
FROM stats;
123128

124129
-- name: GetDeploymentWorkspaceAgentUsageStats :one
125130
WITH agent_stats AS (

0 commit comments

Comments
 (0)