-
Notifications
You must be signed in to change notification settings - Fork 1.1k
perf(coderd/database): add index on workspace_app_statuses.app_id #21099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(coderd/database): add index on workspace_app_statuses.app_id #21099
Conversation
| @@ -0,0 +1 @@ | |||
| CREATE INDEX workspace_app_statuses_app_id_idx ON workspace_app_statuses (app_id); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're often looking for the latest status, it would help to make this (app_id, created_at)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of GetWorkspaceAppStatusesByAppIDs we're only filtering on appID and not even sorting, this may make PostgreSQL ditch the index.
Sorting might make sense for that query, but will of course add its own overhead. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the sorting to GetWorkspaceAppStatusesByAppIDs as well. Given that the index is already sorted, the overhead should be minimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Field order matters. PG should be able to use a (app_id, ...) index for WHERE app_id = clauses no matter what else is in the index because the tuples are sorted left to right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes true, but the query optimizer can also opt not to use it due to higher specificity/perceived cost.
|
If there are any heavy users of task statuses with a large table this could be a potentially expensive migration. We need some way to call it out in the release notes, at minimum. cc @bpmct @david-fraley |
The queries GetWorkspaceAppStatusesByAppIDs and GetLatestWorkspaceAppStatusByAppID both filter by app_id but there was no index to support this, causing sequential scans. Use a composite index on (app_id, created_at DESC) to support both queries efficiently, and add ORDER BY to GetWorkspaceAppStatusesByAppIDs for consistent ordering (newest first). Refs https://www.notion.so/coderhq/2k-Task-Status-Updates-retest-2bfd579be59280ca8f86eb376d05e0fa
836370c to
821a4f4
Compare
Added a note to the description 👍🏻 |
The queries
GetWorkspaceAppStatusesByAppIDsandGetLatestWorkspaceAppStatusByAppIDboth filter byapp_idbut there was no index to support this, causing sequential scans.Each query took ~30ms on average with 80 requests/second to the cluster, resulting in ~5.2 query-seconds every second.
Use a composite index on
(app_id, created_at DESC)to support both queries efficiently, and addORDER BYtoGetWorkspaceAppStatusesByAppIDsfor consistent ordering (newest first).NOTE: This migration creates an index on
workspace_app_statuses. For deployments with heavy task usage, this may take a moment to complete. The concern is minor since tasks are a new feature and most tables will be small.Refs https://www.notion.so/coderhq/2k-Task-Status-Updates-retest-2bfd579be59280ca8f86eb376d05e0fa