-
Notifications
You must be signed in to change notification settings - Fork 956
fix(coderd/prometheusmetrics): filter deleted wsbuilds to reduce db load #19197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
b8a3784
to
068dbd6
Compare
require.NoError(t, err) | ||
for range 1 + n { | ||
insertDeleted(t, db, u, org) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review: This fails in the old implementation, but I thought it best to formalize the assumption. Considering the old implementation did not test for this, it seems like validation for the new approach.
This change removes the `GetLatestWorkspaceBuilds` query which includes all workspaces for all time (including deleted). This allows us to also stop using `GetProvisionerJobsByIDs` for said builds as the job status is included in `GetWorkspaces` called separately. Fixes coder/internal#717
068dbd6
to
7b1fc44
Compare
Soft deleted workspaces have been around forever, as has this metric. I wonder if @kylecarbs can reach back that far in his memory and recall whether it was a deliberate decision to incluce builds from deleted workspaces in the The description of the metric talks about "current" number of workspace builds by status, which makes me think it was just an oversight. Since we track the latest build per workspace, not just all builds, the metric isn't useful for tracking an overall rate of succeeding / failing builds. My take is that the most obvious interpretation is that it's for non-deleted workspaces and this change is fine. |
That is the conclusion I also came to when I read the descriptions, which is why I went with this approach. If need be, we can include deleted builds with a specialized query that selects the least amount of data needed. Previously we selected all data only to use a single column which is why optimizations were not possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable 👍
Please ensure the semantics of this change are documented in the release notes for the next release.
We could clarify the definition of this metric too, which may help operators:
https://coder.com/docs/admin/integrations/prometheus#available-metrics
Currently it's quite brief: "The latest workspace builds with a status."
This change removes the
GetLatestWorkspaceBuilds
query which includesall workspaces for all time (including deleted). This allows us to also
stop using
GetProvisionerJobsByIDs
for said builds as the job statusis included in
GetWorkspaces
called separately.Fixes coder/internal#717