chore: add usage tracking package #19095

deansheather · 2025-07-30T06:02:37Z

Not used in coderd yet, see stack.

Adds two new packages:

coderd/usage: provides an interface for the "Collector" as well as a stub implementation for AGPL
enterprise/coderd/usage: provides an interface for the "Publisher" as well as a Tallyman implementation

Relates to coder/internal#814

deansheather · 2025-07-30T06:02:57Z

chore: move usage types to new package #19103
chore: wire up usage tracking for managed agents #19096
chore: add usage tracking package #19095 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

johnstcn

I have some non-blocking comments below but might need to take another pass.

johnstcn · 2025-07-30T08:14:58Z

coderd/database/dbauthz/dbauthz.go

@@ -3913,6 +3913,13 @@ func (q *querier) InsertTemplateVersionWorkspaceTag(ctx context.Context, arg dat
 	return q.db.InsertTemplateVersionWorkspaceTag(ctx, arg)
 }

+func (q *querier) InsertUsageEvent(ctx context.Context, arg database.InsertUsageEventParams) error {
+	if err := q.authorizeContext(ctx, policy.ActionCreate, rbac.ResourceSystem); err != nil {


We should probably create a separate RBAC resource and role for these events.

These are currently not directly CRUDable by any users, even admins. I was originally going to add this to its own resource, but once I realized that, I decided it could be added down the track if we add APIs for using this information.

Even so, ResourceSystem has become a dumping ground for a bunch of unrelated stuff. The whole point of RBAC is to separate sensitive stuff from non-sensitive, and this is way less important than, say, crypto keys (which are currently ResourceSystem), so it needs its own resource.

johnstcn · 2025-07-30T08:17:25Z

coderd/database/queries/usageevents.sql

+                -- The parenthesis around @now::timestamptz are necessary to
+                -- avoid sqlc from generating an extra argument.


johnstcn · 2025-07-30T08:21:26Z

enterprise/coderd/usage/publisher.go

+const (
+	CoderLicenseJWTHeader = "Coder-License-JWT"
+
+	tallymanURL         = "https://tallyman-ingress.coder.com"


I can see an argument for making this a var so it's configurable at build time.
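
A minimal sketch of what that could look like, assuming the URL stays package-private. The Go linker's `-X` flag can only overwrite package-level string variables, not constants, which is why the declaration would need to become a `var`. The `ingestURL` helper, the `/example` path, and the override URL are placeholders, not taken from this PR.

```go
package main

import "fmt"

// tallymanURL is a var rather than a const so the linker can override it.
// The default is the value from the diff above.
var tallymanURL = "https://tallyman-ingress.coder.com"

// ingestURL joins the base URL with a path. The helper and any path passed
// to it are hypothetical; the real endpoint is not shown in this review.
func ingestURL(path string) string {
	return tallymanURL + path
}

func main() {
	// Override at build time with, for example:
	//   go build -ldflags "-X 'main.tallymanURL=https://tallyman.example.test'"
	fmt.Println(ingestURL("/example"))
}
```

For a non-main package the `-X` argument takes the full import path of the package instead of `main`.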

johnstcn · 2025-07-30T08:27:55Z

enterprise/coderd/usage/publisher.go

+			if !allFailed {
+				// These are all going to have the same message, so don't log
+				// them. We already logged the overall error above.
+				p.log.Warn(ctx, "tallyman rejected usage event", slog.F("id", event.ID), slog.F("message", rejectedEvent.Message), slog.F("permanent", rejectedEvent.Permanent))


Should we instead collect the failed IDs and warn once? Depending on the number of events, this could spam.
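
Roughly what I have in mind, as a standalone sketch rather than a drop-in patch: gather the rejected IDs while iterating, then emit one summary line. The `rejection` type and `summarizeRejections` are hypothetical names, and the real code would log through `p.log` instead of returning a string.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rejection mirrors the fields logged in the diff (message, permanent).
// The type itself is hypothetical.
type rejection struct {
	Message   string
	Permanent bool
}

// summarizeRejections returns a single log line covering every rejected
// event, or "" when nothing was rejected.
func summarizeRejections(rejected map[string]rejection) string {
	if len(rejected) == 0 {
		return ""
	}
	ids := make([]string, 0, len(rejected))
	permanent := 0
	for id, r := range rejected {
		ids = append(ids, id)
		if r.Permanent {
			permanent++
		}
	}
	sort.Strings(ids) // map iteration order is random; sort for stable output
	return fmt.Sprintf("tallyman rejected %d usage events (%d permanent): %s",
		len(ids), permanent, strings.Join(ids, ", "))
}

func main() {
	fmt.Println(summarizeRejections(map[string]rejection{
		"b": {Message: "bad payload", Permanent: true},
		"a": {Message: "try later"},
	}))
}
```

Per-event messages could still go out at debug level if they are needed for troubleshooting.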

johnstcn · 2025-07-30T08:28:22Z

enterprise/coderd/usage/publisher.go

+		} else {
+			// It's not good if this path gets hit, but we'll handle it as if it
+			// was a temporary rejection.
+			p.log.Warn(ctx, "tallyman did not include a usage event in the response, considering it temporarily rejected", slog.F("id", event.ID))


As above, I'd prefer if we didn't log in a tight loop.

johnstcn · 2025-07-30T08:29:03Z

enterprise/coderd/usage/publisher.go

+	err = p.db.UpdateUsageEventsPostPublish(ctx, dbUpdate)
+	if err != nil {


nit: suggest

if err := p.db.UpdateUsageEventsPostPublish(ctx, dbUpdate); err != nil { ... }

johnstcn · 2025-07-30T08:34:40Z

coderd/database/migrations/000353_create_usage_events_table.up.sql

Obligatory reminder to check migration number before merging!

spikecurtis · 2025-07-30T07:58:53Z

CODEOWNERS

+
+# Usage tracking code requires intimate knowledge of Tallyman and Metronome, as
+# well as guidance from revenue.
+coderd/usage/ @deansheather


in general, I'd like to ensure we have 2 code owners for each thing, so that we have some ability to make progress if someone is out

spikecurtis · 2025-07-30T08:17:00Z

coderd/database/migrations/000353_create_usage_events_table.up.sql

@@ -0,0 +1,26 @@
+CREATE TYPE usage_event_type AS ENUM (


I'd normally be in favor of an enum here, and I'm sorry I didn't think of it at RFC time, but...

We are restricted from using ALTER TYPE ... ADD VALUE in later migrations and instead have to convert everything to an intermediate text column, then back to the enum. Since the usage events table is likely to be very large, this could be a costly migration.

Instead we could make the event_type a text and for now just add a CHECK constraint that it equals 'dc_managed_agents_v1'. Later migrations can use NOT VALID when modifying this check constraint to avoid a costly scan. WDYT?

spikecurtis · 2025-07-30T08:34:20Z

coderd/database/queries/usageevents.sql

+                -- always permanently reject these events anyways.
+                -- The parenthesis around @now::timestamptz are necessary to
+                -- avoid sqlc from generating an extra argument.
+                potential_event.created_at > (@now::timestamptz) - INTERVAL '30 days'


I don't know if PG's query planner is smart enough to figure out the optimal order of these clauses, but this constraint probably is less selective than published_at IS NULL in a working system, so should probably go last.

spikecurtis · 2025-07-30T08:41:01Z

coderd/database/migrations/000353_create_usage_events_table.up.sql

+
+CREATE INDEX idx_usage_events_created_at ON usage_events (created_at);
+CREATE INDEX idx_usage_events_publish_started_at ON usage_events (publish_started_at);
+CREATE INDEX idx_usage_events_published_at ON usage_events (published_at);


I think you want a single index over a tuple and the order matters. Having 3 indexes is much less useful for a query that uses all 3 fields.

It should be published_at first, to allow the query to ignore already published events, which will be the vast majority of the table. Next is publish_started_at, which we use to filter out in-progress events. Lastly created_at since we order by this and exclude anything older than 30 days.

spikecurtis · 2025-07-30T08:53:10Z

coderd/usage/events.go

+// Note that the following event types should not be updated once they are
+// merged into the product. Please consult Dean before making any changes.
+type Event interface {
+	usageEvent() // to prevent external types from implementing this interface


what's the motivation for preventing external types from implementing?
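
For context, this is the usual "sealed interface" pattern. A self-contained sketch of one common reason to use it (not necessarily the author's): with a closed set of implementations, code that maps events to type strings can enumerate every case. The `eventType` function is a hypothetical helper.

```go
package main

import "fmt"

// Event is "sealed": usageEvent is unexported, so only types declared in
// this package can satisfy the interface.
type Event interface {
	usageEvent()
}

// DCManagedAgentsV1 matches the struct shown in this PR.
type DCManagedAgentsV1 struct {
	Count uint64 `json:"count"`
}

func (DCManagedAgentsV1) usageEvent() {}

// eventType maps each known event to its type string. Because the set of
// implementations is closed, the default branch signals a programming error
// rather than an unknown third-party type.
func eventType(e Event) (string, error) {
	switch e.(type) {
	case DCManagedAgentsV1:
		return "dc_managed_agents_v1", nil
	default:
		return "", fmt.Errorf("unhandled event type %T", e)
	}
}

func main() {
	name, err := eventType(DCManagedAgentsV1{Count: 1})
	fmt.Println(name, err)
}
```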

spikecurtis · 2025-07-30T09:00:06Z

coderd/usage/events.go

+//     the count of all existing managed agents (count=N)
+//   - A new managed agent is created (count=1)
+type DCManagedAgentsV1 struct {
+	Count uint64 `json:"count"`


Seems like we'll want a lot more than just the count so that customers can understand their usage, e.g. by template, organization, user.

spikecurtis · 2025-07-30T09:21:04Z

enterprise/coderd/usage/publisher_test.go

+	startErr := make(chan error)
+	go func() {
+		err := publisher.Start()
+		testutil.RequireSend(ctx, t, startErr, err)


can't call Require methods in goroutines, only the main test goroutine.
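
The `testing` docs require `FailNow` (which the require-style helpers end up calling) to run on the goroutine executing the test. A sketch of the usual shape, with hypothetical helper names and no test framework so it runs standalone: the goroutine only sends its result, and the test goroutine does the waiting and asserting.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// defaultTimeout bounds how long the test goroutine waits for a result.
const defaultTimeout = time.Second

// startAsync runs start in a goroutine and hands its result back over a
// buffered channel, so the goroutine never blocks and never asserts.
func startAsync(start func() error) <-chan error {
	errCh := make(chan error, 1)
	go func() {
		errCh <- start()
	}()
	return errCh
}

// waitResult is what the test goroutine would call; a require or t.Fatal
// style assertion belongs on its return value.
func waitResult(errCh <-chan error, timeout time.Duration) error {
	select {
	case err := <-errCh:
		return err
	case <-time.After(timeout):
		return errors.New("timed out waiting for start")
	}
}

func main() {
	errCh := startAsync(func() error { return nil })
	fmt.Println(waitResult(errCh, defaultTimeout))
}
```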

spikecurtis · 2025-07-30T09:28:38Z

enterprise/coderd/usage/publisher_test.go

+		handler func(req usage.TallymanIngestRequestV1) any
+	)
+	ingestURL := fakeServer(t, tallymanHandler(t, licenseJWT, func(req usage.TallymanIngestRequestV1) any {
+		callCount := atomic.AddInt64(&calls, 1)


In order to prevent test races, we need to ensure that the publish calls are completed before we attempt to read this value. If the test prevents such races, atomic is superfluous because we will not concurrently access it. Making it atomic doesn't do anything to prevent racy tests and is confusing because it signals to people that there might be concurrent accesses, when we need to ensure there are none.
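
To illustrate with a standalone sketch (hypothetical names, not the test in this PR): if the test waits on a channel that is closed after the last call, the channel close gives the happens-before edge and a plain `int` is enough.

```go
package main

import "fmt"

// runCalls simulates a handler being invoked n times on another goroutine.
// calls is a plain int: the close of done happens before the receive
// completes, so the read after <-done is not a data race.
func runCalls(n int) int {
	calls := 0
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < n; i++ {
			calls++ // only this goroutine writes
		}
	}()
	<-done // wait for every call to complete before reading
	return calls
}

func main() {
	fmt.Println(runCalls(3))
}
```

Running this under `go run -race` should report nothing, because the only read of `calls` is ordered after the close.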

spikecurtis · 2025-07-30T09:35:20Z

enterprise/coderd/usage/publisher.go

+// publishOnce publishes up to tallymanPublishBatchSize usage events to
+// tallyman. It returns the number of successfully published events.
+func (p *tallymanPublisher) publishOnce(ctx context.Context, deploymentID uuid.UUID) (int, error) {
+	licenseJwt, err := p.getBestLicenseJWT(ctx)


We shouldn't be doing this for every publish. Can we connect it up to the rest of the license code so we don't add more queries? Or at least use the same query strategy of querying every 10 minutes and listening for published changes?
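
A rough sketch of the second option, assuming a simple TTL cache in front of the lookup. Everything here is hypothetical (`jwtCache`, `Invalidate`, the injected clock); `fetch` stands in for something like `getBestLicenseJWT`, and `Invalidate` is where a license-change notification would hook in.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// jwtCache remembers the last fetched license JWT for ttl.
type jwtCache struct {
	mu        sync.Mutex
	ttl       time.Duration
	now       func() time.Time // injectable clock for tests
	fetch     func(ctx context.Context) (string, error)
	value     string
	fetchedAt time.Time
}

// Get returns the cached JWT, refetching only when the entry is older
// than ttl or has never been loaded.
func (c *jwtCache) Get(ctx context.Context) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.value != "" && c.now().Sub(c.fetchedAt) < c.ttl {
		return c.value, nil
	}
	v, err := c.fetch(ctx)
	if err != nil {
		return "", err
	}
	c.value, c.fetchedAt = v, c.now()
	return v, nil
}

// Invalidate forces the next Get to refetch.
func (c *jwtCache) Invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value = ""
}

// demo simulates three publishes within ten minutes, then one after the
// TTL has expired, and returns how many times fetch ran.
func demo() int {
	fetches := 0
	clock := time.Unix(0, 0)
	c := &jwtCache{
		ttl: 10 * time.Minute,
		now: func() time.Time { return clock },
		fetch: func(context.Context) (string, error) {
			fetches++
			return "jwt", nil
		},
	}
	ctx := context.Background()
	for i := 0; i < 3; i++ {
		_, _ = c.Get(ctx)
		clock = clock.Add(time.Minute)
	}
	clock = clock.Add(10 * time.Minute)
	_, _ = c.Get(ctx)
	return fetches
}

func main() {
	fmt.Println(demo())
}
```

Reusing whatever the existing license code already caches would still be preferable to a second cache like this one.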

spikecurtis · 2025-07-30T09:38:46Z

enterprise/coderd/usage/publisher.go

+	var (
+		acceptedEvents = make(map[string]*TallymanIngestAcceptedEventV1)
+		rejectedEvents = make(map[string]*TallymanIngestRejectedEventV1)
+	)


nit: the multiline var makes this harder to read IMO and takes up twice the number of lines as

acceptedEvents := make(map[string]*TallymanIngestAcceptedEventV1)
rejectedEvents := make(map[string]*TallymanIngestRejectedEventV1)

chore: add usage tracking package

f2963cd

github-actions bot assigned deansheather Jul 30, 2025

deansheather requested review from johnstcn and spikecurtis July 30, 2025 06:04

deansheather marked this pull request as ready for review July 30, 2025 06:05

deansheather mentioned this pull request Jul 30, 2025

chore: wire up usage tracking for managed agents #19096

Draft

deansheather added 2 commits July 30, 2025 07:04

add fixture

939aba4

fix manifest.json

e376311

johnstcn reviewed Jul 30, 2025


spikecurtis reviewed Jul 30, 2025


deansheather mentioned this pull request Jul 30, 2025

chore: move usage types to new package #19103

Draft
