Skip to content

fix: don't create autostart workspace builds with no available provisioners #19067

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

cstyan
Copy link
Contributor

@cstyan cstyan commented Jul 29, 2025

This should fix #17941

With just the test addition the test will fail since the workspace build is triggered, we attempt to move the workspace into the start state: require.Equal(t, codersdk.WorkspaceTransitionStop, workspace.LatestBuild.Transition)

    lifecycle_executor_test.go:1718: 
                Error Trace:    /home/coder/coder/coderd/autobuild/lifecycle_executor_test.go:1718
                Error:          Not equal: 
                                expected: "stop"
                                actual  : "start"
                            
                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,2 +1,2 @@
                                -(codersdk.WorkspaceTransition) (len=4) "stop"
                                +(codersdk.WorkspaceTransition) (len=5) "start"
                                 
                Test:           TestExecutorAutostartSkipsWhenNoProvisionersAvailable

With the 2nd commit we skip the calls to wsbuilder.New and builder.Build and so the transition is never scheduled.

coder@automatic-builds-test:~/coder/coderd/autobuild$ go test -run TestExecutorAutostartSkipsWhenNoProvisionersAvailable
PASS
ok      github.com/coder/coder/v2/coderd/autobuild      1.134s
coder@automatic-builds-test:~/coder/coderd/autobuild$ go test -run TestExecutorAutostartSkipsWhenNoProvisionersAvailable -v | grep autostart
    t.go:106: 2025-07-28 22:11:09.405 [warn]  coderd.autobuild: skipping autostart - no available provisioners  workspace_id=219f4901-dfd0-4cdc-8cf2-258861adfd18  workspace_name=dazzling-jemison3-89l

However, many tests don't fully setup provisioner or wait for the provisioner + DB to be fully available, so the third commit adds a somewhat smelly way to skip the provisioner check in tests that don't need to test that functionality.

@cstyan cstyan changed the title fix: don't create autostart workspace builds if there is no available provisioners for the job fix: don't create autostart workspace builds with no available provisioners Jul 29, 2025
@deansheather deansheather self-requested a review July 30, 2025 17:04
}

// Check if any provisioners are active (not stale)
now := dbtime.Now()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be using the passed in time from runOnce rather than getting it's own time

Comment on lines 148 to 150
if testing.Testing() {
staleInterval = TestingStaleInterval
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use the passed in time from runOnce I don't think changing the interval would be necessary, you could write the test in a way that the default interval works fine

@@ -52,6 +56,9 @@ type Executor struct {
experiments codersdk.Experiments

metrics executorMetrics

// Skip provisioner availability check (should only be true for *some* tests)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It says it should only be true for some tests, but it seems it's true for almost every test using coderdtest. How many tests are actually broken by this change? If it's only a few, I'd rather we just fix those tests (e.g. by adding provisioners) rather than add a skip flag.

cstyan added 5 commits August 5, 2025 19:54
provisioner is not available

Signed-off-by: Callum Styan <callumstyan@gmail.com>
transition so that we don't accumulate autostart jobs that will
eventually have to be cleaned up by the reaper routine

Signed-off-by: Callum Styan <callumstyan@gmail.com>
DB/provisioner to be available, so add a way to skip the check

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
…ndling

This commit addresses the requirements to:
1. Comment out provisionerCloser.Close() in TestExecutorAutostartSkipsWhenNoProvisionersAvailable
2. Remove the need for SkipProvisionerCheck by refactoring tests

Changes:
- Remove SkipProvisionerCheck field and logic from Executor
- Remove SkipProvisionerCheck option from coderdtest.Options
- Add proper provisioner daemon tags to match template requirements
- Add helper functions for waiting for provisioner availability
- Update tests to use real provisioner availability checks
- Fix timing issues in provisioner staleness tests
- Comment out provisioner close call to test proper behavior

The test now correctly validates that provisioners remain available
when not explicitly closed, and uses the real hasAvailableProvisioners
logic instead of bypassing it.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan cstyan force-pushed the callum-autostart-no-provisioner branch from 9971794 to 78a8f5e Compare August 5, 2025 20:01
autobuild path have a valid provisioner available and pass an
appropriate time.Time to the channel for triggering the build based on
whether the provisioner is supposed to be valid or stale

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan
Copy link
Contributor Author

cstyan commented Aug 7, 2025

Okay @deansheather thanks for your patience, I think this is finally ready for another review. Took quite a bit of digging with the help of Blink to find all the bits and pieces that had to be updated to get the tests working without the skipProvisionerCheck flag. I've done my own run through of the changes twice now but I may have still missed some small stuff, nit pick away 👍

The TL; DR of the changes is that we in tests which deal with autobuilds we need:

  • to create the test in a way where we also return the handle to the database instance
  • ensure the test creates a provisioner with the relevant tags for the workspace that will be used within that test case (in most cases this is just the default owner: "", scope: organization but there is a mechanism for using specific tags in at least one of the tests)
  • based on whether the autobuild should happen or not, we need to ensure we can retrieve the relevant provisioner and sest the scheduled tick time for the autobuild appropriately (should the provisioner be valid, or should it be stale)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatic builds should not be created if there are no provisioners with matching tags
2 participants