Skip to content

Conversation

@ammario
Copy link
Member

@ammario ammario commented Dec 3, 2025

Summary

This PR improves parity between ssh behavior for Coder workspaces vs non-Coder hosts. When you run ssh server -- echo hello, you get hello. But ssh workspace.coder -- echo hello would dump hundreds of lines of startup script logs first—even when not waiting for them.

This is part of making Coder Workspaces and Mux work better together.

Fixes #13580

Changes

  • Non-blocking mode (--wait=no or via ProxyCommand): Show only a concise warning if startup failed/timed out, no logs
  • Non-TTY (e.g., ssh workspace.coder -- cmd): Suppress all agent status output via new Quiet option
  • Blocking mode (--wait=yes): Unchanged—still streams logs

Before/After

Scenario Before After
ssh workspace.coder -- echo hello Dumps full startup logs Just hello
coder ssh --wait=no ws (TTY) "Running" stage + logs Concise warning if failed
coder ssh --wait=yes ws Streams logs Unchanged

Generated with mux

@ammario ammario requested review from mafredri and matifali and removed request for matifali December 3, 2025 01:36
@ammario ammario marked this pull request as ready for review December 3, 2025 01:49
Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with not dumping logs in non-blocking mode, but skipping the stage log doesn't add value IMO and introduces inconsistency.

case codersdk.WorkspaceAgentLifecycleStartTimeout:
// Backwards compatibility: Avoid printing warning if
// coderd is old and doesn't set ReadyAt for timeouts.
// Backwards compat: old coderd may not set ReadyAt for timeouts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This seems like a needless change? I guess we should update CLAUDE.md.

default:
if agent.LifecycleState.Starting() {
sw.Log(time.Time{}, codersdk.LogLevelWarn, noticeStartupRunning)
sw.Log(time.Time{}, codersdk.LogLevelWarn, troubleshootingMessage(agent, docsURL+"/admin/templates/troubleshooting#your-workspace-may-be-incomplete"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: For consistency, we should make this an anchor variable too.

"⧗ Waiting for the workspace agent to connect",
"✔ Waiting for the workspace agent to connect",
"⧗ Running workspace agent startup scripts (non-blocking)",
// Non-blocking mode: no "Running" stage, just a notice
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we want this removed, considering we show the Waiting above anyway?

For consistency, we should still show the Running line, just not dump logs.

@mafredri
Copy link
Member

mafredri commented Dec 3, 2025

I took a look at #13580, and it seems like this isn't addressing the decision in that ticket, which is to remove the log streaming entirely from all modes and replace it with instructions. Or am I misunderstanding the conclusion from that ticket?

@ammario
Copy link
Member Author

ammario commented Dec 3, 2025

I took a look at #13580, and it seems like this isn't addressing the decision in that ticket, which is to remove the log streaming entirely from all modes and replace it with instructions. Or am I misunderstanding the conclusion from that ticket?

I don't have an opinion on live-log-streaming. My instinct is we shouldn't do it in the ssh flow for general script-ability but such a decision shouldn't be made by me since I'm not close to Workspaces users these days.

We should definitely not show a massive dump of failed logs every single SSH invocation though. That turns what could be a transient failure into a permanent one for things like Mux which rely on ssh.

@mafredri
Copy link
Member

mafredri commented Dec 3, 2025

I don't have an opinion on live-log-streaming. My instinct is we shouldn't do it in the ssh flow for general script-ability but such a decision shouldn't be made by me since I'm not close to Workspaces users these days.

Alright 👍🏻, we can leave it in then for interactive mode. But just to be clear, it's totally fine to do this for ssh in general since everything is (or at least should be) output to stderr.

Consider:

ssh dev.w.me.coder -- 'echo hi 1>&2'  | grep hi
hi

We output on stderr inside the SSH session, but it gets printed on stdout. So we're not preventing normal scriptability.

We should definitely not show a massive dump of failed logs every single SSH invocation though. That turns what could be a transient failure into a permanent one for things like Mux which rely on ssh.

Totally agree with this 👍🏻. I'd think this could be done with a fairly minor diff, though.

@ammario ammario force-pushed the fix-ssh-startup-script-error branch 3 times, most recently from c5547f3 to 40655a8 Compare December 3, 2025 18:31
},
{
// In non-blocking mode, a startup error still shows the warning but
// doesn't stream logs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Copy this test rather than change it so we don't lose coverage.

}
// For Starting states in non-blocking mode, complete the
// stage since we're not waiting for scripts to finish.
if agent.LifecycleState.Starting() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The previous behavior of not completing the stage and showing the Notice: seemed fine, as long as we don't show script logs.

In non-blocking mode (--wait=no), skip streaming startup script logs
while still showing the stage indicator and warnings for failures.

This eliminates log spam when using SSH via ProxyCommand or with
--wait=no, bringing behavior closer to standard SSH expectations.

Fixes #13580
@ammario ammario force-pushed the fix-ssh-startup-script-error branch from 40655a8 to 3dceb0e Compare December 3, 2025 18:38
mafredri added a commit that referenced this pull request Dec 3, 2025
Add guidelines to prevent common PR feedback issues:

- Don't reword existing comments or code unless directly motivated by the task
- Don't delete existing comments that explain non-obvious behavior
- Add new test cases instead of modifying existing ones

Refs #21073
err = func() error {
// In non-blocking mode, skip streaming logs.
// See: https://github.com/coder/coder/issues/13580
if !opts.Wait {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but code within this func could be simplified, checking follow and having conditionals does not make sense anymore.

@ammario
Copy link
Member Author

ammario commented Dec 4, 2025

@mafredri can you take this to completion? I think you have a very specific vision for how it should work, which is great, but I don't intend on repeat contributions so the dialogue to get the change acceptable not going to yield long-term benefits for us.

@mafredri
Copy link
Member

mafredri commented Dec 4, 2025

@mafredri can you take this to completion? I think you have a very specific vision for how it should work, which is great, but I don't intend on repeat contributions so the dialogue to get the change acceptable not going to yield long-term benefits for us.

Sure, I don't mind. 👍🏻

@mafredri
Copy link
Member

mafredri commented Dec 4, 2025

@ammario This implements the fix with tests: #21105 (essentially ~ same change as the final version of this PR)

For my own sanity though, I thought we should refactor the state machine: #21104.

@ammario
Copy link
Member Author

ammario commented Dec 4, 2025

Ok thanks. I can rubber stamp those if you want but my code review shouldn't be worth anything here.

@mafredri
Copy link
Member

mafredri commented Dec 4, 2025

Ok thanks. I can rubber stamp those if you want but my code review shouldn't be worth anything here.

I'm fairly confident we retained behavior (test coverage and manual testing), but I'll still get another engineer to review in the meantime. Feel free to stamp, at least the latter PR, assuming it matches what you expect to happen.

Early attach (does not skip build logs):

❯ CODER_SSH_WAIT=no ssh foo.coder
=== ✔ Queued [0ms]
==> ⧗ Running
==> ⧗ Running
=== ✔ Running [206ms]
==> ⧗ Setting up
=== ✔ Setting up [82ms]
==> ⧗ Planning infrastructure
2025-12-04 17:53:19.242Z Initializing the backend...
[...]
2025-12-04 17:53:26.503Z Plan: 19 to add, 0 to change, 1 to destroy.
=== ✔ Planning infrastructure [7895ms]
==> ⧗ Commit quota
2025-12-04 17:53:27.129Z Budget           —   40
2025-12-04 17:53:27.129Z Credits consumed —   12
=== ✔ Commit quota [10ms]
==> ⧗ Starting workspace
2025-12-04 17:53:27.184Z Terraform 1.13.4
[...]
2025-12-04 17:53:30.047Z Outputs: 0
=== ✔ Starting workspace [3583ms]
==> ⧗ Cleaning Up
=== ✔ Cleaning Up [19ms]
==> ⧗ Waiting for the workspace agent to connect
=== ✔ Waiting for the workspace agent to connect [955ms]
==> ⧗ Running workspace agent startup scripts (non-blocking)
Notice: The startup scripts are still running and your workspace may be incomplete.
For more information and troubleshooting, see https://coder.com/docs/@v2.29.0/admin/templates/troubleshooting#your-workspace-may-be-incomplete and https://coder.com/docs/admin/templates/troubleshooting
coder@foo:~/coder$

During startup:

❯ CODER_SSH_WAIT=no ssh foo.coder
==> ⧗ Running workspace agent startup scripts (non-blocking)
Notice: The startup scripts are still running and your workspace may be incomplete.
For more information and troubleshooting, see https://coder.com/docs/@v2.29.0/admin/templates/troubleshooting#your-workspace-may-be-incomplete and https://coder.com/docs/admin/templates/troubleshooting
coder@foo:~/coder$

After startup (still shows warning, but no logs):

❯ CODER_SSH_WAIT=no ssh foo.coder
==> ⧗ Running workspace agent startup scripts (non-blocking)
=== ✘ Running workspace agent startup scripts (non-blocking) [18303ms]
Warning: A startup script exited with an error and your workspace may be incomplete.
For more information and troubleshooting, see https://coder.com/docs/@v2.29.0/admin/templates/troubleshooting#startup-script-exited-with-an-error and https://coder.com/docs/admin/templates/troubleshooting
coder@foo:~/coder$

@github-actions github-actions bot added the stale This issue is like stale bread. label Dec 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale This issue is like stale bread.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Startup script error message is obnoxious on SSH

3 participants