Closed
Labels: must-do (Issues that must be completed by the end of the Sprint. Or else. Only humans may set this.)
Description
Sending `SIGTERM` to the `coder server` is supposed to trigger a graceful shutdown that drains build jobs before exiting. However, if a build job is running when `SIGTERM` is received, the job appears to be interrupted anyway:
```
Stop caught, waiting for provisioner jobs to complete and gracefully exiting. Use ctrl+\ to force quit
Shutting down API server...
2024-08-23 15:12:17.146 [info] provisionerd-40d0ef3f-5f61-40ea-838a-45d20073363d-3.runner: workspace provisioner job logged job_id=4b457a13-609f-413b-bf61-fd29bf86bebd template_name=workspace-v1 template_version=zealous_borg5 workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151 workspace_id=d8b32732-8313-47a1-b12e-61a5be6ea289 workspace_name=[redacted] workspace_owner=[redacted] workspace_transition=start level=INFO workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151 ...
output= Interrupt received.
Please wait for Terraform to exit or data loss may occur.
Gracefully shutting down...
2024-08-23 15:12:17.146 [info] provisionerd-40d0ef3f-5f61-40ea-838a-45d20073363d-3.runner: workspace provisioner job logged job_id=4b457a13-609f-413b-bf61-fd29bf86bebd template_name=workspace-v1 template_version=zealous_borg5 workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151 workspace_id=d8b32732-8313-47a1-b12e-61a5be6ea289 workspace_name=[redacted] workspace_owner=[redacted] workspace_transition=start level=INFO output="Stopping operation..." workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151
2024-08-23 15:12:17.146 [info] provisionerd-40d0ef3f-5f61-40ea-838a-45d20073363d-3.runner: workspace provisioner job logged job_id=4b457a13-609f-413b-bf61-fd29bf86bebd template_name=workspace-v1 template_version=zealous_borg5 workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151 workspace_id=d8b32732-8313-47a1-b12e-61a5be6ea289 workspace_name=[redacted] workspace_owner=[redacted] workspace_transition=start level=INFO output="netflix_ec2.dev: Modifications errored after 24s" workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151
```
This happened with systemd configured to send the coder server `SIGTERM` and wait 10 minutes before following up with `SIGKILL`. However, the interrupt and the "Stopping operation..." log message appear immediately, and the provider log also showed that its operation was cancelled partway through. The relevant unit settings:
```ini
KillSignal=SIGTERM
SendSIGKILL=yes
TimeoutStopSec=10min
```
This is a high priority issue for us as it limits our ability to safely deploy updates.
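For comparison, the behavior expected from a graceful shutdown (stop accepting new jobs on `SIGTERM`, let in-flight builds finish, and give up only at the stop deadline) can be sketched as below. This is a hypothetical illustration, not coder's implementation: `DrainingJobRunner`, `submit`, `request_shutdown` and `wait_for_drain` are invented names, jobs are simulated with threads and `time.sleep`, and the key assumption is that the running job never receives the signal itself. In the logs above, Terraform printing "Interrupt received." suggests the signal did reach the job.

```python
import signal
import threading
import time


class DrainingJobRunner:
    """Hypothetical job runner that drains in-flight jobs on SIGTERM.

    Assumption: jobs run in-process (as threads here) and are never sent the
    termination signal themselves; only the runner reacts to SIGTERM.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._draining = False
        self._threads = []
        self.completed = []  # ids of jobs that ran to completion

    def install_signal_handler(self):
        # CPython only allows installing signal handlers from the main thread.
        signal.signal(signal.SIGTERM, lambda signum, frame: self.request_shutdown())

    def request_shutdown(self):
        # Stop accepting new jobs; jobs already running are left untouched.
        with self._lock:
            self._draining = True

    def submit(self, job_id, duration_s):
        """Start a job unless shutdown is in progress. Returns True if accepted."""
        with self._lock:
            if self._draining:
                return False
            # daemon=True so a job still running past the deadline does not
            # keep the process alive (the analogue of systemd's SIGKILL).
            t = threading.Thread(target=self._run, args=(job_id, duration_s), daemon=True)
            self._threads.append(t)
            t.start()
            return True

    def _run(self, job_id, duration_s):
        time.sleep(duration_s)  # stand-in for a long-running build step
        with self._lock:
            self.completed.append(job_id)

    def wait_for_drain(self, timeout_s):
        """Wait for in-flight jobs for at most timeout_s seconds.

        Returns True if every job finished before the deadline.
        """
        deadline = time.monotonic() + timeout_s
        with self._lock:
            threads = list(self._threads)
        for t in threads:
            t.join(max(0.0, deadline - time.monotonic()))
        return not any(t.is_alive() for t in threads)
```

In this sketch a server would call `install_signal_handler()` at startup and, once `SIGTERM` arrives, call `wait_for_drain(600)` before exiting, so the drain deadline stays within the `TimeoutStopSec=10min` window instead of cancelling the running job immediately.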