Skip to content

Allow events to be processed in the minion after the signal handler has been invoked #68209

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

barneysowood
Copy link
Collaborator

@barneysowood barneysowood commented Jul 23, 2025

What does this PR do?

Modifies the behaviour of the MinionManager.stop() function which is called from the signal handler in cli.daemons.Minion() so that it allows the io_loop to run. This allows minions to handle events on the minion event bus before being shut down.

What issues does this PR fix or reference?

Fixes #68183

Previous Behavior

When calling salt <minion> service.stop salt-minion no_block=True, the result of the job would be sent as an event on the minion event bus, but would not be returned to the master as the signal handler was non-async so blocked any further processing.

This would lead to an error due to no return on the master:

# salt salt-dev1 service.restart salt-minion no_block=True
salt-dev1:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20250714111342562766
ERROR: Minions returned with non-zero exit code
# salt-run jobs.lookup_jid 20250714111342562766

and an exception on the minion:

2025-07-14 11:13:42,724 [salt.loaded.int.module.cmdmod:481 ][INFO    ][43501] Executing command /usr/bin/systemd-run in directory '/root'
2025-07-14 11:13:42,754 [salt.loaded.int.module.cmdmod:974 ][DEBUG   ][43501] stderr: Running as unit: run-r61cd83bc2ece4cb4b2766266ff87fd90.scope; invocation ID: a149b1e649cd4ea392d31fe2ca20762b
2025-07-14 11:13:42,754 [salt.transport.ipc:361 ][DEBUG   ][43373] Closing IPCMessageClient instance
2025-07-14 11:13:42,754 [salt.minion      :2301][INFO    ][43501] Returning information for job: 20250714111342562766
2025-07-14 11:13:42,755 [salt.channel.client:374 ][DEBUG   ][43373] Closing AsyncReqChannel instance
2025-07-14 11:13:42,755 [salt.utils.event :312 ][DEBUG   ][43501] SaltEvent PUB socket URI: /var/run/salt/minion/minion_event_43fd0ec500_pub.ipc
2025-07-14 11:13:42,755 [salt.utils.event :313 ][DEBUG   ][43501] SaltEvent PULL socket URI: /var/run/salt/minion/minion_event_43fd0ec500_pull.ipc
2025-07-14 11:13:42,756 [salt.transport.ipc:361 ][DEBUG   ][43373] Closing IPCMessageSubscriber instance
2025-07-14 11:13:42,757 [salt.utils.parsers:1062][WARNING ][43373] Minion received a SIGTERM. Exiting.
2025-07-14 11:13:42,758 [salt.utils.event :823 ][DEBUG   ][43501] Sending event: tag = __master_req_channel_payload/231a384f-4af6-4751-ab2a-388cea909b21/127.0.0.1; data = {'cmd': '_return', 'id': 'salt-dev1', 'success': True, 'return': True, 'retcode': 0, 'jid': '20250714111342562766', 'fun': 'service.restart', 'fun_args': ['salt-minion', {'__kwarg__': True, 'no_block': True}], 'user': 'sudo_barney', '_stamp': '2025-07-14T11:13:42.758443'}
2025-07-14 11:13:42,757 [salt.cli.daemons :82  ][INFO    ][43373] Shutting down the Salt Minion
2025-07-14 11:14:12,784 [salt.transport.ipc:361 ][DEBUG   ][43501] Closing IPCMessageSubscriber instance
2025-07-14 11:14:12,785 [salt.transport.ipc:361 ][DEBUG   ][43501] Closing IPCMessageClient instance
2025-07-14 11:14:12,785 [salt.utils.process:1004][ERROR   ][43501] An un-handled exception from the multiprocessing process 'ProcessPayload(jid=20250714111342562766)' was caught:
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/process.py", line 999, in wrapped_run_func
    return run_func()
  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 1929, in _target
    run_func(minion_instance, opts, data)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 1923, in run_func
    return Minion._thread_return(minion_instance, opts, data)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 2159, in _thread_return
    minion_instance._return_pub(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 2391, in _return_pub
    ret_val = self._send_req_sync(load, timeout=timeout)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 1652, in _send_req_sync
    raise TimeoutError("Request timed out")
TimeoutError: Request timed out

New Behavior

The minion now returns correctly:

# salt -linfo salt-dev1 service.restart salt-minion no_block=True
salt-dev1:
    True

Merge requirements satisfied?

[NOTICE] Bug fixes or features added to Salt require tests.

Commits signed with SSH key?

Yes

@barneysowood barneysowood self-assigned this Jul 23, 2025
@barneysowood barneysowood force-pushed the minion-restart-handing-202507-3006.x branch from a9d99dc to 4ff867e Compare July 23, 2025 17:55
@barneysowood barneysowood force-pushed the minion-restart-handing-202507-3006.x branch from 4ff867e to 4cdcb35 Compare July 24, 2025 15:02
@barneysowood barneysowood force-pushed the minion-restart-handing-202507-3006.x branch from 7270796 to 94395ed Compare July 24, 2025 17:29
@barneysowood barneysowood force-pushed the minion-restart-handing-202507-3006.x branch from 94395ed to f0b408e Compare July 24, 2025 17:41
@barneysowood barneysowood marked this pull request as ready for review July 24, 2025 17:44
@barneysowood barneysowood requested a review from a team as a code owner July 24, 2025 17:44
@barneysowood barneysowood added the test:full Run the full test suite label Jul 24, 2025
Allows event to be handled after the salt-minion service has received
a SIGTERM. Previously once the signal handler was entered, the ioloop
would no longer run. If there are events on the minion event bus that
needs processing, they would not be handled.

Moves the MinionManager stop() functionality to an async function and
allows the ioloop to run and clear any waiting events and returns to
masters.
Adds test for calling MinionManager stop() to test new functionality to
allow events to be processed as minion is stopping.

Sets up a MinionManager instance with a running event bus, then calls
the stop function and immeadiately sends a test message on the event bus
and reads it back to check that works once the stop() function has been
called.

Then checks that the usual functions to destroy the minion etc have
also been called.
Ignores warning from pylint about self.io_loop.add_callback() not being
callable - it clearly is as stop_async gets called.
Sets up the short sock_dir path in the test. Previously setting it in
conftest.py was breaking another test because I'd done it as a Path,
not a string, but I want to avoid changin behaviour of other tests, so
setting it locally to the test.
@twangboy twangboy added this to the Sulfur v3006.15 milestone Jul 29, 2025
@dwoz dwoz merged commit 6d0af4b into saltstack:3006.x Jul 30, 2025
662 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test:full Run the full test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants