gh-64192: Make imap()/imap_unordered() in multiprocessing.pool actually lazy #136871
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.
Please open an issue first for this one (I don't know if it's something that was discussed at the sprint).
Previous discussion for the executor: #74028
@picnixz thanks. The issue already exists; I will add a more detailed description here soon, just wanted to run all the tests for now.
*Title changed: Make imap()/imap_unordered() in multiprocessing.Pool actually lazy*
Thanks! I'm actually adding the GH issue in the title so that we can backref it easily.
I'm adding a skip news label just for the CI, otherwise you'll get a notification saying that the labels are incorrect.
*Title changed: Make imap()/imap_unordered() in multiprocessing.pool actually lazy*
Using `threading.Semaphore` makes it easier to cap the number of concurrently running tasks. It also makes it possible to remove the busy wait in the child thread by waiting on the semaphore. I've also updated the code to use the backpressure pattern: new tasks are scheduled as soon as the user consumes the old ones.
This new behavior allows a real concurrency smaller than the number of running processes. Previously this was not possible, since we implicitly incremented buffersize by `self._processes`.
These tests mostly come from a similar PR adding the `buffersize` param to `concurrent.futures.Executor.map`: https://github.com/python/cpython/pull/125663/files
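The semaphore-based backpressure described in these commits can be sketched in pure Python. This is an illustrative standalone version, not the PR's actual implementation (the helper name `lazy_imap` and the thread layout are assumptions): a feeder thread blocks on a `threading.Semaphore` instead of busy-waiting, and each consumed result releases one buffer slot.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from queue import SimpleQueue

def lazy_imap(func, iterable, buffersize, workers=4):
    """Yield func(item) in input order, keeping at most `buffersize` tasks in flight."""
    sem = threading.Semaphore(buffersize)   # caps concurrently submitted tasks
    futures = SimpleQueue()
    done = object()                         # sentinel: input exhausted

    def feeder():
        with ThreadPoolExecutor(max_workers=workers) as ex:
            for item in iterable:
                sem.acquire()               # blocks (no busy wait) while buffer is full
                futures.put(ex.submit(func, item))
        futures.put(done)

    threading.Thread(target=feeder, daemon=True).start()
    while (fut := futures.get()) is not done:
        yield fut.result()
        sem.release()                       # consuming a result frees one buffer slot
```

Because submission follows consumption, this also works on infinite inputs: e.g. `lazy_imap(str, itertools.count(), buffersize=4)` can be advanced one `next()` at a time while only about four inputs are ever in flight.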
the *iterables* pauses until a result is yielded from the buffer.
To fully utilize the pool's capacity, set *buffersize* to the number of
processes in the pool (to consume *iterable* as you go) or even higher
(to prefetch *buffersize - processes* arguments).
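For contrast, a quick way to observe the current eager behavior that this change addresses (a `ThreadPool` is used here so no pickling is involved; the generator just records which inputs the pool has pulled):

```python
import time
from multiprocessing.pool import ThreadPool

consumed = []

def numbers():
    for i in range(100):
        consumed.append(i)   # record every input the pool pulls from us
        yield i

with ThreadPool(2) as pool:
    it = pool.imap(lambda x: x, numbers())
    time.sleep(0.5)                    # give the task-handler thread time to run
    eagerly_pulled = len(consumed)     # today: all 100 inputs, before any next(it)
    first = next(it)
```

On current CPython `eagerly_pulled` is 100: `imap()` drains the whole generator before a single result is requested. With *buffersize* set, only about *buffersize* inputs would have been pulled at this point.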
I was questioning myself whether we should also describe the difference in `buffersize` usefulness between `multiprocessing.Pool` and `multiprocessing.ThreadPool`. I would be glad to hear an opinion on that – what do you think?
This feature is less useful with the `multiprocessing.ThreadPool` class, where the user can pass a generator as *iterable*. `multiprocessing.Pool` with processes currently can't accept generators as they aren't picklable, so the user still needs to pass *iterable* as, for example, a list, which is O(n). However, there is another huge benefit to using it – tasks will also be submitted lazily (while the user iterates over results), and not-needed-yet results won't stack up in memory. So I think the feature is useful for any kind of pool, and the docs shouldn't suggest using it specifically for threads.
Context recap (#74028)

Let's consider that we have an input *iterable* and `N = len(iterable)`.

Current `multiprocessing.Pool.imap` and `multiprocessing.Pool.imap_unordered` are O(N) in space (unnecessarily expensive on large iterables, completely impossible to use on infinite iterables): the call `results: Iterator = pool.imap(func, iterable)` iterates over all the elements of the iterable, submitting N tasks to the pool (results are collected into a list of size N). Following calls to `next(results)` take the oldest result from the list (FIFO), waiting for it if not available yet, and return it.

Proposal: add an optional buffersize param
With this proposal, the call `results: Iterator = pool.imap(func, iterable, buffersize=b)` will iterate only over the first b elements of *iterable* (acquiring the *buffersize* semaphore while iterating), submitting b tasks to worker threads, and then will return the results iterator. Calls to `next(results)` will release the *buffersize* semaphore (allowing the `task_handler` thread to get the next input element from *iterable* and submit a new task to a worker thread) and then return the result. *buffersize* semaphores from iterators not yet exhausted are also released on pool termination to avoid deadlocks.

Benefits:
With *buffersize*, the client code takes back control over the speed of iteration over the input *iterable*: after an initial spike of b calls to `func` to fill the buffer, the iteration over the input *iterable* will follow the rate of the iteration over `results` (controlled by the client), which is critical when `func` involves talking to services that you don't want to overload.

Feature history
*buffersize* support has recently been merged into the `concurrent.futures.Executor.map` implementation (#125663), and many code/test/doc parts are based on ones from there to ensure consistency between modules. I want to thank the authors of that PR for the references.
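The `Executor.map` counterpart can already be tried on Python 3.14+, where the `buffersize` keyword from #125663 landed (on older interpreters the guard below simply skips the call, substituting the expected output):

```python
import sys
from concurrent.futures import ThreadPoolExecutor

if sys.version_info >= (3, 14):
    # buffersize caps how many inputs are pulled ahead of consumption
    with ThreadPoolExecutor(max_workers=2) as ex:
        mapped = list(ex.map(abs, [-1, -2, -3], buffersize=2))
else:
    mapped = [1, 2, 3]  # feature unavailable before 3.14; emulate expected output
```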
Links: