Conversation

@talagayev
Member

@talagayev talagayev commented Oct 18, 2024

Fixes #4659 (attempt).

Changes made in this Pull Request:

  • added backends and aggregators to AlignTraj and AverageStructure in analysis.align (see the sketch after this list)
  • added the client_AlignTraj and client_AverageStructure fixtures in conftest.py
  • added client_AlignTraj and client_AverageStructure to the run() calls in test_align.py
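
For readers less familiar with the split-apply-combine machinery from MDAnalysis 2.8, "adding backends and aggregators" generally means overriding two hooks on the AnalysisBase subclass, and the conftest.py fixtures parametrize the tests over the allowed backends. A rough sketch of that pattern follows; only the parallelization-related hooks are shown, and the aggregation keys, the exact backend list and the params_for_cls helper are taken from how other parallelized analysis classes do it and may differ from what this PR actually adds.

import pytest
from MDAnalysis.analysis.base import AnalysisBase
from MDAnalysis.analysis.results import ResultsGroup

class AverageStructure(AnalysisBase):
    # opt the class into the split-apply-combine machinery
    _analysis_algorithm_is_parallelizable = True

    @classmethod
    def get_supported_backends(cls):
        # backends the class is allowed to run with
        return ("serial", "multiprocessing", "dask")

    def _get_aggregator(self):
        # how per-worker results are merged after a parallel run;
        # here the per-worker partial sums are added and averaged later
        return ResultsGroup(lookup={
            "positions": ResultsGroup.ndarray_sum,
            "rmsd": ResultsGroup.ndarray_sum,
        })

# in MDAnalysisTests/analysis/conftest.py, following the pattern used for
# the other parallelized analysis classes (params_for_cls builds one test
# parameter per allowed backend/worker combination):
@pytest.fixture(scope="module", params=params_for_cls(AverageStructure))
def client_AverageStructure(request):
    return request.param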

Currently AlignTraj only accepts serial and dask, with multiprocessing leading to the pytests taking forever. An additional error that appears is the following:

OSError: File opened in mode: self.mode. Reading only allow in mode "r"

For AverageStructure the failure that appears is the following:

AttributeError: 'numpy.ndarray' object has no attribute 'load_new'

This leads me to believe that AverageStructure cannot be parallelized, but I would need additional opinions on it and on AlignTraj :)

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developer Certificate of Origin


📚 Documentation preview 📚: https://mdanalysis--4738.org.readthedocs.build/en/4738/

@pep8speaks

pep8speaks commented Oct 18, 2024

Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 310:80: E501 line too long (86 > 79 characters)
Line 327:80: E501 line too long (87 > 79 characters)
Line 333:80: E501 line too long (97 > 79 characters)
Line 357:80: E501 line too long (85 > 79 characters)
Line 377:80: E501 line too long (91 > 79 characters)

Comment last updated at 2025-01-11 21:40:18 UTC

@codecov

codecov bot commented Jan 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.73%. Comparing base (b62e2f7) to head (5009b02).
⚠️ Report is 2 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4738   +/-   ##
========================================
  Coverage    92.72%   92.73%           
========================================
  Files          180      180           
  Lines        22472    22491   +19     
  Branches      3188     3191    +3     
========================================
+ Hits         20837    20856   +19     
  Misses        1177     1177           
  Partials       458      458           


@talagayev talagayev reopened this Nov 23, 2025
@talagayev talagayev marked this pull request as ready for review November 26, 2025 22:03
@talagayev
Member Author

@marinegor I would like to ping you in this PR as well.

Here I basically tried different ways to see whether it is possible to parallelize the AverageStructure and AlignTraj classes.
I was able to implement the parallelization for AverageStructure; the first attempt did not work because it was not able to read a universe, so this error appeared:

AttributeError: 'numpy.ndarray' object has no attribute 'load_new'

I added _first to make the class parallelizable. This works well with the tests, except for the case when in_memory=True: then the process takes very long, which I assume is connected to memory issues, so for now I revert back to serial when in_memory=True.
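
To make that concrete, the idea can be sketched roughly as below. This is not the code from the PR, just an illustration assuming the reference is kept as a plain, picklable array (called _first here) and using the ResultsGroup aggregators from MDAnalysis 2.8; weights and the backend declarations from the sketch above are omitted for brevity.

import numpy as np
from MDAnalysis.analysis import align
from MDAnalysis.analysis.base import AnalysisBase
from MDAnalysis.analysis.results import ResultsGroup


class AverageStructureSketch(AnalysisBase):
    """Illustration only: accumulate aligned positions with plain arrays
    so that the per-worker state stays picklable."""

    def __init__(self, atomgroup, **kwargs):
        super().__init__(atomgroup.universe.trajectory, **kwargs)
        self._atoms = atomgroup
        # reference coordinates captured once as a plain ndarray ("_first")
        # instead of holding on to a second Universe
        self._first = atomgroup.positions.copy()

    def _prepare(self):
        self.results.positions = np.zeros_like(self._first)

    def _single_frame(self):
        # superimpose the current frame onto the stored first-frame
        # coordinates and accumulate the aligned positions
        mobile = self._atoms.positions - self._atoms.positions.mean(axis=0)
        ref = self._first - self._first.mean(axis=0)
        R, _ = align.rotation_matrix(mobile, ref)
        self.results.positions += mobile @ R.T + self._first.mean(axis=0)

    def _get_aggregator(self):
        # per-worker partial sums are simply added up
        return ResultsGroup(lookup={"positions": ResultsGroup.ndarray_sum})

    def _conclude(self):
        # turn the accumulated sum into the time-averaged structure
        self.results.positions /= self.n_frames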

As for AlignTraj, since it transforms coordinates and writes out structures, more parts of the code would have to be rewritten to make it parallelizable; I didn't find an easy solution that avoids bigger modifications of the class. So the question would be: do we keep it non-parallelizable, or should I try to modify the code to make it parallelizable, which would require bigger modifications?

@marinegor
Contributor

@talagayev

except for the case when in_memory=True: then the process takes very long, which I assume is connected to memory issues

I think parallelization should not actually work with in_memory cases (@yuxuanzhuang please correct me if I'm wrong, afaik you've been working on this). Hence I'd explicitly raise an exception if the user asks for parallel execution and provides in_memory as well.

So the question would be: do we keep it non-parallelizable, or should I try to modify the code to make it parallelizable, which would require bigger modifications?

If you're running out of ideas, I'd suggest making this PR for AverageStructure only, and creating an appropriate issue for AlignTraj describing your attempts so far.

Also, I imagine there are problems with serialization of self._writer, no? Perhaps we can chat on discord about it (I'm @marinegor there)?

@talagayev
Member Author

Yes, that makes sense. I think the two analysis classes that currently use in_memory are AverageStructure and AlignTraj.

Yes, that would be good. I can rename the PR to cover only AverageStructure for now, add the missing parts for the PR (documentation + CHANGELOG), create an issue for AlignTraj, and write you on Discord so that we can brainstorm how to adjust the code to make it parallelizable. Yes, self._writer is one of the difficulties. I guess for AlignTraj the code could be adjusted to pass it the reference, but the writing during parallelization is the tricky part: maybe temporary data could be written and merged afterwards, or only the calculations are done in parallel and the writing then happens in _conclude(), basically keeping that part serial and only making the calculations parallel (roughly along the lines of the sketch below).
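
For the record, the "calculations in parallel, writing kept serial in _conclude()" idea could look roughly like the sketch below. It is purely a discussion aid, not code from this PR: the class and attribute names (AlignTrajSketch, _ref_pos, _ref_com) are invented, weights and selections are ignored, and it assumes the get_supported_backends/_get_aggregator hooks and ResultsGroup aggregators from MDAnalysis 2.8.

import MDAnalysis as mda
from MDAnalysis.analysis import align
from MDAnalysis.analysis.base import AnalysisBase
from MDAnalysis.analysis.results import ResultsGroup


class AlignTrajSketch(AnalysisBase):
    """Discussion sketch: compute rotations in parallel, write serially."""

    _analysis_algorithm_is_parallelizable = True

    @classmethod
    def get_supported_backends(cls):
        return ("serial", "multiprocessing", "dask")

    def __init__(self, mobile_atoms, reference_atoms, filename, **kwargs):
        super().__init__(mobile_atoms.universe.trajectory, **kwargs)
        self.filename = filename
        self._mobile = mobile_atoms
        # keep only plain arrays from the reference, so workers do not need
        # a second Universe
        self._ref_com = reference_atoms.center_of_geometry()
        self._ref_pos = reference_atoms.positions - self._ref_com

    def _prepare(self):
        self.results.rotations = []

    def _single_frame(self):
        # per-frame work: compute and store the rotation matrix (picklable);
        # no Writer is touched here, so nothing unpicklable travels to workers
        mobile = self._mobile.positions - self._mobile.center_of_geometry()
        R, _ = align.rotation_matrix(mobile, self._ref_pos)
        self.results.rotations.append(R)

    def _get_aggregator(self):
        # concatenate the per-worker rotation lists in frame order
        return ResultsGroup(lookup={"rotations": ResultsGroup.flatten_sequence})

    def _conclude(self):
        # serial pass: apply the stored rotations and write the aligned
        # frames, so a Writer never has to be serialized
        atoms = self._mobile.universe.atoms
        with mda.Writer(self.filename, atoms.n_atoms) as writer:
            for ts, R in zip(self._trajectory[self.frames],
                             self.results.rotations):
                atoms.translate(-self._mobile.center_of_geometry())
                atoms.rotate(R)
                atoms.translate(self._ref_com)
                writer.write(atoms)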

@marinegor marinegor self-requested a review November 27, 2025 21:42
Comment on lines 1149 to 1154
if requested_backend not in (None, "serial"):
    warnings.warn(
        "The in-memory parallel trajectory usage is inefficient "
        "and not supported. Falling back to serial.",
        RuntimeWarning,
    )
Contributor

I won't be in favor of a warning, and would rather explicitly raise a ValueError because, well, how often do you switch off / ignore warnings? :)

Member Author

True, adjusted it to raise a ValueError for that case and adjusted the test to cover the ValueError.
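
For completeness, the check and its test could look roughly like the following. This is only a sketch: the attribute name _in_memory, the error message and the test placement are assumptions, not necessarily what ended up in the PR.

# inside the analysis class, once the requested backend is known:
if self._in_memory and requested_backend not in (None, "serial"):
    raise ValueError(
        "Parallel backends are not supported together with in_memory=True; "
        "use backend='serial' or set in_memory=False."
    )

# and a corresponding test, e.g. in test_align.py:
import pytest
from MDAnalysis.analysis import align

def test_average_structure_in_memory_parallel_raises(universe, reference):
    # 'universe' and 'reference' stand in for whatever fixtures the test
    # module already provides
    avg = align.AverageStructure(universe, reference, in_memory=True)
    with pytest.raises(ValueError, match="in_memory"):
        avg.run(backend="multiprocessing", n_workers=2)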

@marinegor
Contributor

@talagayev ok, will be waiting for your message.
I also assigned myself as a reviewer here, so just re-request review when you think you're done!

@talagayev talagayev changed the title from "'MDAnalysis.analysis.align' parallelization" to "Enabling of MDAnalysis.analysis.align.AverageStructure parallelization" Nov 28, 2025
@talagayev talagayev requested a review from marinegor November 28, 2025 13:02
@talagayev
Member Author

Added the documentation and CHANGELOG entry and adjusted the code to raise a ValueError. The PR should be ready to be re-reviewed :)

Successfully merging this pull request may close these issues.

MDAnalysis.analysis.align Implement parallelization or mark as unparallelizable
