Proposed enhancement to VerletSplit #4622
base: develop
Conversation
…ck initially required in rk implementation.
…rletSplitRK::setup_minimal.
@bcdandurand this is a promising idea. I will look at the implementation and give it a try.
@bcdandurand Thanks for contributing this work; it is potentially very helpful for large-scale simulations with long-range electrostatics on a large number of MPI processes. Have you observed any speedup with this implementation in your test runs? I can build with CMake and test with a modified version of …
@ndtrung81 Thank you very much for reviewing this contribution. … then the simulation takes about 47 seconds, versus about 77 seconds with the non-default (otherwise baseline) settings. In both runs, I use 96 R processes and 4 K processes with an sbatch script similar to …
@ndtrung81 Commented-out code: Otherwise, the currently commented-out implementation of …
Please advise as to the preferred way to provide updated changes in addition to pushing to the bcdandurand fork. Doc pages: I'll look into this and enquire as needed.
…yle verlet and kspace_style pppm/rk pass.
src/KSPACE/pppm_rk.h
Outdated
virtual void compute_grid_potentials(int, int) override;
//virtual void compute(int, int) override;
virtual void compute(int, int) override;
@bcdandurand Please note that member functions should have either `virtual` or `override`, not both. We prefer using `override` except for the base function, where `virtual` is required. We're currently cleaning this up and making things consistent in PR #4634.
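For illustration, the requested pattern looks like this (simplified stand-in declarations, not code from this PR):

```c++
class Base {
 public:
  virtual void compute(int eflag, int vflag);    // base class: 'virtual' only
};

class Derived : public Base {
 public:
  void compute(int eflag, int vflag) override;   // derived class: 'override' only
};
```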
Understood, will do.
I'm looking at incorporating an analogous decomposition for …
Testing with an sbatch command, e.g., … I'll soon have an update with the addition of an implementation of … I can then turn to the doc pages after the above is finished.
@bcdandurand Before working on additional features, you should probably first pay attention to the clean integration of your code into the LAMMPS code base. Your pull request branch currently has a merge conflict with upstream and fails to compile on all our test platforms.
@akohlmey @ndtrung81 et al. The issue is due to the assumption in PPPM_RK that the initial MPI communicator is partitioned (see …). Currently, there is only beneficial use of MPI parallelism if … My initial thought on preventing users from using an ineffective combination of styles (from a parallel-computing perspective) is to have an … The issue described in this comment is also relevant to updating the manual, because users would need to have consistency in the … I hope this is clear; I'll clarify where helpful.
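(For reference, a sketch of how such a pre-partitioned run is launched with LAMMPS's multi-partition machinery, using the 96 R / 4 K layout mentioned earlier; the executable and input file names are placeholders:)

```
mpirun -np 100 lmp -partition 96 4 -in in.script
```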
@bcdandurand If you don't have time to complete the work on …
@ndtrung81 Understood. The cluster I normally work on is under maintenance, I expect to have access to it again on Friday. I'll then address the documentation as soon as I can. |
…the Integrate instance supports rk decomposition XOR the kspace style does.
…ce style and run style are not compatible.
sjplimp
left a comment
I would like to review this PR when it is closer to merging
Thanks @bcdandurand for submitting these enhancements, and for the thorough explanation in the PR description of what you've done. Also thanks to @ndtrung81 for looking it over carefully and testing it out. I have just a couple of high-level suggestions. (1) In the doc pages for run_style and/or kspace_style, could you provide some numbers for sample speed-ups for the new variant styles? This would motivate users to try out the new options. Ideally, the numbers would compare vanilla verlet vs verlet/split (previous) vs verlet/split/rk. This could be explained in a few sentences, or as a simple table with 3 columns of numbers. (2) If you like, you can add a "citation" in the source code for your pending paper, which will trigger a prompt for the user to cite it whenever the verlet/split/rk option is used.
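For (2), a sketch of how such a hook typically looks in the LAMMPS sources, via the existing `CiteMe` mechanism (the string name and exact citation text here are placeholders the authors would fill in from the pending paper):

```c++
#include "citeme.h"

// Placeholder citation text for the pending IPDPS 2025 paper.
static const char cite_verlet_split_rk[] =
  "run_style verlet/split/rk:\n\n"
  "@inproceedings{Dandurand2025,\n"
  " author = {Dandurand, B. and Vandierendonck, H. and de Supinski, B.},\n"
  " title = {Improving Parallel Scalability for Molecular Dynamics\n"
  "          Simulations in the Exascale Era},\n"
  " booktitle = {39th IEEE International Parallel and Distributed\n"
  "              Processing Symposium},\n"
  " year = {2025}\n"
  "}\n\n";

// In the style's constructor: register the citation so LAMMPS prompts
// the user to cite the paper whenever verlet/split/rk is used.
if (lmp->citeme) lmp->citeme->add(cite_verlet_split_rk);
```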
sjplimp
left a comment
see my recent comment
Understood, I shall aim to address these requested changes this week.
-Brian Dandurand
Summary
@akohlmey @stanmoore1
Please forward to anyone else as warranted.
Class `VerletSplitRK : Verlet` (`run_style` option `verlet/split/rk`)
Alternative to `VerletSplit`, where the K-process responsibility is reduced to receiving charge densities, computing grid potentials by solving the Poisson equations via FFTs, and communicating them to the R processes.
The R processes thus do more of the computation (accumulating charge densities, interpolating long-range forces from the grid potentials) that the K processes would have done with `VerletSplit`. These are the motivations for this:
- There is no need for the inter-RK communication of data-structure information during re-neighboring. This removes the need for calls to `VerletSplit::rk_setup`.
- Inter-RK communication buffers depend on the grid parameters (and not on the number of atoms).

Class `PPPM_RK : PPPM` (`kspace_style pppm/rk`)
This class is needed to have a clean implementation of `VerletSplitRK`. The computation of long-range forces is carried out in a heterogeneously parallel manner, with the same distinction between R processes and K processes as in `VerletSplit`. Analogs for other kspace styles, e.g. `PPPMDipoleRK : PPPMDipole`, may possibly be contributed later.
Much of the partitioned inter-RK communication that was previously in `VerletSplit` is moved to `PPPM_RK`. The new virtual functions, which were added to `kspace.h`, are implemented in `pppm_rk.cpp` (they are listed in the Implementation Notes below). Effectively, `compute` has been split into three parts, with inter-RK communication between each part. R processes compute the first and third parts; K processes compute the second (`compute_grid_potentials`) part.
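A simplified sketch of this control flow (hypothetical code condensed from the description above, not the verbatim PR source; `master` distinguishes R from K processes as in `VerletSplit`):

```c++
// One timestep's long-range work, split across the two partitions.
if (master) {   // R processes
  // Part 1: accumulate charge densities onto the grid and ship them to K
  // (r2k_comm internally calls compute_charge_densities).
  force->kspace->r2k_comm(eflag, vflag);

  // Short-range work (pair, bond, ...) can proceed while K runs the FFTs.

  // Part 3: receive grid potentials and interpolate long-range forces
  // (k2r_comm internally calls compute_interpolate_forces).
  force->kspace->k2r_comm(eflag, vflag);
} else {        // K processes
  // Part 2: solve the Poisson equations via FFTs on the grid; per the
  // description above, the grid potentials are sent back to the R
  // processes by calling k2r_comm.
  force->kspace->compute_grid_potentials(eflag, vflag);
}
```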
Related Issue(s)
This pull request does not address a currently open issue. It is related to a closed issue: Issue #4149
Author(s)
The person responsible for the coding development is:
Brian Dandurand
MSCA Research Fellow
Electronics, Electrical Engineering, and Computer Science (EEECS)
Queen's University Belfast
Institutional Email: b.dandurand@qub.ac.uk
Long-term Email: bcdandurand@gmail.com
The developments behind this pull request were supported by the project
Project: Asynchronous Scientific Continuous Computations Exploiting Disaggregation (ASCCED)
Funding source: UKRI EPSRC EP/X01794X/1
Members of this project:
Research Fellow: Brian Dandurand
Supervisor 1: Prof. Hans Vandierendonck (Queen's University Belfast --- EEECS)
Supervisor 2: Prof. Bronis R. de Supinski (Queen's University Belfast --- EEECS)
The developments of this pull request may be described as a preliminary implementation of the enhanced baseline in
Dandurand, B., Vandierendonck, H., de Supinski, B. "Improving parallel scalability for molecular dynamics simulations in the exascale era". 39th IEEE International Parallel & Distributed Processing Symposium. June 3-7, 2025. Milan, Italy. To appear.
Licensing
By submitting this pull request, I agree that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).
Backward Compatibility
There should be no issues with backward compatibility. As for core LAMMPS source files, only de minimis additions were made to the `KSpace` class.
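For reference, a plausible shape of those additions (paraphrased from the function list in the Implementation Notes below; the default no-op bodies are an assumption, not the verbatim diff):

```c++
// In class KSpace (kspace.h): virtual hooks that /rk styles override.
virtual void r2k_comm(int &eflag, int &vflag) {}
virtual void k2r_comm(int eflag, int vflag) {}
virtual void compute_grid_potentials(int eflag, int vflag) {}
int rk_flag;   // set to 1 by kspace styles that support the rk decomposition
```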
Implementation Notes
No other features should be affected. The contributed features are invoked by specifying the following pair of options in a LAMMPS script: `kspace_style pppm/rk` and `run_style verlet/split/rk`. These two options correspond to two new classes: `PPPM_RK : PPPM` and `VerletSplitRK : Verlet`.
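For example (the accuracy value here is illustrative, not a recommendation):

```
kspace_style    pppm/rk 1.0e-4
run_style       verlet/split/rk
```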
`PPPM_RK`:
- Due to the communication between R and K processes, buffers for `PPPM` data structures are allocated: …
- Parallel topology information similar to what is maintained in `VerletSplit`: …
- Most of the collective and point-to-point communications are of the immediate form `MPI_I*`, with corresponding `MPI_Wait` calls. Thus `PPPM_RK` employs various `MPI_Comm` clones of the `block` communicator to avoid collisions between `MPI_I*` collective communications; corresponding `MPI_Request` structures are also employed. (See the sketch after this list.)
- The main feature is that `compute` is split into three parts:
  - `protected: virtual void compute_charge_densities(int, int)` is called from the R processes to gather charge densities onto the grid; it is called from `public: virtual void r2k_comm(int &eflag, int &vflag)`, where the charge densities are also communicated to the K processes.
  - `public: virtual void compute_grid_potentials(int, int)` is called from the K processes to compute the grid potentials from the charge densities by solving the Poisson equations via FFTs. These grid potentials are sent to the appropriate R processes by calling `public: virtual void k2r_comm(int eflag, int vflag)`.
  - `protected: virtual void compute_interpolate_forces(int, int)` is called from `public: virtual void k2r_comm(int eflag, int vflag)` to interpolate the long-range forces.

`VerletSplitRK`: With the use of a KSpace style having `rk_flag == 1`, the implementation of `VerletSplitRK : Verlet` is analogous to that of `VerletSplit : Verlet`, but in a more streamlined form. Due to the different task assignments of R versus K processors, certain function calls are no longer necessary.
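A sketch of the communicator-cloning pattern from the `MPI_I*` note above (buffer and variable names are hypothetical; only the MPI calls themselves are real API):

```c++
// Duplicate the partition-spanning 'block' communicator once per kind of
// in-flight exchange, so that concurrent MPI_I* collectives cannot match
// each other's traffic.
MPI_Comm comm_rho, comm_phi;
MPI_Comm_dup(block, &comm_rho);   // clone for charge-density traffic
MPI_Comm_dup(block, &comm_phi);   // clone for grid-potential traffic

MPI_Request req_rho;
MPI_Ibcast(density_buf, nbuf, MPI_DOUBLE, kroot, comm_rho, &req_rho);
// ... unrelated work proceeds while the broadcast is in flight ...
MPI_Wait(&req_rho, MPI_STATUS_IGNORE);
```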
Correctness was verified on three applications from the examples directory. For all three tests, `kspace_style pppm/rk` and `run_style verlet/split/rk` were used in place of the default settings in the in.* script.
- HEAT: Changed `kspace_style ewald` --> `kspace_style pppm/rk`; near-constant total energy was preserved, as seen in the baseline.
- VISCOSITY: Statistical outputs for the first 1000 iterations were very close to the baseline. (After 1000 iterations, deviation is observed with different parallel topologies even among baseline runs, due to the butterfly effect and the fact that floating-point addition is not perfectly associative; see the short demonstration after this list.)
- peptide: Near-constant total energy was observed, similar to the baseline.
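As a standalone demonstration of the non-associativity point above (not code from the PR): summing the same terms in a different order, as happens when the parallel topology changes, yields slightly different floating-point results, which chaotic dynamics then amplify.

```c++
#include <cstdio>

int main() {
  double a = 0.1, b = 0.2, c = 0.3;
  std::printf("%.17g\n", (a + b) + c);  // prints 0.60000000000000009
  std::printf("%.17g\n", a + (b + c));  // prints 0.59999999999999998
  return 0;
}
```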
Post Submission Checklist
Further Information, Files, and Links
The developments of this pull request may be described as a preliminary implementation of the enhanced baseline in
Dandurand, B., Vandierendonck, H., de Supinski, B. "Improving parallel scalability for molecular dynamics simulations in the exascale era".
39th IEEE International Parallel & Distributed Processing Symposium. June 3-7, 2025. Milan, Italy. To appear.
With input from LAMMPS developers, I hope to add analogous `/rk` variants to other `KSpace` subclasses as warranted in future pull requests, so as to make the `verlet/split/rk` option more broadly applicable.