
Conversation

@bcdandurand
Collaborator

Summary
@akohlmey @stanmoore1
Please forward to anyone else as warranted.

Class VerletSplitRK : Verlet (run_style option verlet/split/rk)
An alternative to VerletSplit in which the responsibility of the K processes is reduced to receiving charge densities, computing grid potentials by solving the Poisson equations via FFTs, and communicating the potentials to the R processes.
The R processes thus take over computation (accumulating charge densities, interpolating long-range forces from the grid potentials) that the K processes would have done with VerletSplit.

These are the motivations for this:

  1. Improved parallel scalability: The parts of the long-range computation that have good parallel scalability with increasing number of processes are moved to the R-processes. Thus, the parts that scale the least well are isolated to relatively few K-processes.
  2. Reduced need for communication: No atom-specific information needs to be communicated between R and K processes.
    In particular, no data-structure information needs to be exchanged during re-neighboring, which removes the need for calls to VerletSplit::rk_setup. The inter-RK communication buffers depend only on the grid parameters, not on the number of atoms.
  3. Setup for future developments: The inter-RK communication of grid potentials instead of atom-wise long-range forces is more amenable to approximation in light of asynchronous communication. (This feature is described in our paper but is not part of the current pull request).

Class PPPM_RK : PPPM (kspace_style pppm/rk)
This class is needed to have a clean implementation of VerletSplitRK.
The computation of long-range forces is carried out in a heterogeneously parallel manner
with the same distinction between R processes and K processes as in VerletSplit.
Analogs for other kspace styles, e.g., PPPMDipoleRK : PPPMDipole, may be contributed later.
Much of the partitioned inter-RK communication that was previously in VerletSplit is moved to PPPM_RK.
The following virtual functions, which were added to kspace.h, are implemented in pppm_rk.cpp:

        virtual void compute_grid_potentials(int, int) override;
        virtual void r2k_comm(int &eflag, int &vflag) override;
        virtual void k2r_comm(int eflag, int vflag) override;

Effectively, compute has been split into three parts, with inter-RK communication between each part.
R processes compute the first and third part, K processes the second (compute_grid_potentials) part.
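
To make the call pattern concrete, here is a simplified sketch of how an integrator built on these hooks can drive the k-space part of a timestep. This is not the literal VerletSplitRK::run() code; the rproc flag, the eflag/vflag variables, the placement of the short-range work, and whether both partitions enter the two communication calls are simplifications for illustration only:

        if (rproc) {
          // R processes: part 1 -- accumulate charge densities and send them to K
          force->kspace->r2k_comm(eflag, vflag);
          // ... short-range (pair, bond, ...) forces can be computed here ...
          // part 3 -- receive the grid potentials and interpolate long-range forces
          force->kspace->k2r_comm(eflag, vflag);
        } else {
          // K processes: part 2 -- receive densities, solve the Poisson equations
          // via FFTs, and return the grid potentials to the R partition
          force->kspace->r2k_comm(eflag, vflag);
          force->kspace->compute_grid_potentials(eflag, vflag);
          force->kspace->k2r_comm(eflag, vflag);
        }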

Related Issue(s)

This pull request does not address a currently open issue. It is related to a closed issue: Issue #4149

Author(s)

The person responsible for the code development is:
Brian Dandurand
MSCA Research Fellow
Electronics, Electrical Engineering, and Computer Science (EEECS)
Queen's University Belfast

Institutional Email: b.dandurand@qub.ac.uk
Long-term Email: bcdandurand@gmail.com

The developments behind this pull request were supported by the project
Project: Asynchronous Scientific Continuous Computations Exploiting Disaggregation (ASCCED)
Funding source: UKRI EPSRC EP/X01794X/1
Members of this project:
Research Fellow: Brian Dandurand
Supervisor 1: Prof. Hans Vandierendonck (Queen's University Belfast --- EEECS)
Supervisor 2: Prof. Bronis R. de Supinski (Queen's University Belfast --- EEECS)

The developments of this pull request may be described as a preliminary implementation of the enhanced baseline in

Dandurand, B., Vandierendonck, H., de Supinski, B. "Improving parallel scalability for molecular dynamics simulations in the exascale era". 39th IEEE International Parallel & Distributed Processing Symposium. June 3-7, 2025. Milan, Italy. To appear.

Licensing

By submitting this pull request, I agree, that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).

Backward Compatibility

There should be no issues with backward compatibility. As for core LAMMPS source files, only de minimis additions were made to the KSpace class:

$ diff kspace.h ~/lammps/src/kspace.h
84,85d83
<   int rk_flag;         /* 1 if a solver uses two distinct communicator worlds for
<                             r-space and k-space computations*/
178,182d175
<
<   // to be overridden in the *RK subclasses for which rk_flag == 1
<   virtual void r2k_comm(int &eflag, int &vflag) {};  //rk_flag == 1
<   virtual void k2r_comm(int eflag, int vflag) {};    //rk_flag == 1
<   virtual void compute_grid_potentials(int, int) {}; //rk_flag == 1
$ diff kspace.cpp ~/lammps/src/kspace.cpp
44c44
<     dipoleflag = spinflag = rk_flag = 0;
---
>     dipoleflag = spinflag = 0;

Implementation Notes

No other features should be affected. The contributed features are invoked by specifying the following pair of options in a LAMMPS script:

	kspace_style	pppm/rk		0.0001  #Or whatever precision
	run_style 	verlet/split/rk

These two options correspond to two new classes: PPPM_RK : PPPM and VerletSplitRK : Verlet.

PPPM_RK: Due to the communication between R and K processes, buffers for PPPM data structures are allocated:

  FFT_SCALAR ***density_brick_buf;
  FFT_SCALAR ***vdx_brick_buf, ***vdy_brick_buf, ***vdz_brick_buf;
  FFT_SCALAR ***u_brick_buf;

  double energy_buf;
  double virial_mpi_buf[6];
  int flags_buf[2];
  double domain_boxbuf[6];
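
As a rough illustration of how one of these buffers moves between partitions (simplified; partner_rank, nbrick_vals, the message tag, the use of MPI_DOUBLE for FFT_SCALAR, and the assumption of a single contiguous allocation indexed from zero are all illustrative, not the actual code):

  // sketch only: ship the charge-density brick from an R process to its K partner
  MPI_Request request;
  MPI_Isend(&density_brick_buf[0][0][0], nbrick_vals, MPI_DOUBLE,
            partner_rank, 0, universe->uworld, &request);
  // ... other work can be overlapped here ...
  MPI_Wait(&request, MPI_STATUS_IGNORE);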

PPPM_RK: Parallel topology information similar to what is maintained in VerletSplit:

  int rproc;           // 1 if an Rspace proc, 0 if Kspace (corresponds to int master in VerletSplit)
  int me_block;        // proc ID within Rspace/Kspace block
  int ratio;           // ratio of Rspace procs to Kspace procs
  MPI_Comm block;      // communicator within one block
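
As a rough sketch of what this topology setup amounts to (simplified; the grouping convention and the local names P and K are illustrative, not necessarily those of the actual code):

  // with "mpirun ... -partition P K", the first P ranks form the R partition
  // (universe->iworld == 0) and the remaining K ranks form the K partition
  int P = universe->procs_per_world[0];
  int K = universe->procs_per_world[1];
  rproc = (universe->iworld == 0) ? 1 : 0;
  ratio = P / K;
  // one possible grouping: assign every K-th R process to the same K process
  int color = rproc ? universe->me % K : universe->me - P;
  MPI_Comm_split(universe->uworld, color, universe->me, &block);
  MPI_Comm_rank(block, &me_block);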

PPPM_RK: Most of the collective and point-to-point communications use the immediate (nonblocking) MPI_I* forms with corresponding MPI_Wait calls.
PPPM_RK therefore keeps several MPI_Comm duplicates of the block communicator, along with corresponding MPI_Request structures, so that in-flight MPI_I* collectives cannot collide with one another.
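
This is the standard MPI pattern for keeping independent nonblocking collectives apart; a simplified sketch (the duplicate names and the choice of root rank are illustrative only):

  MPI_Comm block_energy, block_virial;        // duplicates of block, created once at setup
  MPI_Comm_dup(block, &block_energy);
  MPI_Comm_dup(block, &block_virial);

  MPI_Request req_energy, req_virial;
  MPI_Ibcast(&energy_buf, 1, MPI_DOUBLE, 0, block_energy, &req_energy);
  MPI_Ibcast(virial_mpi_buf, 6, MPI_DOUBLE, 0, block_virial, &req_virial);
  // ... other work proceeds while both broadcasts are in flight ...
  MPI_Wait(&req_energy, MPI_STATUS_IGNORE);
  MPI_Wait(&req_virial, MPI_STATUS_IGNORE);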

PPPM_RK: The main feature is that compute is split into three parts (see the sketch below):

  • compute_charge_densities(int, int) (protected) is called on the R processes to accumulate the charge densities on the grid. It is called from r2k_comm(int &eflag, int &vflag) (public), which also communicates the charge densities to the K processes.
  • compute_grid_potentials(int, int) (public) is called on the K processes to compute the grid potentials from the charge densities by solving the Poisson equations via FFTs. The grid potentials are then sent to the appropriate R processes by calling k2r_comm(int eflag, int vflag) (public).
  • compute_interpolate_forces(int, int) (protected) is called from k2r_comm on the R processes to interpolate the long-range forces from the grid potentials.
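
A simplified sketch of how this split can be arranged inside PPPM_RK (the bodies and the if/else structure below are illustrative, with the communication details elided):

  void PPPM_RK::r2k_comm(int &eflag, int &vflag)
  {
    if (rproc) {
      compute_charge_densities(eflag, vflag);    // part 1: R processes only
      // pack density_brick_buf and send it to the K partition (MPI_Isend / MPI_Wait)
    } else {
      // receive the charge densities sent by the R processes
    }
  }

  void PPPM_RK::k2r_comm(int eflag, int vflag)
  {
    if (!rproc) {
      // send the grid potentials (u/vd bricks) back to the R partition
    } else {
      // receive the grid potentials, then interpolate the long-range forces
      compute_interpolate_forces(eflag, vflag);  // part 3: R processes only
    }
  }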

VerletSplitRK: When the KSpace style in use has rk_flag == 1, the implementation of VerletSplitRK : Verlet is analogous to that of VerletSplit : Verlet, but in a more streamlined form. Due to the different task assignments of the R versus K processors, certain function calls are no longer necessary.

Correctness was verified on three applications from the examples directory.
For all three tests, kspace_style pppm/rk and run_style verlet/split/rk were used in place of the default settings in the in.* script.

  • HEAT: changed kspace_style ewald --> kspace_style pppm/rk; the total energy remained nearly constant, as in the baseline.
  • VISCOSITY: statistical outputs for the first 1000 iterations were very close to the baseline. (After 1000 iterations, deviations are observed with different parallel topologies even among baseline runs, due to the butterfly effect and the fact that floating-point addition is not perfectly associative.)
  • peptide: the total energy remained nearly constant, similar to the baseline.

Post Submission Checklist

  • The feature or features in this pull request is complete
  • Licensing information is complete
  • Corresponding author information is complete
  • The source code follows the LAMMPS formatting guidelines
  • Suitable new documentation files and/or updates to the existing docs are included
  • The added/updated documentation is integrated and tested with the documentation build system
  • The feature has been verified to work with the conventional build system
  • The feature has been verified to work with the CMake based build system
  • Suitable tests have been added to the unittest tree.
  • A package specific README file has been included or updated
  • One or more example input decks are included

Further Information, Files, and Links

The developments of this pull request may be described as a preliminary implementation of the enhanced baseline in

Dandurand, B., Vandierendonck, H., de Supinski, B. "Improving parallel scalability for molecular dynamics simulations in the exascale era".
39th IEEE International Parallel & Distributed Processing Symposium. June 3-7, 2025. Milan, Italy. To appear.

With input from LAMMPS developers, I hope to add analogous /rk variants to other KSpace subclasses as warranted in future pull requests, so as to make the verlet/split/rk option more broadly applicable.

@bcdandurand bcdandurand requested a review from sjplimp as a code owner June 11, 2025 12:56
@ndtrung81
Contributor

@bcdandurand this is a promising idea. I will look at the implementation and give it a try.

@ndtrung81
Contributor

@bcdandurand Thanks for contributing this work; it is potentially very helpful for large-scale simulations with long-range electrostatics on a large number of MPI processes. Have you observed any speedup with this implementation in your test runs?

I can build with CMake and test with a modified version of bench/in.rhodo and can run with the GPU package with pair/only yes. I have a couple of (early) comments:

  1. The implementation logic is clear and concise. Communicating only the grid-based charge density (R->K) and potential (K->R) is an improvement over the existing VerletSplit that significantly reduces the amount of data transferred between R and K processes.
  2. Changing where the R-K communication is done requires changes to the logic of the run() function compared to VerletSplit. The addition of rk_flag and 3 new virtual functions to the core KSpace class makes VerletSplitRK::run() look cleaner than having to check whether the force->kspace style name matches pppm/rk in the init() function and to cast force->kspace to (PPPM_RK*).
  3. PPPM_RK currently comments out compute(), but there are quite a few classes that invoke force->kspace->compute(). Do you think this function should be enabled? Also, is it possible to support compute_group_group()?
  4. Please add sections for verlet/split/rk and pppm/rk in the corresponding doc pages (run_style.rst and kspace_style.rst).
  5. I think you could remove the commented out sections with #if 0 .. #endif in verlet_split_rk.cpp.

@bcdandurand
Collaborator Author

@ndtrung81 Thank you very much for reviewing this contribution.
First, regarding the question about observed speedup: yes, improvement is observed where there would otherwise be a communication bottleneck between the R- and K-processes. At small node counts and with a small R-to-K ratio, there is not much of a bottleneck and thus little improvement. For larger node counts or a larger R-to-K ratio, the improvement is noticeable. For example, if I run VISCOSITY using the non-default settings

replicate       4 4 4
kspace_style   pppm/rk 1.0e-4
run_style      verlet/split/rk

then the simulation takes about 47 seconds, versus about 77 seconds with the non-default (otherwise baseline) settings

replicate       4 4 4
kspace_style   pppm 1.0e-4
run_style      verlet/split

In both runs, I use 96 R processes and 4 K processes with an sbatch script similar to

#!/bin/bash

#SBATCH --ntasks=100                            # Number of tasks to run (for MPI tasks this is number of cores)
#SBATCH --nodes=1                               # Maximum number of nodes on which to launch tasks
#SBATCH --exclusive
#SBATCH --mem=0

mpirun -n 100 ~/lammpsfork/lammps/build/lmp -partition 96 4 -screen vsrk_r96k4 -log none -in in.cos.1000SPCE

@bcdandurand
Collaborator Author

@ndtrung81
PPPM::compute: Currently, PPPM::compute would be invoked from an instance of PPPM_RK. Whether this is the ideal default behavior is up for debate. It probably is not sensible, since the same computation would be done redundantly by both the communicator of R processes and the communicator of K processes. I'll reply again after I've thought about what the use of pppm/rk requires outside of the use of verlet/split/rk.

Commented-out code: Otherwise, the currently commented-out implementation of PPPM_RK::compute might be refactored as a comment, to indicate the contents of a decomposed alternative to PPPM::compute.
Other commented-out code might likewise be kept as comments if deemed useful, and otherwise deleted.

compute_group_group: I'm not as familiar with this (though I'm aware of its existence); I can look into it next.

Please advise as to the preferred way to provide updated changes in addition to pushing to the bcdandurand fork.

doc pages: I'll look into this and enquire as needed.


virtual void compute_grid_potentials(int, int) override;
//virtual void compute(int, int) override;
virtual void compute(int, int) override;
Member

@bcdandurand Please note that member functions should have either virtual or override, but not both. We prefer using override, except for the base-class function where virtual is required. We're currently cleaning this up and making things consistent in PR #4634.

Collaborator Author

Understood, will do.

@bcdandurand
Collaborator Author

I'm looking at incorporating an analogous decomposition for PPPM::compute_group_group. For examples that would use this, I see
examples/PACKAGES/tally/in.pe
examples/PACKAGES/tally/in.force

@akohlmey akohlmey moved this from In Progress to After Stable Release in LAMMPS Pull Requests Jun 27, 2025
@bcdandurand
Collaborator Author

I'm looking at incorporating an analogous decomposition for PPPM::compute_group_group. For examples that would use this, I see examples/PACKAGES/tally/in.pe examples/PACKAGES/tally/in.force

Testing in.pe under the script modifications:

kspace_style    pppm/rk 1.0e-4
replicate       4 4 4
run             500

with sbatch command, e.g.,
mpirun -n 60 ~/lammpsfork/lammps/build/lmp -partition 30 30 -screen verlet_r30k30_pppm_rk -log none -in in.pe

I'll soon have an update adding an implementation of PPPM_RK::compute_group_group and ancillary structures. While it separates the execution between R- and K-processes and produces the same output for tally/in.pe as the baseline use of kspace_style pppm 1.0e-4, it does not appear to provide much speedup. For it to be useful, there would need to be a modified version of the ComputeGroupGroup class analogous to the VerletSplitRK variant of VerletSplit. Developing that variant of ComputeGroupGroup, if it's worth it, should probably be part of a later pull request. I need to do a bit more cleanup of the PPPM_RK::compute_group_group addition before I push to the bcdandurand forked repo. (I won't be able to do so for a couple of days.)

I can then turn to the doc pages after the above is finished.

@akohlmey
Member

akohlmey commented Jul 2, 2025

@bcdandurand Before working on additional features, you should probably first pay attention to clean integration of your code into the LAMMPS code base. Your pull request branch currently has a merge conflict with upstream and fails to compile on all our test platforms.

@bcdandurand
Collaborator Author

@akohlmey @ndtrung81 et al.
I'm working on adding sections for verlet/split/rk and pppm/rk in the corresponding doc pages (run_style.rst and kspace_style.rst). This is in tandem with deciding what to do about PPPM_RK::compute_group_group. The issue I'm raising in this comment is analogous to an issue with the implementation of PPPM_RK::compute, which is already integrated in this fork.

The issue is due to the assumption in PPPM_RK that the initial MPI communicator is partitioned (see PPPM_RK::setupRKBlock). While it's possible to implement PPPM_RK::compute and PPPM_RK::compute_group_group within the partitioned framework, invoking these two functions from, e.g., Verlet::run and ComputeGroupGroup::kspace_contribution, respectively (which do not assume a partitioning of the MPI communicator), would be detrimental to runtime performance. This is due to the default behavior in which identical execution occurs on the R-processes and the K-processes: if there are 100 processes, of which 96 are R-processes and 4 are K-processes, the speedup is bottlenecked as if only 4 processes were being used.

Currently, there is only beneficial use of MPI parallelism if kspace_style pppm/rk is paired with run_style verlet/split/rk. We would need an analogous group/group/rk compute style implemented in, say, a ComputeGroupGroupRK class for any benefit from a call to PPPM_RK::compute_group_group to be realized.

My initial thought to prevent users from using an ineffective combination of styles (from a parallel computing perspective) is to have an rk_flag in the Verlet class (or higher up in Integrate) and other relevant classes (like Compute), which would be visible to instances of KSpace. An error can be thrown if there is not consistency between the rk_flag values of these classes when their respective styles are used.
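
Such a check would only take a few lines in the usual LAMMPS error-handling style; the sketch below assumes a hypothetical rk_flag member added to the Integrate class (set to 1 by VerletSplitRK) and is meant only to illustrate the idea:

  void PPPM_RK::init()
  {
    PPPM::init();
    // rk_flag on Integrate does not exist yet; it is the proposed addition
    if (update->integrate && update->integrate->rk_flag == 0)
      error->all(FLERR, "kspace_style pppm/rk requires run_style verlet/split/rk");
  }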

The issue described in this comment is also relevant to updating the manual, because users would need to have consistency in the rk option for run_style, kspace_style, etc.

I hope this is clear; I'll clarify where helpful.

@akohlmey akohlmey moved this from After Stable Release to In Progress in LAMMPS Pull Requests Jul 22, 2025
@ndtrung81
Contributor

@bcdandurand If you don't have time to complete the work on compute group/group/rk at the moment, we can leave it to a later PR. If possible, please wrap up the documentation for this PR and get it ready to the final round of review and to merge.

@bcdandurand
Collaborator Author

@bcdandurand If you don't have time to complete the work on compute group/group/rk at the moment, we can leave it to a later PR. If possible, please wrap up the documentation for this PR and get it ready to the final round of review and to merge.

@ndtrung81 Understood. The cluster I normally work on is under maintenance; I expect to have access to it again on Friday. I'll then address the documentation as soon as I can.

Contributor

@sjplimp sjplimp left a comment


I would like to review this PR when it is closer to merging

@stanmoore1 stanmoore1 assigned sjplimp and unassigned ndtrung81 Sep 9, 2025
@sjplimp
Contributor

sjplimp commented Dec 4, 2025

Thanks @bcdandurand for submitting these enhancements. And for the thorough explanation in the PR description of what you've done. Also thanks to @ndtrung81 for looking it over carefully and testing it out.

I have just a couple of high-level suggestions.

(1) In the doc pages for run_style and/or kspace_style for the new variant styles, could you provide some numbers for sample speed-ups? This is to motivate users to try out the new options. Ideally, the numbers would compare vanilla verlet vs verlet/split (previous) vs verlet/split/rk. This could be explained in a few sentences, or in a simple table with 3 columns of numbers.

(2) If you like, you can add a "citation" in the source code for your pending paper, which will trigger a prompt for the user to cite it whenever the verlet/split/rk option is used.
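
If you go that route, the usual mechanism is a CiteMe entry registered in the style's constructor, something along these lines (the variable name and exact wording below are just placeholders, using the reference from the PR description):

  static const char cite_verlet_split_rk[] =
    "run_style verlet/split/rk:\n\n"
    "@inproceedings{Dandurand2025,\n"
    " author =    {Dandurand, B. and Vandierendonck, H. and de Supinski, B.},\n"
    " title =     {Improving Parallel Scalability for Molecular Dynamics Simulations in the Exascale Era},\n"
    " booktitle = {39th IEEE International Parallel and Distributed Processing Symposium},\n"
    " year =      2025\n"
    "}\n\n";

  VerletSplitRK::VerletSplitRK(LAMMPS *lmp, int narg, char **arg) :
    Verlet(lmp, narg, arg)
  {
    if (lmp->citeme) lmp->citeme->add(cite_verlet_split_rk);
    // ... rest of the constructor ...
  }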

Contributor

@sjplimp sjplimp left a comment


see my recent comment

@bcdandurand
Collaborator Author

bcdandurand commented Dec 9, 2025 via email

