|
|
Next: Conclusion
Up: OOPSE: An Object-Oriented Parallel
Previous: Energy Minimization
Contents
Parallel Simulation Implementation
Although processor power is continually improving, it is still
unreasonable to simulate systems of more than 10,000 atoms on a single
processor. To facilitate study of larger system sizes or smaller
systems for longer time scales, parallel methods were developed to
allow multiple CPU's to share the simulation workload. Three general
categories of parallel decomposition methods have been developed:
these are the atomic,[54] spatial [55] and
force [8] decomposition methods.
Algorithmically simplest of the three methods is atomic decomposition,
where
particles in a simulation are split among
processors for
the duration of the simulation. Computational cost scales as an
optimal
for atomic decomposition. Unfortunately, all
processors must communicate positions and forces with all other
processors at every force evaluation, leading the communication costs
to scale as an unfavorable
, independent of the
number of processors. This communication bottleneck led to the
development of spatial and force decomposition methods, in which
communication among processors scales much more favorably. Spatial or
domain decomposition divides the physical spatial domain into 3D boxes
in which each processor is responsible for calculation of forces and
positions of particles located in its box. Particles are reassigned to
different processors as they move through simulation space. To
calculate forces on a given particle, a processor must simply know the
positions of particles within some cutoff radius located on nearby
processors rather than the positions of particles on all
processors. Both communication between processors and computation
scale as
in the spatial method. However, spatial
decomposition adds algorithmic complexity to the simulation code and
is not very efficient for small
, since the overall communication
scales as the surface to volume ratio
in
three dimensions.
The parallelization method used in OOPSE is the force
decomposition method.[56] Force decomposition assigns
particles to processors based on a block decomposition of the force
matrix. Processors are split into an optimally square grid forming row
and column processor groups. Forces are calculated on particles in a
given row by particles located in that processor's column
assignment. One deviation from the algorithm described by Hendrickson
et al. is the use of column ordering based on the row indexes
preventing the need for a transpose operation necessitating a second
communication step when gathering the final force components. Force
decomposition is less complex to implement than the spatial method but
still scales computationally as
and scales as
in communication cost. Plimpton has also
found that force decompositions scale more favorably than spatial
decompositions for systems up to 10,000 atoms and favorably compete
with spatial methods up to 100,000 atoms.[55]
Next: Conclusion
Up: OOPSE: An Object-Oriented Parallel
Previous: Energy Minimization
Contents
|