next up previous contents
Next: Conclusion Up: OOPSE: An Object-Oriented Parallel Previous: Energy Minimization   Contents


Parallel Simulation Implementation

Although processor power is continually improving, it is still unreasonable to simulate systems of more than 10,000 atoms on a single processor. To facilitate study of larger system sizes or smaller systems for longer time scales, parallel methods were developed to allow multiple CPU's to share the simulation workload. Three general categories of parallel decomposition methods have been developed: these are the atomic,[54] spatial [55] and force [8] decomposition methods.

Algorithmically simplest of the three methods is atomic decomposition, where $ N$ particles in a simulation are split among $ P$ processors for the duration of the simulation. Computational cost scales as an optimal $ \mathcal{O}(N/P)$ for atomic decomposition. Unfortunately, all processors must communicate positions and forces with all other processors at every force evaluation, leading the communication costs to scale as an unfavorable $ \mathcal{O}(N)$ , independent of the number of processors. This communication bottleneck led to the development of spatial and force decomposition methods, in which communication among processors scales much more favorably. Spatial or domain decomposition divides the physical spatial domain into 3D boxes in which each processor is responsible for calculation of forces and positions of particles located in its box. Particles are reassigned to different processors as they move through simulation space. To calculate forces on a given particle, a processor must simply know the positions of particles within some cutoff radius located on nearby processors rather than the positions of particles on all processors. Both communication between processors and computation scale as $ \mathcal{O}(N/P)$ in the spatial method. However, spatial decomposition adds algorithmic complexity to the simulation code and is not very efficient for small $ N$ , since the overall communication scales as the surface to volume ratio $ \mathcal{O}(N/P)^{2/3}$ in three dimensions.

The parallelization method used in OOPSE is the force decomposition method.[56] Force decomposition assigns particles to processors based on a block decomposition of the force matrix. Processors are split into an optimally square grid forming row and column processor groups. Forces are calculated on particles in a given row by particles located in that processor's column assignment. One deviation from the algorithm described by Hendrickson et al. is the use of column ordering based on the row indexes preventing the need for a transpose operation necessitating a second communication step when gathering the final force components. Force decomposition is less complex to implement than the spatial method but still scales computationally as $ \mathcal{O}(N/P)$ and scales as $ \mathcal{O}(N/\sqrt{P})$ in communication cost. Plimpton has also found that force decompositions scale more favorably than spatial decompositions for systems up to 10,000 atoms and favorably compete with spatial methods up to 100,000 atoms.[55]


next up previous contents
Next: Conclusion Up: OOPSE: An Object-Oriented Parallel Previous: Energy Minimization   Contents
Copyright © 2006 - University of Notre Dame

Updated on January 16, 2006