The work proposed for the period March 1 through September 30, 1997 involves
two projects. To provide further context for the request, we describe our plans
for parallelizing two codes based on the Lattice Boltzmann Method.
We have created a web page for our projects at the Ohio Supercomputer Center.
We have developed a new approach for implementing the three-dimensional
Lattice Boltzmann Method (LBM), which simulates the flow of fluids through
porous media. This novel approach, which efficiently encodes the incorporation
of the boundary conditions for any complicated geometry into the updates due to
collisions and streaming, will be submitted to the Journal of Computational
Physics.
The single-processor Lattice Boltzmann Method algorithm uses a spatial
lattice of points with a set of discrete velocity directions for the velocities
at each point. The result is a discrete representation of phase space in which
there is a close coupling between the spatial lattice and the velocity-space
lattice. In performing a simulation, a discrete approximation of Boltzmann's
equation of kinetic theory is obtained in which the collision integral is
approximated by a single relaxation time of the following form:
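The equation itself appears to have been omitted from this copy of the text. The standard single-relaxation-time (BGK) collision form, which we assume is the one intended, is

```latex
f_i(\mathbf{x} + \mathbf{e}_i \,\Delta t,\; t + \Delta t) - f_i(\mathbf{x}, t)
  = -\frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right],
```

where f_i is the distribution function along discrete velocity e_i, tau is the relaxation time, and f_i^eq is the local equilibrium distribution.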
However, the second contribution, due to streaming of information from one spatial point to adjacent spatial points, will necessitate the use of off-processor data for the boundary points. The streaming contributions to the change involve data from 18 adjacent points in three dimensions (using Cartesian coordinates). We thus need data from a stencil at most one cell away from a given point, with contributions to the change at (i,j,k). Consequently, when the point (i,j,k) lies one cell from the boundary between two processors' subdomains, it will be necessary to perform message passing between adjacent processors involving a layer of cells that resides on the adjacent processor. The parallel version of the LBM will thus require a message-passing step just prior to the updates due to streaming.
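The ordering described above (ghost-layer exchange, then streaming) can be sketched in one dimension. This is an illustrative stand-in, not the project code: two subdomains live in one process, and the ghost-cell copies play the role of the MPI messages.

```python
# Hypothetical 1D sketch: halo exchange must precede the streaming update.

def serial_stream(f):
    """Periodic streaming: every value moves one site to the right."""
    return [f[-1]] + f[:-1]

def parallel_stream(f, nsub=2):
    """Stream on nsub subdomains, filling each ghost cell from its
    left neighbour first (the stand-in for message passing)."""
    size = len(f) // nsub
    subs = [f[i * size:(i + 1) * size] for i in range(nsub)]
    out = []
    for r, sub in enumerate(subs):
        # Ghost cell from the left neighbour (periodic boundary):
        # in the parallel code this copy would be an MPI message.
        left_ghost = subs[(r - 1) % nsub][-1]
        padded = [left_ghost] + sub
        out.extend(padded[:-1])   # stream right by one lattice site
    return out

f = list(range(8))
assert parallel_stream(f) == serial_stream(f)
```

The same pattern extends to three dimensions, where the ghost region is a one-cell-thick layer of the neighbouring processor's subdomain.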
The approach is the following:
We need to experiment to determine which domain decomposition provides the best
parallel speedup. There are several good candidates, including a decomposition
of the interior into cubes, a decomposition into long, thin regions parallel
to the imposed external flow, and one perpendicular to it. We have experience
with such domain decompositions of other fluid dynamics algorithms and have
achieved a speedup of 60 out of 64 on the T3D/E with that algorithm. We thus
have the needed knowledge of how to implement fast message-passing algorithms.
We feel that we have a fundamental understanding of the methods and actual
experience in writing and optimizing a domain decomposition of the LBM code to
simulate the Poiseuille flow between parallel plates. Having the exact solution
will also provide excellent tests of the accuracy of the algorithms.
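The accuracy test mentioned above can be sketched as follows: compare a computed velocity profile against the exact Poiseuille solution u(y) = G/(2*mu) * y * (H - y) between plates at y = 0 and y = H. The names G (pressure gradient), mu (dynamic viscosity), and H (plate separation) are illustrative.

```python
# Sketch of a Poiseuille-flow accuracy check (illustrative, not project code).

def poiseuille_exact(y, G=1.0, mu=1.0, H=1.0):
    """Exact parabolic profile for flow driven between parallel plates."""
    return G / (2.0 * mu) * y * (H - y)

def max_error(profile, ys):
    """Maximum pointwise deviation of a simulated profile from the exact one."""
    return max(abs(u - poiseuille_exact(y)) for u, y in zip(profile, ys))

# Sample the channel at 11 points; the exact profile reproduces itself.
ys = [i / 10 for i in range(11)]
exact = [poiseuille_exact(y) for y in ys]
assert max_error(exact, ys) == 0.0
```

In practice the simulated LBM profile would replace `exact`, and `max_error` would be monitored as the lattice is refined.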
The porous medium problem presents several challenges. We have solutions which should provide a major improvement in efficiency as well as allow us to simulate more realistic approximations to porous media.
One of the biggest challenges comes from the complexity of the boundary conditions. A porous medium has many small pores which are connected together in complex ways. Most conventional fluid dynamics algorithms have great difficulty in accurately handling such complex boundaries and are thus of very limited utility for such problems. The LBM can handle these boundaries with relative ease and is thus a very attractive alternative approach. We have developed a new method of encoding the three-dimensional LBM which simulates the flow of a fluid through a model porous medium of complexity limited only by the memory of the processor. The details of our techniques will be submitted to the Journal of Computational Physics. This new approach, which efficiently encodes the incorporation of the boundary conditions at the solid material into the updates due to collisions and streaming, allows essentially the same algorithm we devised for the single-fluid flow of Project 1 to be used for the parallel version with little or no increase in computational complexity.

For the parallel version of the algorithm, the only additional activity beyond that for the first project discussed above is the bookkeeping associated with the message passing of data between adjacent processors which share data. Some of the data will be from points inside solid cells on adjacent processors, while other data will be from points which reside in fluid cells on the adjacent processors. The message-passing interface will need to be modified to maximize efficiency, so that we do not pass solid-bound data when not necessary.
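The bookkeeping idea above, sending only fluid-cell data from a boundary layer, can be sketched as a pack/unpack pair. The names here are illustrative, not the project's actual interface.

```python
# Hypothetical sketch: pack only fluid-cell values from a boundary layer,
# so solid-bound data is never sent to the neighbouring processor.

SOLID, FLUID = 0, 1

def pack_boundary(layer_values, layer_flags):
    """Return (index, value) pairs for the fluid cells of a boundary layer."""
    return [(i, v)
            for i, (v, flag) in enumerate(zip(layer_values, layer_flags))
            if flag == FLUID]

def unpack_boundary(message, ghost):
    """Scatter received fluid-cell values into the local ghost layer."""
    for i, v in message:
        ghost[i] = v
    return ghost

# A four-cell layer with two solid cells: only the fluid cells travel.
flags  = [FLUID, SOLID, FLUID, SOLID]
values = [1.5, 9.9, 2.5, 9.9]
message = pack_boundary(values, flags)
assert message == [(0, 1.5), (2, 2.5)]          # solid cells skipped
ghost = unpack_boundary(message, [0.0] * 4)
assert ghost == [1.5, 0.0, 2.5, 0.0]
```

The payoff grows with the solid fraction of the boundary layer: in a low-porosity medium most of the layer never needs to be communicated.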
The basic design will thus be a Cartesian cubical decomposition, with the message passing performed in much the same way as in the first project. The new aspects will involve methods to optimize the decomposition among processors so as to minimize the computational work spent passing data associated with the solid regions in comparison to that associated with the fluid. We intend to determine the best way to decompose the medium among the processors to ensure the best fit between processors and boundary data. Our present single-processor code is very fast and very efficient.

A number of strategies are available for optimizing the computation-to-communication ratio. In addition to a uniform cubical decomposition, we will perform decompositions which partition the domain so that subdomains contain approximately the same number of fluid cells. Such a decomposition may be achieved in a number of ways (perhaps using the METIS graph-partitioning software from the University of Minnesota). In our approach, the algorithm will have very good computational efficiency, as we have already optimized that aspect on a single processor. With the message-passing design from the first project combined with our porous-medium LBM code, we will have a very fast and flexible parallel design. Our goal is a parallel algorithm which scales linearly with the number of processors. Given our incorporation of the boundary conditions into the fabric of the algorithm itself, we feel that the code will have very good scalability. The scientific goal is a simulation tool able to characterize porous media of very large size, thus providing the community with a level of porous-media simulation that has not been achieved with other methods.
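The load-balancing idea can be illustrated with a greedy slab split along one lattice axis, chosen so each subdomain receives roughly the same number of fluid cells. A graph partitioner such as METIS would generalize this to irregular subdomains; the sketch below, with illustrative names, shows only the balancing criterion.

```python
# Hypothetical sketch: slab decomposition balanced by fluid-cell count
# rather than by geometric volume.

def balanced_slabs(fluid_per_plane, nprocs):
    """fluid_per_plane[k] = number of fluid cells in lattice plane k.
    Returns slab boundaries [0, b1, ..., nplanes] giving each processor
    approximately total/nprocs fluid cells."""
    total = sum(fluid_per_plane)
    target = total / nprocs
    bounds, acc, cut = [0], 0, 1
    for k, n in enumerate(fluid_per_plane):
        acc += n
        if acc >= cut * target and cut < nprocs:
            bounds.append(k + 1)   # close the current slab here
            cut += 1
    bounds.append(len(fluid_per_plane))
    return bounds

# Uneven porosity along the axis: 48 fluid cells in 8 planes.
planes = [10, 2, 2, 10, 10, 2, 2, 10]
b = balanced_slabs(planes, 2)
# Each slab gets exactly half of the 48 fluid cells:
assert sum(planes[b[0]:b[1]]) == sum(planes[b[1]:b[2]]) == 24
```

An equal-volume split of the same lattice would also give 4 planes per processor here, but on realistic media the fluid-weighted cut points shift toward the high-porosity regions, which is the intended effect.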