The work proposed for the period March 1 through September 30, 1997 involves
two projects. To provide further context for the request, we describe our plans
for parallelizing two codes based on the Lattice Boltzmann Method.
We have created a web page for our projects at the Ohio Supercomputer Center.
We have developed a new approach for implementing the three-dimensional
Lattice Boltzmann Method (LBM), which simulates the flow of fluids through
porous media. This novel approach, which efficiently encodes the incorporation
of the boundary conditions for any complicated geometry into the updates due to
collisions and streaming, will be submitted to the Journal of Computational
Physics.
The single-processor Lattice Boltzmann Method algorithm uses a spatial
lattice of points with a set of discrete velocity directions for the velocities
at each point. The result is a discrete representation of phase space in which
there is a close coupling between the spatial lattice and the velocity-space
lattice. In performing a simulation, a discrete approximation of Boltzmann's
equation of kinetic theory is obtained in which the collision integral is
approximated by a single relaxation time of the following form:
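The equation itself appears to have been omitted from this copy of the text. The standard single-relaxation-time (BGK) collision form, which we assume is the one intended, is

```latex
f_i(\mathbf{x} + \mathbf{e}_i \,\Delta t,\; t + \Delta t) - f_i(\mathbf{x}, t)
  = -\frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right],
```

where f_i is the distribution function along discrete velocity e_i, tau is the relaxation time, and f_i^eq is the local equilibrium distribution.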
However, the second contribution, due to streaming of information from one spatial point to adjacent spatial points, will necessitate the use of off-processor data for the boundary points. The streaming contributions to the change involve data from 18 adjacent points in three dimensions (using Cartesian coordinates). We thus need data from a stencil at most one cell away from a given point, with contributions to the change at (i,j,k). Consequently, when the point (i,j,k) lies one cell from the boundary between two processors' subdomains, it will be necessary to perform message passing between adjacent processors involving a layer of cells that resides on the adjacent processor. The parallel version of the LBM will thus require a message-passing step just prior to the updates due to streaming.
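The ordering described above (ghost-layer exchange, then streaming) can be sketched in one dimension. This is an illustrative stand-in, not the project code: two subdomains live in one process, and the ghost-cell copies play the role of the MPI messages.

```python
# Hypothetical 1D sketch: halo exchange must precede the streaming update.

def serial_stream(f):
    """Periodic streaming: every value moves one site to the right."""
    return [f[-1]] + f[:-1]

def parallel_stream(f, nsub=2):
    """Stream on nsub subdomains, filling each ghost cell from its
    left neighbour first (the stand-in for message passing)."""
    size = len(f) // nsub
    subs = [f[i * size:(i + 1) * size] for i in range(nsub)]
    out = []
    for r, sub in enumerate(subs):
        # Ghost cell from the left neighbour (periodic boundary):
        # in the parallel code this copy would be an MPI message.
        left_ghost = subs[(r - 1) % nsub][-1]
        padded = [left_ghost] + sub
        out.extend(padded[:-1])   # stream right by one lattice site
    return out

f = list(range(8))
assert parallel_stream(f) == serial_stream(f)
```

The same pattern extends to three dimensions, where the ghost region is a one-cell-thick layer of the neighbouring processor's subdomain.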
The approach is the following:
We need to experiment to determine which domain decomposition provides the best
parallel speedup. There are several good candidates, including a decomposition
of the interior into cubes, a decomposition into long, thin regions parallel
to the imposed external flow, and one perpendicular to it. We have experience
with such domain decompositions of other fluid dynamics algorithms and have
achieved a speedup of 60 out of 64 on the T3D/E with that algorithm. We thus
have the needed knowledge of how to implement fast message-passing algorithms.
We feel that we have a fundamental understanding of the methods and actual
experience in writing and optimizing a domain decomposition of the LBM code to
simulate the Poiseuille flow between parallel plates. Having the exact solution
will also provide excellent tests of the accuracy of the algorithms.
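The accuracy test mentioned above can be sketched as follows: compare a computed velocity profile against the exact Poiseuille solution u(y) = G/(2*mu) * y * (H - y) between plates at y = 0 and y = H. The names G (pressure gradient), mu (dynamic viscosity), and H (plate separation) are illustrative.

```python
# Sketch of a Poiseuille-flow accuracy check (illustrative, not project code).

def poiseuille_exact(y, G=1.0, mu=1.0, H=1.0):
    """Exact parabolic profile for flow driven between parallel plates."""
    return G / (2.0 * mu) * y * (H - y)

def max_error(profile, ys):
    """Maximum pointwise deviation of a simulated profile from the exact one."""
    return max(abs(u - poiseuille_exact(y)) for u, y in zip(profile, ys))

# Sample the channel at 11 points; the exact profile reproduces itself.
ys = [i / 10 for i in range(11)]
exact = [poiseuille_exact(y) for y in ys]
assert max_error(exact, ys) == 0.0
```

In practice the simulated LBM profile would replace `exact`, and `max_error` would be monitored as the lattice is refined.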
The porous medium problem presents several challenges. We have solutions which should provide a major improvement in efficiency as well as allow us to simulate more realistic approximations to porous media.
One of the biggest challenges comes from the complexity of the boundary conditions. A porous medium has many small pores which are connected together in complex ways. Most conventional fluid dynamics algorithms have great difficulty in accurately handling such complex boundaries and are thus of very limited utility for such problems. The LBM can handle these boundaries with relative ease and is thus a very attractive alternative approach. We have developed a new method of encoding the three-dimensional LBM which simulates the flow of a fluid through a model porous medium of complexity limited only by the memory of the processor. The details of our techniques will be submitted to the Journal of Computational Physics. This new approach, which efficiently encodes the incorporation of the boundary conditions at the solid material into the updates due to collisions and streaming, allows essentially the same algorithm we devised for the single-fluid flow of Project 1 to be used for the parallel version with little or no increase in computational complexity.

For the parallel version of the algorithm, the only additional activity beyond that for the first project discussed above is the bookkeeping associated with the message passing of data between adjacent processors which share data. Some of the data will be from points inside solid cells on adjacent processors, while other data will be from points which reside in fluid cells on the adjacent processors. The message-passing interface will need to be modified to maximize efficiency, so that we do not pass solid-bound data when not necessary.
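The bookkeeping idea above, sending only fluid-cell data from a boundary layer, can be sketched as a pack/unpack pair. The names here are illustrative, not the project's actual interface.

```python
# Hypothetical sketch: pack only fluid-cell values from a boundary layer,
# so solid-bound data is never sent to the neighbouring processor.

SOLID, FLUID = 0, 1

def pack_boundary(layer_values, layer_flags):
    """Return (index, value) pairs for the fluid cells of a boundary layer."""
    return [(i, v)
            for i, (v, flag) in enumerate(zip(layer_values, layer_flags))
            if flag == FLUID]

def unpack_boundary(message, ghost):
    """Scatter received fluid-cell values into the local ghost layer."""
    for i, v in message:
        ghost[i] = v
    return ghost

# A four-cell layer with two solid cells: only the fluid cells travel.
flags  = [FLUID, SOLID, FLUID, SOLID]
values = [1.5, 9.9, 2.5, 9.9]
message = pack_boundary(values, flags)
assert message == [(0, 1.5), (2, 2.5)]          # solid cells skipped
ghost = unpack_boundary(message, [0.0] * 4)
assert ghost == [1.5, 0.0, 2.5, 0.0]
```

The payoff grows with the solid fraction of the boundary layer: in a low-porosity medium most of the layer never needs to be communicated.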
The basic design will thus be a Cartesian cubical decomposition, with the message passing performed in much the same way as in the first project. The new aspects will involve methods to optimize the decomposition among processors so as to minimize the computational work spent passing data associated with the solid regions in comparison to that associated with the fluid. We intend to determine the best way to decompose the medium among the processors to ensure the best fit between processors and boundary data. Our present single-processor code is very fast and very efficient.

A number of strategies are available for optimizing the computation-to-communication ratio. In addition to a uniform cubical decomposition, we will perform decompositions which partition the domain so that subdomains contain approximately the same number of fluid cells. Such a decomposition may be achieved in a number of ways (perhaps using the METIS graph-partitioning software from the University of Minnesota). In our approach, the algorithm will have very good computational efficiency, as we have already optimized that aspect on a single processor. With the message-passing design from the first project combined with our porous-medium LBM code, we will have a very fast and flexible parallel design. Our goal is a parallel algorithm which scales linearly with the number of processors. Given our incorporation of the boundary conditions into the fabric of the algorithm itself, we feel that the code will have very good scalability. The scientific goal is a simulation tool able to characterize porous media of very large size, thus providing the community with a level of porous-media simulation that has not been achieved with other methods.
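The load-balancing idea can be illustrated with a greedy slab split along one lattice axis, chosen so each subdomain receives roughly the same number of fluid cells. A graph partitioner such as METIS would generalize this to irregular subdomains; the sketch below, with illustrative names, shows only the balancing criterion.

```python
# Hypothetical sketch: slab decomposition balanced by fluid-cell count
# rather than by geometric volume.

def balanced_slabs(fluid_per_plane, nprocs):
    """fluid_per_plane[k] = number of fluid cells in lattice plane k.
    Returns slab boundaries [0, b1, ..., nplanes] giving each processor
    approximately total/nprocs fluid cells."""
    total = sum(fluid_per_plane)
    target = total / nprocs
    bounds, acc, cut = [0], 0, 1
    for k, n in enumerate(fluid_per_plane):
        acc += n
        if acc >= cut * target and cut < nprocs:
            bounds.append(k + 1)   # close the current slab here
            cut += 1
    bounds.append(len(fluid_per_plane))
    return bounds

# Uneven porosity along the axis: 48 fluid cells in 8 planes.
planes = [10, 2, 2, 10, 10, 2, 2, 10]
b = balanced_slabs(planes, 2)
# Each slab gets exactly half of the 48 fluid cells:
assert sum(planes[b[0]:b[1]]) == sum(planes[b[1]:b[2]]) == 24
```

An equal-volume split of the same lattice would also give 4 planes per processor here, but on realistic media the fluid-weighted cut points shift toward the high-porosity regions, which is the intended effect.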