Simulations of Relativistic Extragalactic Jets

G. Comer Duncan - Philip A. Hughes - Mark A. Miller

Proposal Abstract

Jets of goober peas will be shot at relativistic speeds. God only knows what will happen. Give us an infinite amount of resources so we can find out...

Description of Project Problem

Over the last decade the quantity and quality of images of extragalactic relativistic jets have increased to a point where it has become evident that curvature of light-year-scale flows is the norm, not the exception (e.g., Wardle et al. 1994). Some of this curvature may be `apparent', resulting from a more modestly curved flow seen close to the line of sight. Nevertheless, such flows must posses some intrinsic curvature, and thus their 3-D morphology must be addressed.

This raises both questions and possibilities: how can highly relativistic flows suffer significant curvature, and yet retain their integrity? It has been argued (Begelman, Blandford & Rees 1984) that sublight-year-scale flows are highly dissipative, and should radiate a significant fraction of their flow energy if subjected to a perturbation such as bending. How do transverse shocks propagate along a curved flow? Will the shock plane rotate with respect to the flow axis; will the shock strengthen or weaken? What role does flow curvature have to play in explaining phenomena such as stationary knots between which superluminal components propagate (Shaffer et al. 1987), knots which brighten after an initial fading (Mutel et al. 1990), and changing component speed (Lobanov & Zensus 1996). Numerous lines of evidence point convincingly to the occurrence of oblique shocks (e.g., Heinz & Begelman 1997; Lister & Marscher 1999; Marscher et al. 1997; Polatidis & Wilkinson 1998); how do they form and evolve? All these issues are amenable to study through hydrodynamic simulation - but all demand that such simulations be 3-D.

Furthermore, it has become evident over the last five years that such highly energetic flows are also found in Galactic objects with stellar mass `engines'. In the galactic superluminals GRS 1915+105 (Mirabel & Rodríguez 1994, 1995) and GRO J1655-40 (Hjellming & Rupen 1995; Tingay et al. 1995) the observed motions indicate jet flow speeds up to 92%c. There is compelling evidence from the observation of optical afterglow that gamma-ray bursts are of cosmological origin, and whether produced by the mergers of compact objects, or accretion induced collapse (AIC), simple relativistic fireball models seem ruled out, the data strongly favoring highly relativistic jets (Dar 1998). Thus a detailed understanding of the dynamics of collimated relativistic flows has wide application in astrophysics.

This grant application requests resources to perform an initial exploration of the key issues pertaining to the uniquely 3D dynamics of relativistic astrophysical flows.

Rationale for Machine Use

Since 1993 we have used machines at OSC and elsewhere in a continuing project to study the dynamics and radiation properties of relativistic extragalactic jets. Our first results (Duncan & Hughes 1994) constituted the first-published high-resolution axisymmetric simulations of these objects. That work demonstrated the stability of highly relativistic flows, and suggested an interpretation of a long-known difference between two classes of astrophysical flows in terms of flow speed-dependent stability. That work was performed on a combination of computers, including workstations at BGSU, the University of Michigan, and the OSC Cray and OVL Onyx machines. In addition, OSC personnel helped us make movies of the 2D simulations which proved very useful in both the assessment of the science and the communication of the work.

Some representative graphics illustrating the jet dynamics is shown in the following figures.

That work has subsequently developed along four parallel tracks:

A detailed exploration of the morphology and dynamics of these flows within the parameter space of flow speed, Mach number, and gas temperature. This work has enabled us to understand the difference in character between flows with and without relativistic speed. See, e.g., Rosen, Hughes, Duncan & Hardee (1999). The work was done on workstations at the Universities of Michigan and Alabama, in addition to OSC machines.
A comparison of the results of linear stability analyses with those of numerical simulation, for both relativistic and nonrelativistic axisymmetric flows. We can understand the instabilities that are seen in the numerical simulations if account is taken of the coupling between available modes and the disturbances that may excite them. See, e.g., Hardee, Rosen, Hughes & Duncan (1998). This part of the work was done on workstations at the Universities of Michigan and Alabama, in addition to OSC machines.
Radiation transfer calculations to predict the appearance of these flows for comparison with observations with the latest and next generation telescopes. See, e.g., Mioduszewski, Hughes & Duncan (1997). This work was done on workstations at the University of Michigan and BGSU.
The development and first use of a 3D version of our original 2D code, in recognition of the many effects that demand a fully 3D study. We have been developing the 3D version on workstations at Washington University, St. Louis, the University of Michigan, BGSU, and on the OSC Cray and Origin 2000 over the last two years. Some additional testing has been done on other machines (Potsdam, NCSA).

The major focus of the present proposal for OSC resources is the 3D relativistic jet simulations using our Adaptive Mesh Refinement code. Over the last two years we have performed numerous tests on the code and have endeavored to make it as efficient as the underlying algorithms permit. Following this extensive development and testing, the code is ready to be used to perform research level simulations on the OSC Cray T94 and the Origin 2000. The computational requirements of such simulations are significant in both memory and wall clock hours, and the 3D simulations requested herein require facilities of the caliber of those at the OSC, principally due to the memory requirements. The fast, large memory machines such as the Cray T94 and the Origin 2000 are ideal for our purpose, because currently available workstations provide insufficient memory for us to achieve anything like adequate resolution while, as noted in §3.4.2, the use of massively parallel technology still presents serious challenges to AMR-type codes.

Justification for Amount of Resource Units

Resource Units Per Run

The following is a list of specific projects to be done along with estimates of the number of RU needed for each part. These estimates are based on testing done on the OSC Origin 2000 and on the NCSA Origin 2000 cluster.

Specifically, this project plans to

Forge a link with earlier, 2D axisymmetric hydrodynamic simulations, by exploring for a small subset of earlier 2D simulations those modes of instability that are suppressed in 2D, but which are expected on the basis of linear stability analysis to dominate the formation of internal jet structure in 3D; see the next section for further comments. For these axisymmetric relativistic jets in 3D cartesian coordinates we request a total of three such runs at 150 RU per run. The total requested is 450 RU for this part.
Study the nonlinear development of instability for a small subset of earlier 2D simulations by perturbing the flow with a periodic driving force designed to excite the most unstable modes suggested by linear stability analysis. This project will study the nonlinear development of instability in Cartesian coordinates of some of the axisymmetric simulations. We request a total of three such runs at 150 RU per run. The total requested is 450 RU for this part.
Study the development of internal jet structure, and its subsequent evolution, due to the interaction of a jet with an ambient density inhomogeneity - an idealization of the interaction of the jet with an interstellar cloud or gradient in the diffuse interstellar medium. The deflection of initially axisymmetric jets in Cartesian coordinates will begin to address the important issue of which mechanisms are most viable for deflection, help us to understand the jet's response to bending, the possible maintenance of the jet's integrity and the development of its internal structure. We request a total of three such runs at 150 RU per run. The total requested is 450 RU for this part.
Study the development of internal jet structure and its subsequent evolution due to precession of the jet inflow - a process strongly suggested by the S-symmetric kiloparsec-scale sources, and likely to occur in situations with multiple galactic nuclei. We hope to extract information on the kinematics of high brightness structures in the complex evolving flow. We have made some preliminary runs with coarse resolution of the evloving precessing jet, the results of which are illustrated in the last figure Fig. 2 . Due to the complexity of the flow induced by the precession this will be a single high resolution run of the code. We estimate that 500 RU will be needed for this.

The total requested RU for the project is thus: 1850 RU.

Number of Runs

The total number of runs from all four parts of the proposed studies amounts to of the order of ten, with the exact number determined in part by earlier results: novel initial results might compel us to invest more time in few high resolution runs, or a broader exploration of parameter space. Either way, trial runs on the Origin 2000s at both OSC and NCSA provide us with a dependable estimate of the resources required.

We note the very important point that because of our use of Adaptive Mesh Refinement, low resolution 3D runs can be made on workstation class machines with 128Mb of memory. Such runs have insufficient resolution to address the questions set out above, but provide invaluable insight into which initial conditions and parts of parameter space are most appropriate choices for production runs at OSC. We thus expect to need little or no experimentation, and will make efficient use of OSC facilities.

Runs will require typically 1 - 2 Gb of memory (which is available through a queue - Q95.2g36000 - set up at our request in October 1998). Figure-N shows the results of an exploratory run with a precessing jet; this required 0.7 Gb on the Origin 2000, and employs 2 levels of refinement on a base grid of size . Although interesting, and indeed capable of giving insight into the dynamics of the flow, the resolution is clearly insufficient. An additional level of refinement alone would lead to untenable memory requirements, but an additional level in conjunction with more selective refinement criteria - restricting refinement to key portions of the flow - would increase the memory required by no more than .

Performance Report

Algorithmic Description

Our current calculations assume an inviscid and compressible gas, and an ideal equation of state with constant adiabatic index. We use a Godunov-type solver which is a relativistic generalization of the method due to Harten, Lax, & Van Leer (1983), and Einfeldt (1988), in which the full solution to the Riemann problem is approximated by two waves separated by a piecewise constant state. We evolve mass density R, the three components of the momentum density , and , and the total energy density E relative to the laboratory frame. Earlier, 2D axisymmetric calculations computed the axial and radial components of momentum density, and , in a cylindrical coordinate system.

Defining the vector

and the three flux vectors

the conservative form of the relativistic Euler equation is

The pressure is given by the ideal gas equation of state The Godunov-type solvers are well known for their capability as robust, conservative flow solvers with excellent shock capturing features. In this family of solvers one reduces the problem of updating the components of the vector U, averaged over a cell, to the computation of fluxes at the cell interfaces. In one spatial dimension the part of the update due to advection of the vector U may be written as

In the scheme originally devised by Godunov (1959), a fundamental emphasis is placed on the strategy of decomposing the problem into many local Riemann problems, one for each pair of values of and to yield values which allow the computation of the local interface fluxes . In general, an initial discontinuity at due to and will evolve into four piecewise constant states separated by three waves. The left-most and right-most waves may be either shocks or rarefaction waves, while the middle wave is always a contact discontinuity. The determination of these four piecewise constant states can, in general, be achieved only by iteratively solving nonlinear equations. Thus the computation of the fluxes necessitates a step which can be computationally expensive. For this reason much attention has been given to approximate, but sufficiently accurate, techniques. One notable method is that due to Harten, Lax, & Van Leer (1983; HLL), in which the middle wave, and the two constant states that it separates, are replaced by a single piecewise constant state. One benefit of this approximation, which smears the contact discontinuity somewhat, is to eliminate the iterative step, thus significantly improving efficiency. However, the HLL method requires accurate estimates of the wave speeds for the left- and right-moving waves. Einfeldt (1988) analyzed the HLL method and found good estimates for the wave speeds. The resulting method combining the original HLL method with Einfeldt's improvements (the HLLE method), has been taken as a starting point for our simulations. In our implementation we use wave speed estimates based on a simple application of the relativistic addition of velocities formula for the individual components of the velocities, and the relativistic sound speed , assuming that the waves can be decomposed into components moving perpendicular to the two coordinate directions.

In order to compute the pressure p and sound speed we need the rest frame mass density n and energy density e. However, these quantities are nonlinearly coupled to the components of the velocity as well as to the laboratory frame variables via the Lorentz transformation:

where is the Lorentz factor and . When the adiabatic index is constant it is possible to reduce the computation of n, e, , and to the solution of the following quartic equation:

eqnarray88

where . This quartic is solved at each cell several times during the update of a given mesh using Newton-Raphson iteration.

Our scheme is generally of second order accuracy, which is achieved by taking the state variables as piecewise linear in each cell, and computing fluxes at the half-time step. However, in estimating the laboratory frame values on each cell boundary, it is possible that through discretization, the lab frame quantities are unphysical - they correspond to rest frame values v;SPMgt;1 or p;SPMlt;0. At each point where a transformation is needed, we check that certain conditions on M/E and R/E are satisfied, and if not, recompute cell interface values in the piecewise constant approximation. We find that such `fall back to first order' rarely occurs.

The relativistic HLLE (RHLLE) method discussed above constitutes the basic flow integration scheme on a single mesh. Our past and planned work also utilizes adaptive mesh refinement (AMR) in order to gain spatial and temporal resolution.

The AMR algorithm used is a general purpose mesh refinement scheme which is an outgrowth of original work by Berger (1982) and Berger and Colella (1989). The AMR method uses a hierarchical collection of grids consisting of embedded meshes to discretize the flow domain. We have used a scheme which subdivides the domain into logically rectangular meshes with uniform spacing in the three coordinate directions, and a fixed refinement ratio of . The AMR algorithm orchestrates i) the flagging of cells which need further refinement, assembling collections of such cells into meshes; ii) the construction of boundary zones so that a given mesh is a self-contained entity consisting of the interior cells and the needed boundary information; iii) mechanisms for sweeping over all levels of refinement and over each mesh in a given level to update the physical variables on each such mesh; and iv) the transfer of data between various meshes in the hierarchy, with the eventual completed update of all variables on all meshes to the same final time level. The adaption process is dynamic so that the AMR algorithm places further resolution where and when it is needed, as well as removing resolution when it is no longer required. Adaption occurs in time, as well as in space: the time step on a refined grid is less than that on the coarser grid, by the refinement factor for the spatial dimension. More time steps are taken on finer grids, and the advance of the flow solution is synchronized by interleaving the integrations at different levels. This helps prevent any interlevel mismatches that could adversely affect the accuracy of the simulation.

In order for the AMR method to sense where further refinement is needed, some monitoring function is required. We have used a combination of the gradient of the laboratory frame mass density, a test that recognizes the presence of contact surfaces, and a measure of the cell-to-cell shear in the flow, the choice of which functions to use being determined by which part of the flow is of most significance in a given study. Since the tracking of shock waves is of paramount importance, a buffer ensures the flagging of extra cells at the edge of meshes, ensuring that important flow structures do not `leak' out of meshes during the update of the hydrodynamic solution. The combined effect of using the RHLLE single mesh solver and the AMR algorithm results in a very efficient scheme. Where the RHLLE method is unable to give adequate resolution on a single coarse mesh the AMR algorithm places more cells, resulting in an excellent overall coverage of the computational domain.

Algorithmic Description

Defining the vector

and the three flux vectors

the conservative form of the relativistic Euler equation is

where is the Lorentz factor and . When the adiabatic index is constant it is possible to reduce the computation of n, e, , and to the solution of the following quartic equation:

eqnarray88

where . This quartic is solved at each cell several times during the update of a given mesh using Newton-Raphson iteration.

Vectorization or Parallelization Implementation

While there exist many implementations of distributed Adaptive Mesh Refinement in 3D (e.g. DAGH, PYRAMID), the problems associated with implementing an efficient method remain largely unsolved. Specifically, the issues of load balancing and data locality present large obstacles in constructing an efficient AMR implementation in a distributed memory environment. We have based our AMR implementation on a simple Fortran 90 derived data type. We plan to eventually employ multiple processor technology at two different levels. First, we plan to utilize recent advances in High Performance Fortran (HPF) compiler technology in our code. This will enable us to access distributed memory architectures. Second, we plan to implement the shared memory parallelism offered by the SGI Origin 2000.

Performance Analysis Reports

Code compilation has been performed with extensive but safe optimization (typically -O2). We have avoided certain options (e.g., the -OPT:IEEE_ arithmetic=3 option of the Origin 2000) to minimize potential rounding error induced problems. The code makes few calls to intrinsic math functions and performs no standard tasks such as linear equation solving or fast Fourier transformation. We have thus not called upon libraries with optimized functions or packages.

Profiling has been used to very good effect: in an early version of the code profiling revealed that interpolation to populate newly generated grids - in particular to provide boundary values during advance of the solution to a common time on all grid levels - took 58.5% of the processor time. Recoding to perform this interpolation on a grid-by-grid, rather than point-by-point basis reduced the time spent in this task to 6.5% in a fiducial problem, with 77% of the time spent on core calculations pertaining to the calculation of fluxes and updating conserved variables.

perfex data from a run on the Origin 2000 - a machine well-suited to the current project - showed a graduated instructions per cycle value of 0.78, somewhat, but not substantially, below 1.0. Graduated loads (and stores) per issued loads (and stores) were close to 1.0; both L1 and L2 data cache hit rates were above 0.93. In general, perfex indicates that the code suffers no significant inefficiencies.

Bibliography

Begelman, M. C., Blandford, R. D., & Rees, M. J. 1984, Rev. Mod. Phys., 56, 255

Berger, M. 1982, Dissertation, Stanford University

Berger, M., & Colella, P. 1989, J. Comp. Phys., 82, 67

Dar, A. 1998, ApJ (Letters), 500, L93

Duncan, G. C., & Hughes, P. A. 1994, ApJ, 436, L119

Einfeldt, B. 1988, SIAM J. Numerical Analysis, 25, 294

Godunov, S. K. 1959, Mat. Sb., 47, 271

Hardee, P. E., Rosen, A., Hughes, P. A., & Duncan, G. C. 1998, ApJ, 500, 599

Harten, A., Lax, P., & Van Leer, B. 1983, SIAM Rev., 25, 35

Heinz, S., & Begelman, M.C. 1997, ApJ, 490, 653

Hjellming, R.M., & Rupen, M.P. 1995, Nature, 375, 464

Lister, M.L., & Marscher, A.P. 1999, ApJ, in press

Lobanov, A. P., & Zensus, J. A. 1996, in Energy Transport in Radio Galaxies and Quasars, ed. P. E. Hardee, A. H. Bridle, & J. A. Zensus (San Francisco: ASP ), 109

Marscher, A.P. et al. 1997, BAAS, 190, 5106

Mioduszewski, A. J., Hughes, P. A., & Duncan, G. C. 1997, ApJ, 476, 649

Mirabel, I.F., & Rodríguez, L.F. 1994, Nature, 371, 46

Mirabel, I.F., & Rodríguez, L.F. 1995, Annals of the New York Academy of Sciences, eds. H. Böhringer, G.E. Morfil, & J. Trümper, 759, 21

Mutel, R. L., Phillips, R. B., Su, B., & Bucciferro, R. R. 1990, ApJ, 352, 81

Polatidis, A.G., & Wilkinson, P.N. 1998, MNRAS, 294, 327

Rosen, A., Hughes, P. A., Duncan, G. C., & Hardee, P. E. 1999, ApJ, in press

Shaffer, D. B., Marscher, A. P., Marcaide, J., & Romney, J. D. 1987, ApJ, 314, L1

Tingay, S.J., et al. 1995, Nature, 374, 141

Wardle, J. F. C., Cawthorne, T. V., Roberts, D. H., & Brown, L. F. 1994, ApJ, 437, 122

List of Publications and Reprints of Work Done on OSC Machines

Duncan, G. C., & Hughes, P. A. 1994, ApJ, 436, L119

Duncan, G. C., Hughes, P. A., & Opperman, J. 1996, in Energy Transport in Radio Galaxies and Quasars, ed. Hardee, P. E., Bridle, A. H., & Zensus, J. A., (ASP Conference Series), p. 143

Hardee, P. E., Rosen, A., Hughes, P. A., & Duncan, G. C. 1998, ApJ, 500, 599

Hughes, P. A., Duncan, G. C., & Mioduszewski, A. J. 1996, in Energy Transport in Radio Galaxies and Quasars, ed. Hardee, P. E., Bridle, A. H., & Zensus, J. A., (ASP Conference Series), p. 137

Mioduszewski, A. J., Hughes, P. A., & Duncan, G. C. 1997, ApJ, 476, 649

Rosen, A., Hughes, P. A., Duncan, G. C., & Hardee, P. E. 1999, ApJ, in press

G. Comer Duncan
gcd@chandra.bgsu.edu
Monday. April 5, 1999