%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[12pt]{article}
\begin{document}
\setcounter{section}{1}
\title{Simulations of Relativistic Extragalactic Jets}
\author{G. Comer Duncan \and Philip A. Hughes \and Mark A. Miller}
\maketitle
\section{Proposal Abstract}
\section{Proposal Text}
\subsection{Description of Project Problem}
Over the last decade the quantity and quality of images of extragalactic
relativistic jets have increased to a point where it has become evident that
curvature of light-year-scale flows is the norm, not the exception (e.g.,
Wardle et al. 1994). Some of this curvature may be `apparent', resulting
from a more modestly curved flow seen close to the line of sight.
Nevertheless, such flows must possess some intrinsic curvature, and thus their
3-D morphology must be addressed.
This raises both questions and possibilities: how can highly relativistic
flows suffer significant curvature, and yet retain their integrity? It has
been argued (Begelman, Blandford \& Rees 1984) that sublight-year-scale flows
are highly dissipative, and should radiate a significant fraction of their
flow energy if subjected to a perturbation such as bending. How do transverse
shocks propagate along a curved flow? Will the shock plane rotate with
respect to the flow axis; will the shock strengthen or weaken? What role does
flow curvature have to play in explaining phenomena such as stationary knots
between which superluminal components propagate (Shaffer et al. 1987), knots
which brighten after an initial fading (Mutel et al. 1990), and changing
component speed (Lobanov \& Zensus 1996)? Numerous lines of evidence point
convincingly to the occurrence of oblique shocks (e.g., Heinz \& Begelman
1997; Lister \& Marscher 1999; Marscher et al. 1997; Polatidis \& Wilkinson
1998); how do they form and evolve? All these issues are amenable to study
through hydrodynamic simulation -- but all demand that such simulations be
3-D.
Furthermore, it has become evident over the last five years that such highly
energetic flows are also found in Galactic objects with stellar-mass
`engines'. In the Galactic superluminals GRS~1915+105 (Mirabel \& Rodr\'\i guez
1994, 1995) and GRO~J1655--40 (Hjellming \& Rupen 1995; Tingay et al.
1995) the observed motions indicate jet flow speeds of up to 92\% of the speed
of light. There is
compelling evidence from the observation of optical afterglows that gamma-ray
bursts are of cosmological origin. Whether they are produced by the mergers of
compact objects or by accretion-induced collapse (AIC), simple relativistic
fireball models seem ruled out, and the data strongly favor highly
relativistic jets (Dar 1998). Thus a detailed understanding of the dynamics
of collimated relativistic flows has wide application in astrophysics.

This grant application requests resources to perform an initial exploration of
the key issues pertaining to the uniquely 3D dynamics of relativistic
astrophysical flows.
\subsection{Rationale for Machine Use}
Since 1993 we have used machines at OSC and elsewhere in a continuing
project to study the dynamics and radiation properties of relativistic
extragalactic jets. Our first results (Duncan \& Hughes 1994) constituted the
first published high-resolution axisymmetric simulations of these objects.
That work demonstrated the stability of highly relativistic flows, and
suggested an interpretation of a long-known difference between two classes of
astrophysical flows in terms of flow speed-dependent stability. That work was
performed on a combination of computers, including workstations at BGSU, the
University of Michigan, and the OSC Cray and OVL Onyx machines. In addition,
OSC personnel helped us make movies of the 2D simulations which proved very
useful in both the assessment of the science and the communication of the
work.
That work has subsequently developed along four parallel tracks:
\begin{itemize}
\item A detailed exploration of the morphology and dynamics of these flows
within the parameter space of flow speed, Mach number, and gas
temperature. This work has enabled us to understand the difference in
character between flows with and without relativistic speed. See, e.g.,
Rosen, Hughes, Duncan \& Hardee (1999). The work was done on
workstations at the Universities of Michigan and Alabama, in addition
to OSC machines.
\item A comparison of the results of linear stability analyses with those of
numerical simulation, for both relativistic and nonrelativistic
axisymmetric flows. We can understand the instabilities that are seen
in the numerical simulations if account is taken of the coupling
between available modes and the disturbances that may excite them.
See, e.g., Hardee, Rosen, Hughes \& Duncan (1998). This part of the
work was done on workstations at the Universities of Michigan and
Alabama, in addition to OSC machines.
\item Radiation transfer calculations to predict the appearance of these
flows for comparison with observations with the latest and next
generation telescopes. See, e.g., Mioduszewski, Hughes \& Duncan
(1997). This work was done on workstations at the University of
Michigan and BGSU.
\item The development and first use of a 3D version of our original 2D code,
in recognition of the many effects that demand a fully 3D study. We
have been developing the 3D version on workstations at Washington
University, St. Louis, the University of Michigan, BGSU, and on the OSC
Cray and Origin 2000 over the last two years. Some additional testing
has been done on other machines (Potsdam, NCSA).
\end{itemize}
The major focus of the present proposal for OSC resources is the 3D
relativistic jet simulations using our Adaptive Mesh Refinement code. Over
the last two years we have performed numerous tests on the code and have
endeavored to make it as efficient as the underlying algorithms permit.
Following this extensive development and testing, the code is ready to be
used to perform research level simulations on the OSC Cray T94 and the
Origin 2000. The computational requirements of such simulations are
significant in both memory and wall clock hours, and the 3D simulations
requested herein require facilities of the caliber of those at the OSC,
principally due to the memory requirements. The fast, large memory machines
such as the Cray T94 and the Origin 2000 are ideal for our purpose, because
currently available workstations provide insufficient memory for us to
achieve anything like adequate resolution while, as noted in \S 3.4.2, the
use of massively parallel technology still presents serious challenges to
AMR-type codes.
\subsection{Justification for Amount of Resource Units}
\subsubsection{Resource Units Per Run}
The following is a list of specific projects to be done along with estimates
of the number of RU needed for each part. These estimates are based on
testing done on the OSC Origin 2000 and on the NCSA Origin 2000 cluster.
Specifically, this project plans to
\begin{itemize}
\item Forge a link with earlier, 2D axisymmetric hydrodynamic simulations, by
exploring for a small subset of earlier 2D simulations those modes of
instability that are suppressed in 2D, but which are expected on the
basis of linear stability analysis to dominate the formation of
internal jet structure in 3D; see the next section for further
comments. For these axisymmetric relativistic jets in 3D Cartesian
coordinates we request a total of three such runs at 150 RU per run.
The total requested is 450 RU for this part.
\item Study the {\it nonlinear} development of instability for a small subset
of earlier 2D simulations by perturbing the flow with a periodic
driving force designed to excite the most unstable modes suggested by
linear stability analysis. This project will study the nonlinear
development of instability in Cartesian coordinates of some of the
axisymmetric simulations. We request a total of three such runs at 150
RU per run. The total requested is 450 RU for this part.
\item Study the development of internal jet structure, and its subsequent
evolution, due to the interaction of a jet with an ambient density
inhomogeneity -- an idealization of the interaction of the jet with an
interstellar cloud or gradient in the diffuse interstellar medium. The
deflection of initially axisymmetric jets in Cartesian coordinates will
begin to address the important issue of which mechanisms are most
viable for deflection, help us to understand the jet's response to
bending, the possible maintenance of the jet's integrity and the
development of its internal structure. We request a total of three
such runs at 150 RU per run. The total requested is 450 RU for this
part.
\item Study the development of internal jet structure and its subsequent
evolution due to {\bf precession} of the jet inflow -- a process
strongly suggested by the S-symmetric kiloparsec-scale sources, and
likely to occur in situations with multiple galactic nuclei. We hope to
extract information on the kinematics of high brightness structures in
the complex evolving flow. Due to the complexity of the flow induced
by the precession this will be a single high resolution run of the
code. We estimate that 500 RU will be needed for this.
\end{itemize}
The total requested RU for the project is thus: 1850 RU.
\subsubsection{Number of Runs}
The total number of runs from all four parts of the proposed studies
is of the order of ten, with the exact number determined in
part by earlier results: novel initial results might compel us to invest more
time in a few high-resolution runs, or in a broader exploration of parameter
space. Either way, trial runs on the Origin 2000s at both OSC and NCSA
provide us with a dependable estimate of the resources required.
We note the very important point that because of our use of Adaptive Mesh
Refinement, low-resolution 3D runs can be made on workstation-class machines
with 128~MB of memory. Such runs have insufficient resolution to address the
questions set out above, but provide invaluable insight into which initial
conditions and parts of parameter space are most appropriate choices for
production runs at OSC. We thus expect to need little or no experimentation,
and will make efficient use of OSC facilities.
Runs will typically require 1--2~GB of memory (available through a
queue -- Q95.2g36000 -- set up at our request in October 1998). Figure-N
shows the results of an exploratory run with a precessing jet; this required
0.7~GB on the Origin 2000, and employed two levels of refinement on a base grid
of size $16\times 16\times 29$. Although interesting, and indeed capable of
giving insight into the dynamics of the flow, the resolution is clearly
insufficient. An additional level of refinement alone would lead to
untenable memory requirements, but an additional level in conjunction with
more selective refinement criteria -- restricting refinement to key portions
of the flow -- would increase the memory required by no more than $\times
3$.
\subsection{Performance Report}
\subsubsection{Algorithmic Description}
Our current calculations assume an inviscid and compressible gas, and an
ideal equation of state with constant adiabatic index. We use a Godunov-type
solver which is a relativistic generalization of the method due to Harten,
Lax, \& Van Leer (1983), and Einfeldt (1988), in which the full solution to
the Riemann problem is approximated by two waves separated by a piecewise
constant state. We evolve mass density $R$, the three components of the
momentum density $M_x$, $M_y$ and $M_z$, and the total energy density $E$
relative to the laboratory frame. Earlier, 2D axisymmetric calculations
computed the axial and radial components of momentum density, $M_{\rho}$ and
$M_{z}$, in a cylindrical coordinate system.
Defining the vector
\begin{equation}
U = (R,M_x,M_y,M_z,E)^{T} ,
\end{equation}
and the three flux vectors
\begin{equation}
F^x = (R v^x, M_x v^x + p, M_y v^x, M_z v^x, (E + p)v^x)^{T} ,
\end{equation}
\begin{equation}
F^y = (R v^y, M_x v^y, M_y v^y + p, M_z v^y, (E + p)v^y)^{T} ,
\end{equation}
\begin{equation}
F^z = (R v^z, M_x v^z, M_y v^z, M_z v^z + p, (E + p)v^z)^{T} ,
\end{equation}
the conservative form of the relativistic Euler equation is
\begin{equation}
{\partial{ U}\over \partial{t}} + {\partial\over \partial{x} } (F^x) +
{\partial\over \partial y} (F^{y})+ {\partial\over \partial z} (F^{z}) = 0 .
\end{equation}
The pressure is given by the ideal gas equation of state
$p = (\Gamma - 1)(e - n)$. Godunov-type solvers are well known
for their capability as robust, conservative flow solvers with excellent
shock capturing features. In this family of solvers one reduces the problem
of updating the components of the vector $U$, averaged over a cell, to the
computation of fluxes at the cell interfaces. In one spatial dimension the
part of the update due to advection of the vector $U$ may be written as
\begin{equation}
{U^{n+1}}_{i} = {U^{n}}_{i} - \frac{\delta t}{\delta x} (F_{i + \frac{1}{2} } -
F_{i - \frac{1}{2}}) .
\end{equation}
In the scheme originally devised by Godunov (1959), a fundamental emphasis is
placed on the strategy of decomposing the problem into many local Riemann
problems, one for each pair of values of $U_{i}$ and $U_{i+1}$ to yield
values which allow the computation of the local interface fluxes
$F_{i+\frac{1}{2}}$. In general, an initial discontinuity at $i+\frac{1}{2}$
due to $U_{i}$ and $U_{i+1}$ will evolve into four piecewise constant states
separated by three waves. The left-most and right-most waves may be either
shocks or rarefaction waves, while the middle wave is always a contact
discontinuity. The determination of these four piecewise constant states can,
in general, be achieved only by iteratively solving nonlinear equations.
Thus the computation of the fluxes necessitates a step which can be
computationally expensive. For this reason much attention has been given to
approximate, but sufficiently accurate, techniques. One notable method is
that due to Harten, Lax, \& Van Leer (1983; HLL), in which the middle wave,
and the two constant states that it separates, are replaced by a single
piecewise constant state. One benefit of this approximation, which smears
the contact discontinuity somewhat, is to eliminate the iterative step, thus
significantly improving efficiency. However, the HLL method requires
accurate estimates of the wave speeds for the left- and right-moving waves.
Einfeldt (1988) analyzed the HLL method and found good estimates for the wave
speeds. The resulting method combining the original HLL method with
Einfeldt's improvements (the HLLE method), has been taken as a starting
point for our simulations. In our implementation we use wave speed estimates
based on a simple application of the relativistic velocity-addition
formula to the individual velocity components and the relativistic
sound speed $c_s$, assuming that the waves can be decomposed into components
moving along each coordinate direction in turn.
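A minimal sketch of the HLLE interface flux with such wave speed bounds (Python with hypothetical names; the production code is Fortran 90, and Einfeldt's full estimates involve averaged states not shown here) might read:

```python
import numpy as np

def hlle_flux(UL, UR, FL, FR, vL, vR, csL, csR):
    """HLLE flux at one interface from left/right states (units with c = 1).

    Signal-speed bounds use the relativistic velocity-addition formula
    applied to the normal velocity component and the sound speed.
    """
    lam_left = (vL - csL) / (1.0 - vL * csL)   # fastest left-going wave
    lam_right = (vR + csR) / (1.0 + vR * csR)  # fastest right-going wave
    b_minus = min(0.0, lam_left)
    b_plus = max(0.0, lam_right)
    # single intermediate state replaces the contact discontinuity
    return (b_plus * FL - b_minus * FR
            + b_plus * b_minus * (UR - UL)) / (b_plus - b_minus)
```

When the left and right states agree, the dissipative term vanishes and the exact flux is returned; the diffusion is activated only across jumps.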
In order to compute the pressure $p$ and sound speed $c_s$ we need the rest
frame mass density $n$ and energy density $e$. However, these quantities are
nonlinearly coupled to the components of the velocity as well as to the
laboratory frame variables via the Lorentz transformation:
\begin{equation}
R = \gamma n ,
\end{equation}
\begin{equation}
M^{x} = \gamma^2 ( e + p ) v^{x} ,
\end{equation}
\begin{equation}
M^{y} = \gamma^2 ( e + p ) v^{y} ,
\end{equation}
\begin{equation}
M^{z} = \gamma^2 ( e + p ) v^{z} ,
\end{equation}
\begin{equation}
E = \gamma^2 ( e + p ) - p ,
\end{equation}
where $\gamma = ( 1 - v^2 )^{-1/2}$ is the Lorentz factor and $v^2 =
(v^{x})^2 + (v^{y})^2 + (v^{z})^2$. When the adiabatic index is constant it
is possible to reduce the computation of $n$, $e$, $v^{x}$, $v^{y}$ and
$v^{z}$ to the solution of the following quartic equation:
\begin{eqnarray}
& & \Bigl\lbrack\Gamma v \left(E-Mv\right)-M \left(1-v^2\right)\Bigr\rbrack^2
\nonumber \\
& & - \left(1-v^2\right)v^2\left(\Gamma-1\right)^2R^2=0 ,
\end{eqnarray}
where $M^2 = (M^{x})^2 + (M^{y})^2 + (M^{z})^2$. This quartic is solved at
each cell several times during the update of a given mesh using
Newton-Raphson iteration.
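A Python sketch of this recovery step follows (names hypothetical; a numerical derivative stands in for the analytic one, and the production Fortran 90 code differs in detail):

```python
import numpy as np

def recover_primitives(R, Mx, My, Mz, E, Gamma, tol=1e-12, max_iter=50):
    """Recover rest-frame density n, energy density e, and velocity from
    lab-frame conserved variables via Newton-Raphson on the quartic in v
    (units with c = 1).  Illustrative sketch only."""
    M = np.sqrt(Mx**2 + My**2 + Mz**2)
    if M == 0.0:
        return R, E, 0.0, 0.0, 0.0  # fluid at rest: n = R, e = E

    def f(v):
        # the quartic from the text
        return (Gamma * v * (E - M * v) - M * (1.0 - v**2))**2 \
               - (1.0 - v**2) * v**2 * (Gamma - 1.0)**2 * R**2

    v = 0.5  # seed in the interior of (0, 1)
    for _ in range(max_iter):
        h = 1e-8
        dfdv = (f(v + h) - f(v - h)) / (2.0 * h)  # numerical derivative
        v_next = min(max(v - f(v) / dfdv, 1e-15), 1.0 - 1e-15)
        if abs(v_next - v) < tol:
            v = v_next
            break
        v = v_next

    gamma = 1.0 / np.sqrt(1.0 - v**2)
    p = M / v - E                # from E = gamma^2 (e + p) - p
    n = R / gamma                # from R = gamma n
    e = M / (gamma**2 * v) - p   # from M = gamma^2 (e + p) v
    return n, e, v * Mx / M, v * My / M, v * Mz / M
```

Once the speed $v$ is known, the remaining primitives follow algebraically from the Lorentz-transformation relations above, which is why the whole inversion reduces to a single scalar root-find per cell.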
Our scheme is generally of second order accuracy, which is achieved by taking
the state variables as piecewise linear in each cell, and computing fluxes at
the half-time step. However, in estimating the laboratory frame values on
each cell boundary, it is possible that, through discretization error, the lab
frame quantities become unphysical, corresponding to rest frame values $v>1$
or $p<0$. At each point where a transformation is needed, we check that certain
conditions on $M/E$ and $R/E$ are satisfied, and if not, recompute cell
interface values in the piecewise constant approximation. We find that such
`fall back to first order' rarely occurs.
The relativistic HLLE (RHLLE) method discussed above constitutes the basic
flow integration scheme on a single mesh. Our past and planned work also
utilizes adaptive mesh refinement (AMR) in order to gain spatial and temporal
resolution.
The AMR algorithm used is a general purpose mesh refinement scheme which is
an outgrowth of original work by Berger (1982) and Berger and Colella
(1989). The AMR method uses a hierarchical collection of grids consisting of
embedded meshes to discretize the flow domain. We have used a scheme which
subdivides the domain into logically rectangular meshes with uniform spacing
in the three coordinate directions, and a fixed refinement ratio of $\times
3$. The AMR algorithm orchestrates i) the flagging of cells which need
further refinement, assembling collections of such cells into meshes; ii) the
construction of boundary zones so that a given mesh is a self-contained
entity consisting of the interior cells and the needed boundary information;
iii) mechanisms for sweeping over all levels of refinement and over each mesh
in a given level to update the physical variables on each such mesh; and iv)
the transfer of data between various meshes in the hierarchy, with the
eventual completed update of all variables on all meshes to the same final
time level. The adaptation process is dynamic, so that the AMR algorithm
places further resolution where and when it is needed, and removes
resolution when it is no longer required. Adaptation occurs in time, as well
as in space: the time step on a refined grid is less than that on the coarser
grid, by the refinement factor for the spatial dimension. More time steps are
taken on finer grids, and the advance of the flow solution is synchronized by
interleaving the integrations at different levels. This helps prevent any
interlevel mismatches that could adversely affect the accuracy of the
simulation.
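The interleaved advance can be sketched as a recursion over levels (Python, with a hypothetical mesh interface; the actual code is Fortran 90):

```python
def advance_level(hierarchy, level, dt):
    """Advance all meshes at `level` by dt, then subcycle the next finer
    level with refine_ratio smaller steps so both reach the same time.

    `hierarchy` is a list of levels, each a list of mesh objects; the
    mesh interface (fill_boundary, integrate, sync_with_coarse) is
    hypothetical.  Fixed refinement ratio of 3, as in the text.
    """
    refine_ratio = 3
    for mesh in hierarchy[level]:
        mesh.fill_boundary()   # assemble boundary zones for this mesh
        mesh.integrate(dt)     # single-mesh RHLLE update
    if level + 1 < len(hierarchy):
        for _ in range(refine_ratio):
            advance_level(hierarchy, level + 1, dt / refine_ratio)
        for mesh in hierarchy[level + 1]:
            mesh.sync_with_coarse()  # interlevel synchronization
```

With this structure a mesh at refinement level $\ell$ automatically takes $3^\ell$ steps per coarse step, and all levels arrive at the same time before the coarse solution is accepted.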
In order for the AMR method to sense where further refinement is needed, some
monitoring function is required. We have used a combination of the gradient
of the laboratory frame mass density, a test that recognizes the presence of
contact surfaces, and a measure of the cell-to-cell shear in the flow, the
choice of which functions to use being determined by which part of the flow
is of most significance in a given study. Since the tracking of shock waves
is of paramount importance, a buffer flags extra cells at
the edges of meshes, ensuring that important flow structures do not `leak'
out of meshes during the update of the hydrodynamic solution. The combined
effect of using the RHLLE single mesh solver and the AMR algorithm results in
a very efficient scheme. Where the RHLLE method is unable to give adequate
resolution on a single coarse mesh the AMR algorithm places more cells,
resulting in an excellent overall coverage of the computational domain.
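In one dimension, gradient-based flagging with a safety buffer could be sketched as follows (Python, hypothetical names; the real monitor combines several functions as described above):

```python
import numpy as np

def flag_cells(density, threshold, buffer_width=2):
    """Flag cells for refinement where the normalized density gradient
    exceeds `threshold`, then dilate the flags by `buffer_width` cells
    so tracked features cannot leave refined meshes between regrids.
    1D sketch of the monitoring-function idea."""
    grad = np.abs(np.gradient(density)) / np.maximum(np.abs(density), 1e-30)
    flags = grad > threshold
    # buffer: dilate the flagged region in both directions
    dilated = flags.copy()
    for shift in range(1, buffer_width + 1):
        dilated[:-shift] |= flags[shift:]
        dilated[shift:] |= flags[:-shift]
    return dilated
```

Cells flagged this way are then clustered into the logically rectangular meshes that make up the next finer level.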
\subsubsection{Vectorization or Parallelization Implementation}
While there exist many implementations of distributed Adaptive Mesh
Refinement in 3D (e.g. DAGH, PYRAMID), the problems associated with
implementing an efficient method remain largely unsolved. Specifically, the
issues of load balancing and data locality present large obstacles in
constructing an efficient AMR implementation in a distributed memory
environment. We have based our AMR implementation on a simple Fortran 90
derived data type. We plan to eventually employ multiple processor
technology at two different levels. First, we plan to utilize recent
advances in High Performance Fortran (HPF) compiler technology in our code.
This will enable us to access distributed memory architectures. Second, we
plan to implement the shared memory parallelism offered by the SGI
Origin 2000.
\subsubsection{Performance Analysis Reports}
Code compilation has been performed with extensive but safe optimization
(typically -O2). We have avoided certain options (e.g., the
-OPT:IEEE\_arithmetic=3 option of the Origin 2000) to minimize potential
problems induced by rounding error. The code makes few calls to intrinsic math
functions and performs no standard tasks such as linear equation solving or
fast Fourier transformation. We have thus not called upon libraries with
optimized functions or packages.
Profiling has been used to very good effect: in an early version of the code
profiling revealed that interpolation to populate newly generated grids -- in
particular to provide boundary values during advance of the solution to a
common time on all grid levels -- took 58.5\% of the processor time. Recoding
to perform this interpolation on a grid-by-grid, rather than point-by-point
basis reduced the time spent in this task to 6.5\% in a fiducial problem,
with 77\% of the time spent on core calculations pertaining to the
calculation of fluxes and updating conserved variables.
{\it perfex} data from a run on the Origin 2000 -- a machine well-suited to
the current project -- showed a graduated instructions per cycle value of
0.78, somewhat, but not substantially, below 1.0. Graduated loads (and
stores) per issued loads (and stores) were close to 1.0; both L1 and L2 data
cache hit rates were above 0.93. In general, {\it perfex} indicates that the
code suffers no significant inefficiencies.
\section{Bibliography}
\begin{thebibliography}{XXX}
\bibitem{} Begelman, M. C., Blandford, R. D., \& Rees, M. J. 1984, Rev. Mod. Phys., 56, 255
\bibitem{} Berger, M. 1982, Dissertation, Stanford University
\bibitem{} Berger, M., \& Colella, P. 1989, J. Comp. Phys., 82, 67
\bibitem{} Dar, A. 1998, ApJ (Letters), 500, L93
\bibitem{} Duncan, G. C., \& Hughes, P. A. 1994, ApJ, 436, L119
\bibitem{} Einfeldt, B. 1988, SIAM J. Numerical Analysis, 25, 294
\bibitem{} Godunov, S. K. 1959, Mat. Sb., 47, 271
\bibitem{} Hardee, P. E., Rosen, A., Hughes, P. A., \& Duncan, G. C. 1998, ApJ, 500, 599
\bibitem{} Harten, A., Lax, P., \& Van Leer, B. 1983, SIAM Rev., 25, 35
\bibitem{} Heinz, S., \& Begelman, M. C. 1997, ApJ, 490, 653
\bibitem{} Hjellming, R. M., \& Rupen, M. P. 1995, Nature, 375, 464
\bibitem{} Lister, M. L., \& Marscher, A. P. 1999, ApJ, in press
\bibitem{} Lobanov, A. P., \& Zensus, J. A. 1996, in Energy Transport in Radio Galaxies and Quasars, ed. P. E. Hardee, A. H. Bridle, \& J. A. Zensus (San Francisco: ASP ), 109
\bibitem{} Marscher, A. P., et al. 1997, BAAS, 190, 5106
\bibitem{} Mioduszewski, A. J., Hughes, P. A., \& Duncan, G. C. 1997, ApJ, 476, 649
\bibitem{} Mirabel, I. F., \& Rodr\'\i guez, L. F. 1994, Nature, 371, 46
\bibitem{} Mirabel, I. F., \& Rodr\'\i guez, L. F. 1995, Annals of the New York Academy of Sciences, eds. H. B\"ohringer, G. E. Morfil, \& J. Tr\"umper, 759, 21
\bibitem{} Mutel, R. L., Phillips, R. B., Su, B., \& Bucciferro, R. R. 1990, ApJ, 352, 81
\bibitem{} Polatidis, A. G., \& Wilkinson, P. N. 1998, MNRAS, 294, 327
\bibitem{} Rosen, A., Hughes, P. A., Duncan, G. C., \& Hardee, P. E. 1999, ApJ, in press
\bibitem{} Shaffer, D. B., Marscher, A. P., Marcaide, J., \& Romney, J. D. 1987, ApJ, 314, L1
\bibitem{} Tingay, S. J., et al. 1995, Nature, 374, 141
\bibitem{} Wardle, J. F. C., Cawthorne, T. V., Roberts, D. H., \& Brown, L. F. 1994, ApJ, 437, 122
\end{thebibliography}
\subsection{List of Publications and Reprints}
\begin{thebibliography}{XXX}
\bibitem{} Duncan, G. C., \& Hughes, P. A. 1994, ApJ, 436, L119
\bibitem{} Duncan, G. C., Hughes, P. A., \& Opperman, J. 1996, in Energy Transport in Radio Galaxies and Quasars, ed. Hardee, P. E., Bridle, A. H., \& Zensus, J. A., (ASP Conference Series), p. 143
\bibitem{} Hardee, P. E., Rosen, A., Hughes, P. A., \& Duncan, G. C. 1998, ApJ, 500, 599
\bibitem{} Hughes, P. A., Duncan, G. C., \& Mioduszewski, A. J. 1996, in Energy Transport in Radio Galaxies and Quasars, ed. Hardee, P. E., Bridle, A. H., \& Zensus, J. A., (ASP Conference Series), p. 137
\bibitem{} Mioduszewski, A. J., Hughes, P. A., \& Duncan, G. C. 1997, ApJ, 476, 649
\bibitem{} Rosen, A., Hughes, P. A., Duncan, G. C., \& Hardee, P. E. 1999, ApJ, in press
\end{thebibliography}
\section{Resume of Principal Investigator}
\end{document}