Code compilation has been performed with extensive but safe optimization
(typically -O2). We have avoided certain options (e.g., the
-OPT:IEEE_arithmetic=3 option of the Origin 2000) to minimize potential
problems induced by rounding error. The code makes few calls to intrinsic math
functions and performs no standard tasks such as linear equation solving or
fast Fourier transformation. We have thus not called upon libraries with
optimized functions or packages.
Profiling has been used to very good effect. In an early version of the code,
profiling revealed that interpolation to populate newly generated grids -- in
particular to provide boundary values during advance of the solution to a
common time on all grid levels -- took 58.5% of the processor time. Recoding
to perform this interpolation on a grid-by-grid, rather than point-by-point,
basis reduced the time spent in this task to 6.5% in a fiducial problem,
with 77% of the time spent on the core calculations: computing fluxes and
updating conserved variables.
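
The difference between the two interpolation strategies can be illustrated with a minimal sketch. It is not the production code: it assumes one-dimensional linear interpolation, a fine grid aligned with a coarse-grid node, and an integer refinement ratio, and all function and variable names are hypothetical. The point-by-point version locates the parent coarse cell and computes a weight separately for every fine point; the grid-by-grid version computes the weights once per grid and sweeps the coarse cells in order.

```python
def interpolate_point(coarse_x0, coarse_dx, coarse_values, x):
    """Point-by-point: locate the parent coarse cell, then interpolate one value."""
    i = int((x - coarse_x0) / coarse_dx)
    i = min(max(i, 0), len(coarse_values) - 2)   # clamp to a valid cell index
    w = (x - (coarse_x0 + i * coarse_dx)) / coarse_dx
    return (1.0 - w) * coarse_values[i] + w * coarse_values[i + 1]


def fill_point_by_point(coarse_x0, coarse_dx, coarse_values, fine_xs):
    """Repeat the cell search and weight calculation for every fine point."""
    return [interpolate_point(coarse_x0, coarse_dx, coarse_values, x)
            for x in fine_xs]


def fill_grid_by_grid(coarse_values, i0, ratio, n_fine):
    """Grid-by-grid: the fine grid starts at coarse node i0 (assumption) and
    has `ratio` fine cells per coarse cell, so the weights are known up front."""
    weights = [r / ratio for r in range(ratio)]  # computed once per grid
    out = []
    i = i0
    while len(out) < n_fine:
        left, right = coarse_values[i], coarse_values[i + 1]
        for w in weights:
            if len(out) == n_fine:
                break
            out.append((1.0 - w) * left + w * right)
        i += 1
    return out


# Toy data (not from the paper): coarse nodes at x = 0, 1, 2, 3, 4.
coarse = [0.0, 1.0, 4.0, 9.0, 16.0]
fine_xs = [1.0 + 0.5 * k for k in range(5)]      # x = 1.0, 1.5, ..., 3.0
by_point = fill_point_by_point(0.0, 1.0, coarse, fine_xs)
by_grid = fill_grid_by_grid(coarse, i0=1, ratio=2, n_fine=5)
```

Both routines return the same values; the grid-by-grid form simply hoists the per-point search and weight arithmetic out of the inner loop, which is one plausible source of the kind of saving reported above.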
perfex data from a run on the Origin 2000 -- a machine well-suited to
the current project -- showed a graduated instructions per cycle value of
0.78, somewhat, but not substantially, below 1.0. Graduated loads (and
stores) per issued loads (and stores) were close to 1.0; both L1 and L2 data
cache hit rates were above 0.93. In general, perfex indicates that the
code suffers no significant inefficiencies.
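
For readers unfamiliar with these figures, the sketch below shows how such ratios are conventionally derived from raw hardware event counts. The formulas are assumptions about the usual definitions, not taken from the run described here; the counter names are hypothetical, and the numbers are toy values chosen for easy arithmetic rather than measured data.

```python
def derived_metrics(c):
    """Compute commonly quoted ratios from a dict of raw event counts."""
    accesses = c["graduated_loads"] + c["graduated_stores"]
    return {
        # Completed instructions per clock cycle.
        "instructions_per_cycle": c["graduated_instructions"] / c["cycles"],
        # Fraction of issued loads/stores that actually completed.
        "loads_graduated_per_issued": c["graduated_loads"] / c["issued_loads"],
        "stores_graduated_per_issued": c["graduated_stores"] / c["issued_stores"],
        # Assumed definition: fraction of data accesses that hit in L1.
        "l1_data_hit_rate": 1.0 - c["l1_data_misses"] / accesses,
        # Assumed definition: fraction of L1 misses that were served by L2.
        "l2_data_hit_rate": 1.0 - c["l2_data_misses"] / c["l1_data_misses"],
    }


# Toy counts for illustration only (not perfex output).
counts = {
    "cycles": 1000,
    "graduated_instructions": 800,
    "issued_loads": 320,
    "graduated_loads": 300,
    "issued_stores": 100,
    "graduated_stores": 100,
    "l1_data_misses": 20,
    "l2_data_misses": 1,
}
metrics = derived_metrics(counts)
```

Under these definitions, a graduated-to-issued ratio near 1.0 means few loads or stores had to be reissued, and high hit rates at both cache levels mean the processor is rarely waiting on main memory.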