The code has been compiled with extensive but safe optimization (typically -O2). We have avoided certain options (e.g., the -OPT:IEEE_arithmetic=3 option on the Origin 2000) to minimize potential problems induced by rounding error. The code makes few calls to intrinsic math functions and performs no standard tasks such as linear equation solving or fast Fourier transformation. We have therefore not called upon libraries of optimized functions or packages.
Profiling has been used to very good effect. In an early version of the code, profiling revealed that interpolation to populate newly generated grids (in particular, to provide boundary values while advancing the solution to a common time on all grid levels) took 58.5% of the processor time. Recoding this interpolation to operate on a grid-by-grid rather than point-by-point basis reduced the time spent on this task to 6.5% in a fiducial problem, with 77% of the time then spent on the core calculations of computing fluxes and updating conserved variables.
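The difference between the two interpolation strategies can be illustrated with a minimal sketch. This is not the production code: the 1D setting, the Grid class, and the function names are hypothetical, and the grids are assumed to be uniform and aligned so that node positions are exactly representable. The point-by-point version searches for a parent grid for every fine-grid point, whereas the grid-by-grid version computes the overlap with each parent once and then sweeps a contiguous index range.

```python
import math


class Grid:
    """Hypothetical 1D uniform grid with node values on [x0, x1]."""

    def __init__(self, x0, dx, values):
        self.x0, self.dx, self.values = x0, dx, list(values)

    @property
    def x1(self):
        return self.x0 + self.dx * (len(self.values) - 1)

    def contains(self, x):
        return self.x0 <= x <= self.x1


def interp_point(grid, x):
    """Linearly interpolate the grid's node values at position x."""
    s = (x - grid.x0) / grid.dx
    i = min(int(s), len(grid.values) - 2)  # clamp so i + 1 is a valid node
    w = s - i
    return (1.0 - w) * grid.values[i] + w * grid.values[i + 1]


def fill_point_by_point(coarse_grids, fine):
    """Search for a parent grid separately for every fine-grid point."""
    out = []
    for k in range(len(fine.values)):
        x = fine.x0 + k * fine.dx
        parent = next(g for g in coarse_grids if g.contains(x))
        out.append(interp_point(parent, x))
    return out


def fill_grid_by_grid(coarse_grids, fine):
    """Compute the overlap with each parent once, then sweep its index range."""
    out = [None] * len(fine.values)
    for g in coarse_grids:
        lo = max(0, math.ceil((g.x0 - fine.x0) / fine.dx))
        hi = min(len(out) - 1, math.floor((g.x1 - fine.x0) / fine.dx))
        for k in range(lo, hi + 1):
            if out[k] is None:  # keep the value from the first parent found
                out[k] = interp_point(g, fine.x0 + k * fine.dx)
    return out


# Two coarse grids holding f(x) = x, and a finer grid straddling both.
coarse = [Grid(0.0, 1.0, [0, 1, 2, 3, 4]), Grid(4.0, 1.0, [4, 5, 6, 7, 8])]
fine = Grid(2.0, 0.5, [0.0] * 9)
by_point = fill_point_by_point(coarse, fine)
by_grid = fill_grid_by_grid(coarse, fine)
```

Both functions return the same values. The saving in the second comes from doing the parent lookup once per grid pair rather than once per point, which is one plausible reading of the recoding described above.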
Data from perfex for a run on the Origin 2000, a machine well suited to the current project, showed a graduated instructions per cycle value of 0.78, somewhat but not substantially below 1.0. Graduated loads (and stores) per issued loads (and stores) were close to 1.0, and both L1 and L2 data cache hit rates were above 0.93. Overall, the perfex data indicate that the code suffers no significant inefficiencies.
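For reference, the ratios quoted above can be derived from raw hardware event counts along the following lines. This is a sketch under stated assumptions: the dictionary keys are hypothetical names rather than perfex output labels, the hit-rate formulas (one minus misses per reference) are assumed to match the definitions perfex uses, and the counts are made-up illustrative values, not measurements from the run described here.

```python
def ratio(numerator, denominator):
    """Divide two event counts, rejecting an empty denominator."""
    if denominator == 0:
        raise ValueError("denominator count is zero")
    return numerator / denominator


def derived_metrics(c):
    """Compute summary ratios from a dict of raw event counts."""
    # Assumption: L1 references are graduated loads plus graduated stores,
    # and L2 is referenced once per L1 data cache miss.
    refs = c["graduated_loads"] + c["graduated_stores"]
    return {
        "instructions_per_cycle": ratio(c["graduated_instructions"], c["cycles"]),
        "loads_per_issued_load": ratio(c["graduated_loads"], c["issued_loads"]),
        "stores_per_issued_store": ratio(c["graduated_stores"], c["issued_stores"]),
        "l1_hit_rate": 1.0 - ratio(c["l1_data_misses"], refs),
        "l2_hit_rate": 1.0 - ratio(c["l2_data_misses"], c["l1_data_misses"]),
    }


# Illustrative counts only (hypothetical, not taken from any real run).
counts = {
    "cycles": 2_000_000,
    "graduated_instructions": 1_500_000,
    "issued_loads": 500_000,
    "graduated_loads": 400_000,
    "issued_stores": 100_000,
    "graduated_stores": 100_000,
    "l1_data_misses": 25_000,
    "l2_data_misses": 5_000,
}
metrics = derived_metrics(counts)
```

A graduated-to-issued ratio near 1.0 means that few issued memory operations had to be reissued, and hit rates near 1.0 mean that most references are satisfied from cache.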