pghpf Version 2.0 Release Notes

The Portland Group 9150 SW Pioneer Court, Suite H Wilsonville, Oregon 97070

While every precaution has been taken in the preparation of this document, The Portland Group, Inc. makes no warranty for the use of its products and assumes no responsibility for any errors which may appear, or for damages resulting from the use of the information contained herein. The Portland Group, Inc. retains the right to make changes to this information at any time, without notice. The software described in this document is distributed under license from The Portland Group, Inc. and may be used or copied only in accordance with the terms of the license agreement. No part of this document may be reproduced or transmitted in any form or by any means, for any purpose other than the purchaser's personal use without the express written permission of The Portland Group, Inc. Commercial uses are strictly prohibited.

PGI, pghpf, pgf77, pgcc, pgprof, and pgdbg are trademarks of The Portland Group, Inc. Other brands and names are the property of their respective owners.

pghpf Version 2.0 Release Notes Copyright (c) 1995 The Portland Group, Inc. All rights reserved.
Printed in the United States of America

Printing History
October 1995: First Printing
January 1996: Second Printing

Part Number: 2401-990-990-1095

Phone: (503) 682-2806
Fax: (503) 682-2637
e-mail: trs@pgroup.com

pghpf Version 2.0 Release Notes

This document describes important issues relating to pghpf Version 2.0.

1 pghpf 2.0 Features

This section briefly lists the features of pghpf Version 2.0.

Most features of full HPF are included, with the few exceptions noted in these release notes. If you encounter any HPF feature that is not supported, and not listed in Section 6, "Restrictions and Omissions - pghpf 2.0", you should consider it a bug and report it to the e-mail address trs@pgroup.com. Supported HPF language features include:

Other features include:

1.1 Release 2.0 Changes from Previous Release 1.3

Fortran 90 Additions and Changes

Fortran 90 derived-types are supported in pghpf 2.0. The Fortran keywords TYPE and END TYPE are supported.

The Fortran CASE construct is now supported. The keywords SELECT CASE, CASE, CASE DEFAULT and END SELECT are now supported.
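
As a rough illustration of the two additions above, the following sketch combines a derived type with a CASE construct. The type name FIGURE, its components, and the program name are hypothetical and chosen only for this example. The object is initialized with a structure constructor rather than a DATA statement.

```fortran
      PROGRAM FIGURES
      IMPLICIT NONE
!     Hypothetical derived type: a shape code and a side length.
      TYPE FIGURE
         INTEGER :: CODE
         REAL :: SIDE
      END TYPE FIGURE
      TYPE(FIGURE) :: F
      REAL :: AREA

!     Initialize with a structure constructor.
      F = FIGURE(2, 3.0)

!     Select the area formula from the shape code.
      SELECT CASE (F%CODE)
      CASE (1)
         AREA = F%SIDE * F%SIDE
      CASE (2)
         AREA = 3.14159 * F%SIDE * F%SIDE
      CASE DEFAULT
         AREA = 0.0
      END SELECT
      PRINT *, 'AREA =', AREA
      END PROGRAM FIGURES
```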

Support for MODULEs has been significantly enhanced from that available in the previous version of pghpf. Generic procedures are now supported. For details refer to Section 8, "Modules".

The random number generator intrinsics RANDOM_NUMBER and RANDOM_SEED have been rewritten to provide, for a given seed, including the default seed, a generated sequence that is independent of the platform and number of processors used. The new, pghpf 2.0 random number intrinsics should be much faster than those provided with pghpf 1.3, and they replace all patch versions sent out subsequent to pghpf 1.3. For details refer to Section 11, "Random Number Generation".
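
The sketch below shows one way to request a repeatable sequence with these intrinsics, using only standard Fortran 90 calls. The seed value 12345 and the program name are arbitrary choices for illustration. Restoring the same seed before the second call is expected to reproduce the first sequence.

```fortran
      PROGRAM RANDDEMO
      IMPLICIT NONE
      INTEGER :: K
      INTEGER, ALLOCATABLE :: SEED(:)
      REAL :: X(5), Y(5)

!     Ask the runtime how many integers make up a seed.
      CALL RANDOM_SEED(SIZE=K)
      ALLOCATE(SEED(K))
      SEED = 12345

!     Generate one sequence from the chosen seed.
      CALL RANDOM_SEED(PUT=SEED)
      CALL RANDOM_NUMBER(X)

!     Restore the same seed and generate again.
      CALL RANDOM_SEED(PUT=SEED)
      CALL RANDOM_NUMBER(Y)

!     Compare the two sequences element by element.
      PRINT *, ALL(X == Y)
      DEALLOCATE(SEED)
      END PROGRAM RANDDEMO
```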

A program not containing a PROGRAM statement will now have a PROGRAM statement added in the generated Fortran 77 code. The name of the program will be unnamed$main in the Fortran 77 code.

The search rules for the Fortran 90 INCLUDE and USE statements have changed. The directories where include files or modules may be found are the following:

1. Each -I directory specified on the command-line.

2. The directory containing the file that contains the INCLUDE/USE statement (the current working directory).

3. The standard include area.

HPF Additions and Changes

The HPF REALIGN, REDISTRIBUTE, and DYNAMIC directives are now supported. The pghpf runtime now supports dynamic realignment and redistribution.

The HPF directive INDEPENDENT is now supported for DO loops. INDEPENDENT will operate on all DO loops without procedure calls and on loops where the procedure calls can be inlined. The -Mautopar option is no longer needed to parallelize INDEPENDENT DO loops. The pghpf option -Minline is required to inline procedures in INDEPENDENT DO loops. For more details on using the -Minline option and the INDEPENDENT directive, refer to Section 9, "INDEPENDENT DO Loops".

The HPF Library: GRADE_UP(ARRAY,DIM) and GRADE_DOWN(ARRAY,DIM) now support types other than INTEGER. These functions are not operational for CYCLIC distributions. Also, for these routines, the DIM argument is required. In pghpf 1.3, there were additional restrictions on these routines.
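
A minimal sketch of a GRADE_UP call on a non-INTEGER array follows. It assumes an HPF compiler that provides the HPF_LIBRARY module; the array values and program name are illustrative. The DIM argument is passed explicitly and a BLOCK distribution is used, in line with the restrictions described above.

```fortran
      PROGRAM GRADEDEMO
      USE HPF_LIBRARY
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 5
      REAL :: A(N)
      INTEGER :: P(N)
!HPF$ DISTRIBUTE (BLOCK) :: A, P
      A = (/ 3.0, 1.5, 3.0, 4.0, -1.0 /)
!     P receives the permutation that sorts A in ascending order.
      P = GRADE_UP(A, DIM=1)
!     Indexing A with P lists the values in sorted order.
      PRINT *, A(P)
      END PROGRAM GRADEDEMO
```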

Other Additions and Changes from pghpf 1.3 to pghpf 2.0

Support for the MPI communications library has been added to the compiler. Refer to Section 11, "MPI Runtime Library" for more details on the MPI runtime library.

A new include file is available, named lib3f.h. Using the lib3f.h include file, programs can call standard 3F routines available on most platforms. The statement:

INCLUDE "lib3f.h"
is required when using 3F procedures.

Programs that use getarg() or iargc() in pghpf 2.0 require the INCLUDE statement. This was not required to use these routines in pghpf 1.3.

The compiler option -Mnoautopar has been added. This option disables automatic parallelization of INDEPENDENT DO loops.

The compiler has been modified to issue a severe error if a reference occurs to an intrinsic function/subprogram and the number of arguments is incorrect or if the types of the arguments are incorrect. Most of the Fortran 90 and HPF intrinsics are checked for correctness.

The version of the pghpf 2.0 FLEXlm license manager software has changed. The new version is FLEXlm v4.1, Copyright 1988-1995, Globetrotter Software, Inc.

For more information on the license manager, see the following web address:

http://www.globetrotter.com/faq.html

2 Getting started

Once pghpf has been installed, use the following steps to get started using the compiler (this assumes you are using csh or a variant of csh; for other shells the commands may differ). Assume that the compiler has been installed in the directory /usr/pgi on your system, the target is platform (for example, RS6000, SOLARIS, HP, SGI, etc.), and that a valid license.dat file has been placed in /usr/pgi:
% setenv PGI /usr/pgi 
% set path=($PGI/platform/bin $path) 
% setenv LM_LICENSE_FILE $PGI/license.dat
You should now be able to compile and run HPF programs as follows:
% pghpf hello.hpf  
% a.out options -pghpf pghpf_options
If you wish to link and run with a version of the pghpf runtime other than the default for your system, refer to the pghpf User's Guide for more details.

3 Optimization Features

The pghpf 2.0 compiler performs many optimizations. Some of these optimizations are only available using higher levels of optimization (-O1 or -O2 on the compiler command-line). These optimizations include:
1. Generation of collective shift communication calls in the presence of appropriate indexing patterns and for CSHIFT calls. For example, the compiler inlines CSHIFT calls when the DIM argument is a compile-time constant.

2. Generation of overlap shift communications in the presence of appropriate indexing patterns and for CSHIFT calls, when certain compile-time specifications are met. For example:
	INTEGER, DIMENSION(N,N) :: A,B
!HPF$  DISTRIBUTE (BLOCK,BLOCK) :: A,B
	A = CSHIFT(B, DIM=1, SHIFT=2)
For this example, a temporary will not be created, and data requiring communication will be communicated through an overlap area.

3. Generation of collective regular communication calls. For example:
	INTEGER, DIMENSION(N,N) :: A,B
!HPF$  DISTRIBUTE (BLOCK,BLOCK) :: A,B
	FORALL(I=1:N,J=1:N) A(I,J) = B(J,I)
This example will generate a call to a runtime routine that handles permutations of axes for communications.

4. Generation of collective irregular communication calls in the presence of indexed array assignments or FORALL. For example:
	INTEGER, DIMENSION(N,N) :: A,B,C
!HPF$  DISTRIBUTE (BLOCK,BLOCK) :: A,B,C
	FORALL(I=1:N,J=1:N) A(C(1,I),C(2,J)) = B(J,I)
This scatter array access is recognized and a scatter communication call is generated.

5. Sharing of runtime data descriptors for arrays of like size and shape that are identically aligned to a common template. For example:
	INTEGER, DIMENSION(N,N) :: A,B
!HPF$  DISTRIBUTE (BLOCK,BLOCK) :: A,B
One descriptor is created and shared for both arrays A and B. The compiler automatically aligns A and B to the same template. This alignment occurs even though the programmer did not align A and B with each other or align them to a common template.

6. Use of INTENT information to eliminate unnecessary copying of arguments at subroutine boundaries. Note that many Fortran 90 compilers currently ignore INTENT statements. If you compile code containing erroneous INTENT statements, your program may fail under pghpf.

7. Common runtime call elimination across basic blocks. For example:
	INTEGER, DIMENSION(N,N) :: A,B,C,D
!HPF$  DISTRIBUTE (BLOCK,BLOCK) :: A,B,D
!HPF$  DISTRIBUTE (CYCLIC,CYCLIC) :: C
	A(:,:) = C(:,:)
	B(:,:) = C(:,:)
This code will generate a single runtime communication sequence, so any communication involved with the use of C will only happen once.

8. Sharing of communications schedules, including schedules generated for irregular communications. For example, with arrays as above and an indirection array V:
	A = B(V)
	C = D(V)
This sequence could generate two gather communication sequences, one for B(V) and one for D(V). However, the compiler creates a single schedule for the communications since they both use the same communication pattern. This reduces the overhead of scheduling computation, especially in nested loops.

9. Fusing of Fortran 90 array assignments and FORALL statements. For example, with this optimization the following statements for arrays A, B, C and D are fused into the same generated loop. Without this optimization, a separate sequence of loops is generated for each statement:
	A = B
	C = D

10. Hoisting of invariant communication calls out of loops. For example:
	DO I = 1,N
	   A(I) = B(1) + A(I)*2
	   CALL FOO(A(I))
	END DO
The communication of B(1) is loop invariant and will be hoisted out of the loop.
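
Because the compiler relies on INTENT information (item 6 above), it may help to check that every INTENT attribute matches how the argument is actually used. The following is a minimal sketch with hypothetical names (SCALE2, SRC, DST): SRC is only read, so it is declared INTENT(IN), and DST is fully defined by the routine, so it is declared INTENT(OUT).

```fortran
      SUBROUTINE SCALE2(N, SRC, DST)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: N
!     SRC is read but never modified.
      REAL, INTENT(IN) :: SRC(N)
!     DST is completely defined here and never read first.
      REAL, INTENT(OUT) :: DST(N)
      DST = 2.0 * SRC
      END SUBROUTINE SCALE2

      PROGRAM INTENTDEMO
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 8
      REAL :: A(N), B(N)
!HPF$ DISTRIBUTE (BLOCK) :: A, B
      A = 1.0
      CALL SCALE2(N, A, B)
      PRINT *, B
      END PROGRAM INTENTDEMO
```

Declaring DST as INTENT(IN), or SRC as INTENT(OUT), would be an example of the erroneous INTENT statements that can cause a program to fail.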
The following points may be helpful in enabling you to obtain the best possible performance with the 2.0 release:

4 Debugging

Debugging programs developed under pghpf 2.0 can be difficult. No HPF debugger is currently provided. However, if necessary the programmer can debug the generated SPMD Fortran 77 program on each node using multiple X windows. This can be particularly useful in obtaining a traceback on a program that is crashing unexpectedly.

To prepare an HPF program for debugging, use the -Mg compile-time option to pghpf. The generated Fortran 77 output will be saved and the Fortran 77 node compiler will be invoked with the -g compile-time option to provide symbolic information in the image file. If you wish to execute the program on only a single processor, use the -Mrpm1 compiler option when linking the program (this is not available on all platforms). You can then use a standard debugger on the image file.

If you need to execute the program on multiple processors, the following sequence is useful:

Another debugging technique is to use the compiler command-line option -Mprof=lines, and then run the program. If a control-C is used, a traceback will usually result.

Similar functionality is possible using PVM.

5 Profiling

PGI provides a graphical HPF profiling tool, pgprof, which allows function and line level profiling of HPF programs. Refer to the pgprof User's Guide for more information.

Profiling can also be performed by hand by inserting calls to an appropriate timing function on each platform and including the appropriate timing libraries.

6 Restrictions and Omissions in pghpf 2.0

The following is a list of restrictions that apply to pghpf 2.0. Some of these restrictions are known bugs, others are Fortran 90 features that are not yet implemented, and others are HPF features that are not yet implemented.

6.1 Known Restrictions - Version 2.0

  1. The HPF_LIBRARY routines GRADE_UP and GRADE_DOWN require a DIM argument; these routines do not support cyclic distributions.

  2. A new include file is available, named lib3f.h. Using the lib3f.h include file, programs can call standard 3F routines available on most platforms. The statement INCLUDE "lib3f.h" is required when using 3F procedures. Programs that use getarg() or iargc() in pghpf 2.0 require the INCLUDE statement. This is a change from pghpf 1.3, where getarg() and iargc() could be used without the INCLUDE statement.

  3. The compiler command-line option -g is a beta feature for debugger developers. Details of this option are available by contacting PGI at sales@pgroup.com. This option creates files named with a .stb extension. For a file filename.hpf, the -g option creates a file named filename.stb in the current directory.

  4. An object of derived type cannot be initialized with a DATA statement; use the f90-style form of initializing an object.

  5. HPF mapping directives cannot be used with an object of derived type or any of its components.

  6. A structure component may not be declared as an allocatable array.

Finally, where a and b are automatic arrays whose extents n and m are equal, pghpf currently cannot conclude that n and m are equal. If the programmer declares both arrays with the same extent, pghpf may perform more optimizations. For example:
	subroutine foo(a,b)
	common /c1/ n,m
	integer, dimension(n) :: a
	integer, dimension(m) :: b
!hpf$  distribute (block) :: a,b
	a(:) = b(:)
	end
The assignment a(:) = b(:) implies that a and b must be arrays of equal size because of the conformability rule. If a single variable, either n or m, is used in the declarations of both a and b, additional optimizations will be performed, as compared with the code shown above.
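
As a sketch of the suggested rewrite, the same subroutine can declare both arrays with the single extent n, so the compiler does not have to prove that n and m are equal. The common block layout is kept unchanged here on the assumption that other program units still reference m.

```fortran
      subroutine foo(a,b)
      common /c1/ n,m
!     Both arrays use the same extent variable n.
      integer, dimension(n) :: a
      integer, dimension(n) :: b
!hpf$ distribute (block) :: a,b
      a(:) = b(:)
      end
```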

Module Restrictions

Refer to Section 8.2 "Limitations on Modules" for a description of MODULE restrictions for pghpf 2.0.

INDEPENDENT DO Restrictions

Refer to Section 9 "INDEPENDENT DO Loops" for a description of the pghpf 2.0 INDEPENDENT directive implementation.

6.2 Omissions - Version 2.0

This section lists Fortran 90 and HPF features that are omitted from pghpf 2.0.

Fortran 90 Language Omissions

HPF Language Omissions

PGHPF-W-3011-Non-replicated mapping for character/struct/union array, 
char_table, ignored (file.F: lineno)

6.3 Extrinsics Changes

Generic Extrinsic Routines - f77_local Changes

In pghpf 2.0, the behavior of pghpf_csend and pghpf_crecv has changed, and this change may affect existing f77_local message-passing routines. For performance reasons, data transferred by pghpf_csend and pghpf_crecv may no longer be buffered as in the past, so programs that ran under pghpf 1.3 may hang with release 2.0. The solution is to change the f77_local routine so that processors "pair off" when exchanging messages: when one processor calls pghpf_csend, the partner processor must call pghpf_crecv. A simple way to decide who sends first is to compare the processor numbers, for example:

old:

	call pghpf_csend(partner, x, ...)
	call pghpf_crecv(partner, y, ...)
new:
	me = pghpf_myprocnum()
	if (partner .lt. me) then
	    call pghpf_csend(partner, x, ...)
	    call pghpf_crecv(partner, y, ...)
	else
	    call pghpf_crecv(partner, y, ...)
	    call pghpf_csend(partner, x, ...)
	end if
Note that pghpf_csend and pghpf_crecv did not and still do not allow a processor to send a message to itself. The code must handle this case if it can arise in the user's algorithm. For example, the preceding example could be extended as shown here:
	me = pghpf_myprocnum()
	if (partner .eq. me) then
	    y = x
	else if (partner .lt. me) then
	    call pghpf_csend(partner, x, ...)
	    call pghpf_crecv(partner, y, ...)
	else
	    call pghpf_crecv(partner, y, ...)
	    call pghpf_csend(partner, x, ...)
	end if

6.4 System Specific Notes

CRAY T3D Runtime

The execution of a T3D program depends on the policies of the host site. In general, programs are executed with:
%a.out mppexec_opt user_opt -pghpf HPF_opt
The mppexec options are described in the mppexec(1) man page. The -npes option is required and specifies the number of processors. The number of processors must be a power of 2.

The only supported HPF options are -stat and -np. The HPF -np option may be specified to reduce the number of processors from the value specified by the -npes option. The use of the -np option is not recommended as the unused processors are not available for other uses.

CRAY T3D Profiling

The profiler, pgprof, is not currently supported on the CRAY T3D. However, CRAY T3D programs can be compiled and run with the -Mprof options. The resulting pgprof.out file can be analyzed on any supported workstation platform.

CRAY T3D Compiler Options

There is a compiler option that is only available on the T3D. The option is
%pghpf -Ojump file1.hpf
The -Ojump switch will pass -Wf,"-ojump" to the T3D Fortran 77 compiler and link a version of the runtime library compiled with "-h jump". See the documentation on "-h jump" for the T3D C compiler for more details.

IBM SP2 Runtime

The MPI implementation used by pghpf is the IBM version (MPI-F 1.41). The execution of a SP2 MPI program depends on the policies of the host site. For example, programs could be executed with:
%mpirun -np numberofprocs a.out user_opt -pghpf HPF_opt
The only supported HPF option is -stat. The HPF -np option is not supported.

For the IBM SP2, the MPL communications library is also available. To use the MPL library, include the option -Mmpl on the compiler command line (this is loaded when linking occurs).

SGI Linking

Due to a known bug in the IRIX 6.0 linker, some HPF programs may fail to link and could produce the following link-time error:
ERROR 104: GOT page/offset relocation out of range: x.o
ERROR 104: GOT page/offset relocation out of range: x.o
where x.o is one of the object files being linked. This problem should not occur with the version of the linker included in IRIX 6.1. No known workaround is available.

Convex Exemplar

The pghpf compiler, running on the Convex Exemplar, requires the permissions on /dev/lan0 to be 0666 for licensing to work correctly. Using these permissions leads to possible security implications for the site.

The Convex Exemplar PVM runtime implementation has a limited buffer size. This may cause programs to fail. The buffer size can be increased. Refer to your system administrator or the bugs section in the PVM Readme.mp file for more information.

When compiling extrinsic routines, the Fortran 77 compiler option +ppu should be used. This option appends underscores at the end of definitions of and references to externally visible symbols. Since the caller appends underscores for extrinsic names, the callee extrinsic needs this option when it is compiled.

Solaris Systems

The installation directory for Sun Solaris systems was /usr/pgi/sparc in version 1.3 and previous versions of pghpf. For version 2.0, the default directory has changed to /usr/pgi/solaris.

Intel Paragon Systems

The pghpf 2.0 release supports cross development from various systems to Intel Paragon systems. To support this cross development environment, several variables need to be set. The environment variable PARAGON_XDEV needs to be set to use the Intel tools. This should be one directory above the Intel-supplied paragon directory. Intel's documentation should provide information on how to do this.

For example:

setenv PARAGON_XDEV /usr/local
The environment variable PGI needs to be set:
setenv PGI /usr/local/paragon/pgi
Then two elements need to be added to the path:
set path=($PARAGON_XDEV/paragon/bin.<arch> \
          $PGI/pgon/bin.<arch> $path)
where <arch> is the architecture on which the compilation is performed. Choices for <arch> include sgi, solaris, and sun4, among others.

7 Bug Fixes Included in Release 2.0

This section briefly lists the bugs fixed from release 1.3 to 2.0.
    subroutine sub(a)
    implicit none
    common /c/ n
    integer n
    character*8 a(n)
    end

8 Modules

The compiler supports Fortran 90 modules. Modules can be independently compiled and used within programs using the USE statement. Compiling a Fortran 90 module causes the compiler to create a .mod file, named filename.mod, in the current directory. This file contains all the information the compiler needs concerning interface specifications and the data types for the routines defined in the module. When a program, routine, or another module encounters the USE statement, the .mod file is read and "included" in the program, using the scope rules defined in Fortran 90 for USE association. If you are using separate modules, this creates another step in the program development process. When a module is compiled, both a .mod and a .o file are created. The .mod file is used when a USE statement is encountered, and the .o file is used when the program is loaded.

For example, if module1.hpf contains a module with several procedures, and test1.hpf contains a USE statement that uses module1, the compilation would involve the following steps:

% pghpf -c module1.hpf
% pghpf -otest1 test1.hpf module1.o
A .mod file is searched for in the following directories:

1. Each -I directory specified on the command-line.

2. The directory containing the file that contains the USE statement (the current working directory).

3. The standard include area.

Directories can be added to the search path for .mod files using the -I command-line option.

Note that if you currently have .mod files created with pghpf 1.3 and you are upgrading to pghpf 2.0, you will need to recreate the .mod files using pghpf 2.0.

8.1 Modules with Generic Interfaces

The 2.0 version of pghpf now supports modules with generic procedures. For example, the following is a valid module which defines FUNC_ANY taking either a real or an integer argument.

	MODULE A  

	INTERFACE FUNC_ANY 
	    MODULE PROCEDURE FUNC1 
	    MODULE PROCEDURE FUNC2 
	END INTERFACE  

	CONTAINS  
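	FUNCTION FUNC1(R)
	    INTEGER R
	    FUNC1 = R           ! define the result for the integer version
	END FUNCTION
	FUNCTION FUNC2(R)
	    REAL R
	    FUNC2 = R           ! define the result for the real version
	END FUNCTION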
	END MODULE A
 
	PROGRAM B 
	USE A 
	X = FUNC_ANY(1)      ! func1 
	Y = FUNC_ANY(1.0)    ! func2 
	END

8.2 Limitations on Modules

	MODULE B
	CONTAINS
	FUNCTION G
	.
	.
	.
	    CALL H
	END FUNCTION G
	SUBROUTINE H
	.
	.
	.
	END SUBROUTINE H
	END MODULE B

	MODULE C
	CONTAINS
	SUBROUTINE H
	.
	.
	.
	END SUBROUTINE H
	FUNCTION G
	.
	.
	.
	    CALL H
	END FUNCTION G
	END MODULE C

9 INDEPENDENT DO Loops

Pghpf includes a partial implementation of the INDEPENDENT directive. This directive is applied to a DO loop and instructs the compiler to generate parallel code. There are two phases of independent loop processing: the inline phase, and the auto parallelization phase. This section describes these two phases and the command line options that allow the user to control the processing of independent loops.

Pghpf parallelizes certain loops without using the INDEPENDENT directive (if the loops operate on distributed data). Such loops include those without the following: external procedure calls, array assignments, FORALL, WHERE, or ALLOCATE or DEALLOCATE statements. The INDEPENDENT directive allows the programmer to provide information to the compiler to broaden the class of parallelizable loops. Use of the INDEPENDENT directive on a loop nest provides assurance, by the programmer, that all procedure calls within the loop nest are parallelizable and each iteration can be performed independently. The compiler inlines procedure calls within INDEPENDENT loops when -Minline is used, so that the resulting statements can be parallelized.

There is no guarantee that loops within the scope of an independent directive will be parallelized. If the loops contain array assignments, for example, the auto-parallelizer phase of pghpf will not process the independent loop. On the other hand, inlining procedure calls is an essential step for the compiler to take in determining how to parallelize independent loops containing procedure calls.

9.1 Directives

Inlining of function and subroutine calls will only take place within loop nests that follow the INDEPENDENT directive. INDEPENDENT in pghpf 2.0 is only applicable for a DO loop. For example:
!HPF$ INDEPENDENT       
       DO I = 1, N
         A(I) = FUN(I) * B(I)
       END DO
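
For context, the loop above could appear in a complete program such as the following sketch. The body of FUN, the array size, and the BLOCK distribution are assumptions made for illustration. Compiling with -Minline would give the compiler the opportunity to inline FUN before the loop is parallelized.

```fortran
      REAL FUNCTION FUN(I)
      INTEGER, INTENT(IN) :: I
!     Hypothetical side-effect-free function of the loop index.
      FUN = REAL(I) * 0.5
      END FUNCTION FUN

      PROGRAM INDEP
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 100
      REAL :: A(N), B(N)
      REAL, EXTERNAL :: FUN
      INTEGER :: I
!HPF$ DISTRIBUTE (BLOCK) :: A, B
      B = 1.0
!     Each iteration writes only A(I), so iterations are independent.
!HPF$ INDEPENDENT
      DO I = 1, N
         A(I) = FUN(I) * B(I)
      END DO
      PRINT *, SUM(A)
      END PROGRAM INDEP
```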

9.2 Switches

When inlining is to occur within independent loops, use -Minline on the compilation command line. Inlining requires a preliminary extraction phase which saves compiler information about procedures. You can allow the compiler to create a temporary extraction, thus handling the inlining automatically, or you can create and maintain a directory of "extract" files using -Mextract. The compiler produces a message when it is creating its database of extract procedures. For example, the following message indicates that the compiler is extracting the routine scatter_count.
pghpfc_ex: extracting scatter_count

The inline Switch

All forms of the pghpf -Minline switch only inline procedures within DO loop nests that follow an INDEPENDENT directive (the pghpf INDEPENDENT inliner is not a general purpose inliner). The full syntax of the -Minline switch is as follows:
-Minline[=[lib:dir,][levels:n,]{name:fun,}]
For most users, supplying -Minline on the command line will suffice. For example:
%pghpf -Minline filename.hpf
This instructs the compiler to perform inlining within loops with an INDEPENDENT directive and to use a temporary directory for the extract phase. Note that while the extract phase extracts all possible procedures, the inliner will only inline procedures in an INDEPENDENT DO loop.

Including the lib:dir parameter assumes that an extract phase has been completed, and that the extracted procedures, if any, will be taken from directory dir (created using the -Mextract switch described below).

If the levels:n parameter is specified, inlining is repeated up to n times within any inlined loop so that calls up to n levels deep can be removed (the default for this value is 1).

If the name:fun parameter is provided, only the function or subroutine fun will be inlined within the INDEPENDENT loop. Multiple name parameters can be provided in order to inline multiple procedures.

The extract Switch

The full syntax of the -Mextract pghpf switch is as follows:
-Mextract[=name1,name2...] -o dir    
This switch instructs the compiler to extract inlineable functions and subroutines using directory dir to store the extract files. Names of procedures to be extracted may be specified as parameters to the -Mextract switch. Note that while the extract phase extracts all possible procedures, the inliner will only inline procedures in an INDEPENDENT DO loop.

The noautopar Switch

By default, parallelization of INDEPENDENT loops is enabled, which means that, where possible, FORALL statements will be generated for parallelizable loops. It is possible to disable this functionality using the -Mnoautopar compiler command-line switch. For example:
% pghpf -Mnoautopar test1.hpf

Compiler Information Switches

If you are using the pghpf inlining capability and you want to keep track of which functions are inlined, as well as whether parallelization is taking place for the independent loop, use the -Minfo=inline and -Minfo=autopar switches. For example:
% pghpf -Minline -Minfo=inline,autopar test1.hpf
    11, 1 FORALL  generated
    12, Inlining f

9.3 INDEPENDENT Inlining

This section covers further details of the process the compiler uses when creating an extract directory and when inlining a procedure.

Extract Directories

When the -Mextract switch is specified, an extract directory is created for holding extract files. An extract file is an ASCII file holding information created by the HPF compiler about a single procedure. The procedure's name is in the first line of the extract file. The extract directory contains a special table-of-contents file, named TOC. This ASCII file associates procedure names with extract file names.

Inlining Transformations

The inliner first inserts the statements of a called procedure into the calling program unit at the point of the call. If a function has been inlined, the function call is replaced by the variable holding the function's return value. If there are name conflicts between variables local to the inlined procedure and variables within the calling program unit, the inlined procedure's local variables will be renamed.

Assignments of actual arguments to dummy parameters with INTENT IN or INTENT OUT are made at the beginning of the inlined statements. Actual arguments that are identifiers or subscript expressions associated with dummy parameters with INTENT INOUT textually substitute for the dummy parameters wherever they occur within the inlined statements. If necessary, adjustments are made to array subscripts to accommodate array bounds that are different between the calling program unit and the called procedure.

9.4 Examples

Extracting to a temporary extract directory and inlining.
% pghpf -Minline test1.hpf
Creating an extract library.
% pghpf -Mextract -o exlib test1.hpf
Inlining with extract library exlib.
% pghpf -Minline=lib:exlib test1.hpf

9.5 Effects of Inlining

Consider the following program.
	SUBROUTINE FOO(X, Y, Z, A)  
	REAL, INTENT(IN) :: X  
	REAL, INTENT(INOUT) :: Y  
	REAL, INTENT(OUT) :: Z  
	REAL, DIMENSION(*) :: A 
	Y = X + 1.0  
	Z = Y + 2.0  
	A(3) = A(2) + 3.0 
	END

	REAL FUNCTION GOO(X, Y, Z, A) RESULT (RES) 
	REAL, INTENT(IN) :: X 
	REAL, INTENT(INOUT) :: Y 
	REAL, INTENT(OUT) :: Z 
	REAL, DIMENSION(*) :: A

	Y = X + 1.0 
	Z = Y + 2.0 
	A(3) = A(2) + 3.0 
	RES = 5.0 
	END

	PROGRAM TESTINLINE 
	REAL XMAIN, YMAIN, ZMAIN, Y1MAIN, Z1MAIN, AMAIN(10)
!HPF$ SEQUENCE :: AMAIN
!HPF$ INDEPENDENT 
	DO I = 1, 10 
	    CALL FOO(GOO(7.0, Y1MAIN, Z1MAIN,AMAIN(2)),
    &     YMAIN, ZMAIN, AMAIN(3)) 
	ENDDO 
	END
The independent loop will be rewritten to be the following:

	do i = 1, 10 
	    goo$x = 7.0 
	    y1main = goo$x + 1.0 
	    z = y1main + 2.0 
	    amain(4) = amain(3) + 3.0 
	    goo$goo = 5.0 
	    z1main = z 
	    ymain = goo$goo + 1.0 
	    foo$z = ymain + 2.0 
	    amain(5) = amain(4) + 3.0 
	    zmain = foo$z 
	enddo
Variables containing '$' are introduced by the compiler to disambiguate them from variables in the main program.

9.6 INDEPENDENT Parallelization

The compiler parallelizes INDEPENDENT DO loops. A loop can be parallelized when its variables are explicitly distributed using HPF directives, and when dependence analysis shows that array elements do not conflict across the indices of the loop (other dependence analysis is also performed).

Parallelization of parallelizable DO loops is disabled when the -Mnoautopar compiler option is supplied.

The compiler generates FORALL statements and calls to reduction intrinsics in INDEPENDENT parallel DO-loops with distributed arrays.

If the option -Mautopar is included on the command line, the compiler checks the entire program for loops that may be parallelized, and does not limit its search to only INDEPENDENT loops.
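
To give a feel for the transformation, the sketch below writes the same computation twice: once as an INDEPENDENT DO loop and once as the kind of FORALL statement the compiler is described as generating. This is an illustration written by hand, not actual compiler output, and the array names and sizes are hypothetical.

```fortran
      PROGRAM PARFORM
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 16
      REAL :: A(N), B(N), C(N)
      INTEGER :: I
!HPF$ DISTRIBUTE (BLOCK) :: A, B, C
      B = 2.0

!     Source form: an INDEPENDENT loop over distributed arrays.
!HPF$ INDEPENDENT
      DO I = 1, N
         A(I) = B(I) + REAL(I)
      END DO

!     Hand-written data-parallel form of the same assignment.
      FORALL (I = 1:N) C(I) = B(I) + REAL(I)

!     Compare the results of the two forms.
      PRINT *, ALL(A == C)
      END PROGRAM PARFORM
```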

9.7 Limitations on INDEPENDENT Inlining and Parallelization

INDEPENDENT DO loops containing array assignments or FORALL statements are not parallelized. Procedures with multiple entries or returns, format lists, or namelist I/O are not extracted.

10 pghpf 2.0 Input/Output

The pghpf 2.0 implementation supports full Fortran 90 I/O semantics. This includes non-advancing I/O, namelist I/O, and I/O of array sections. There are no restrictions on using mapped arrays in list directed, formatted, or unformatted I/O statements. For example:
	INTEGER, DIMENSION(N,N) :: A,B
!HPF$  DISTRIBUTE (BLOCK,BLOCK) :: A,B
	FORALL(I=1:N,J=1:N) A(I,J) = B(J,I)
	PRINT *, A,B
	END
Variables used in namelist groups (NAMELIST) may not be mapped; the compiler issues a warning message if an attempt is made to map a variable in a namelist group:
PGHPF-W-0311-Non-replicated mapping for namelist
             array, name, ignored (test1.hpf:4)
Currently, input and output are serialized. One processor performs the read or write and exchanges the data with the other processors that own it. Performance of I/O on mapped arrays is comparable to the I/O performance of a single-processor Fortran 90 compiler.

11 MPI Runtime Library

MPI consists of a number of transport mechanisms conforming to the Message Passing Interface standard. One version of MPI supported by pghpf is the public domain software available from Argonne National Laboratory and Mississippi State University. MPI is a software system that enables a collection of computers, or the processors of a single computer, to be used as a coherent and flexible computational resource. PGI supports MPI version 1.10 (refer to Section 14.3, "Retrieving Software and Documentation", for details on obtaining MPI).

The options in this section apply to programs using MPI for communication.

The executable runtime option -pghpf specifies that the following options are to be passed to the communication control portion of the executable program. The -pghpf option allows you to pass user-defined options to your application program and communications control runtime options to the pghpf runtime library.

In general, the command line format for a compiled program is:

% mpirun mpirun_options a.out user_options -pghpf pghpf_options
where:
mpirun
is the command used to execute MPI programs.
mpirun_options
are the options to the mpirun command.
a.out
is the executable program.
user_options
are the program's options.
pghpf_options
are any of the valid options to the pghpf runtime library.
Most systems require the use of the mpirun command or some other command to execute programs. Refer to your system's documentation for details.

Running an MPI program requires that MPI is installed on your system.

Table 11-1 shows the valid MPI executable options (-pghpf options).

Table 11-1 MPI Runtime Library Options and Variables

Option          Environment Variable  Purpose
-stat options   PGHPF_STAT options    Print runtime statistics upon
                                      program completion.

12 Random Number Generation

The random number generator used in the 2.0 release now generates a 46-bit lagged Fibonacci pseudo-random sequence with a short lag of 5 and a long lag of 17. For a given seed, including the default seed, the sequence generated is independent of the platform and the number of processors. Due to limitations of some platforms' default integer type, the seed vector is of size 34. Only the least significant 23 bits of each element of the seed array are used, so a seed array returned or used is portable between platforms. For non-degenerate seed arrays, the period of this generator is (2^17 - 1) * 2^45.
If all the odd-numbered elements of the seed array are even, the period will be shorter.

The best performance on distributed arrays is for block distributions. The higher the order of the first distributed dimension, the better the performance will be.
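
The following is a minimal sketch of how a program might use the generator through the standard Fortran 90 RANDOM_SEED and RANDOM_NUMBER intrinsics. The program and variable names are hypothetical. The seed length is queried at run time instead of being hard-coded; based on the description above, pghpf 2.0 is expected to report 34, while other compilers report other sizes.

```fortran
      PROGRAM rand_demo
      IMPLICIT NONE
      ! N, X, SEED, and K are illustrative names.
      INTEGER, PARAMETER :: N = 1000
      REAL, DIMENSION(N) :: X
      INTEGER, ALLOCATABLE, DIMENSION(:) :: SEED
      INTEGER :: K
!HPF$ DISTRIBUTE (BLOCK) :: X
      ! Ask the runtime for the seed length.
      CALL RANDOM_SEED(SIZE=K)
      ALLOCATE(SEED(K))
      ! Save the current seed so the same sequence can be replayed.
      CALL RANDOM_SEED(GET=SEED)
      ! Fill the block-distributed array with random values.
      CALL RANDOM_NUMBER(X)
      PRINT *, 'seed size =', K
      PRINT *, 'mean      =', SUM(X) / N
      ! Restoring the saved seed restarts the sequence.
      CALL RANDOM_SEED(PUT=SEED)
      DEALLOCATE(SEED)
      END PROGRAM rand_demo
```

A seed array saved with GET on one platform can be supplied with PUT on another, which is the portability property described above.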

13 Compiler Command-line Options

Table 13-1 provides a list of compiler command-line options that are valid on many systems. Some systems do not support some of these options. In addition, most node compiler options are available for systems with a node compiler that is not supplied with pghpf. The pghpf-specific -Marg options are described in the pghpf User's Guide.

Table 13-1 Compiler Command-line Options

Option               Description                                       
-c                   Stops after assembling (results placed in         
                     filename.o).                                      
-Dname[=val]         Defines a preprocessor macro name with value      
                     val.                                              
-dryrun              Show but do not execute all commands created by   
                     the driver.                                       
-E                   Displays preprocessed HPF file to the standard    
                     output.                                           
-F                   Saves a preprocessed HPF file in filename.f.      
-help                Display the complete list of driver options.      
-Idirectory          Adds a directory directory to the search path     
                     for #include files.                               
-Ldirectory          Adds a directory directory to the search path     
                     for library files.                                
-llibrary            Loads the library, in addition to the standard    
                     libraries.                                        
-O[level]            Specifies code optimization at the specified      
                     level.                                            
-ofilename           Names the object file filename.                   
-r4                  Interpret DOUBLE PRECISION variables as REAL.     
-r8                  Interpret REAL variables as DOUBLE PRECISION.     
-time                Print execution times for the various compiler    
                     steps.                                            
-Uname               Undefine a preprocessor macro name.               
-V                   Displays the compiler phase version messages.     
-v                   Displays the compiler, assembler and linker       
                     phase invocation.                                 
-W0,arg              Passes arguments arg to the node compiler.        
-Wa,arg              Passes arguments arg to the assembler.            
-Wl,arg              Passes arguments arg to the linker.               
-Wh,arg              Passes arguments arg to the HPF compiler.         
-w                   Do not print warning messages.                    

14 Contacting PGI

The Portland Group, Inc. has the following mail address and telephone number. You can call PGI, or contact us by email as described below.
The Portland Group, Inc                                               
9150 SW Pioneer Ct, Suite H        +1-503-682-2806 (voice)            
Wilsonville, OR  97070             +1-503-682-2637 (FAX)              

14.1 Obtaining Sales Information

To obtain further information on pghpf 2.0, or on other PGI products, please send e-mail to sales@pgroup.com or contact PGI at the address/number shown above.

The Portland Group, Inc. also maintains a WWW home page with information on PGI and its products; the URL is http://www.pgroup.com.

14.2 Reporting Bugs

To report bugs in the pghpf compiler or runtime, please send e-mail to trs@pgroup.com. A bug report is most useful when it includes a code sample that demonstrates the bug, a description of the system you are using, the error message, and the options used to compile. For a runtime error, or a problem with your program's results, also include the options used while running the program.

To obtain further assistance on pghpf 2.0, or on other PGI products, you can also use the address/number shown above.

14.3 Retrieving Software and Documentation

For information on the current version of MPI or PVM contact the following:
MPI
PGI currently supports version 1.10 of MPI. For information contact the following:
http://www.mcs.anl.gov/mpi/index.html
PVM
PGI currently supports the latest release of PVM version 3.3. For information on obtaining PVM, contact the following:
http://www.netlib.org/pvm3

14.4 pghpf 2.0 Online Documentation

Online documentation is available for pghpf 2.0 using a WWW browser such as Mosaic. To view the online documentation, open the file pghpf.index.html. For example, using Mosaic, the command to bring up the online documents would be:
% xmosaic $PGI/doc/hpf/html/pghpf.index.html