-------------------------------------------------------
Hewlett-Packard Company
SPEC CPU2000 FLAG DESCRIPTIONS - Portland Group International (PGI) FORTRAN COMPILERS 5.1-6
hp-20040621-PGI51-Windows.txt
----------------------------------------------------------------------------
Description of compiler flags for PGI Compiler 5.1
----------------------------------------------------------------------------

The optimization levels and their meanings are as follows:

-O0
    A basic block is generated for each Fortran statement. No scheduling
    is done between statements. No global optimizations are performed.

-O1
    Scheduling within extended basic blocks is performed. Some register
    allocation is performed. No global optimizations are performed.

-O2
    All level-one optimizations are performed. In addition, the global
    optimizer performs scalar optimizations such as induction recognition
    and loop-invariant motion.

-O3
    Performs all level-one and level-two optimizations and enables more
    aggressive hoisting and scalar replacement optimizations.

-fast
    Equivalent to "-O2 -Munroll -Mnoframe -Mlre".

-fastsse
    Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz".

-Mcache_align
    Aligns unconstrained objects of 16 bytes or more on cache-line
    boundaries. An unconstrained object is a data object that is not a
    member of an aggregate structure or common block. This option does
    not affect the alignment of allocatable or automatic arrays.
    Note: to obtain cache-line alignment of stack-based local variables,
    the main program or function must be compiled with -Mcache_align.

-Mfixed
    Processes source using Fortran 90 fixed-form specifications.

-Mflushz
    Sets the SSE MXCSR register to flush-to-zero mode.

-Mipa=[option]
    Enables interprocedural analysis (IPA) with the specified option.
    The valid options are:

    -Mipa=align
        Instructs the IPA to recognize when pointer targets are all
        cache-line aligned, allowing better SSE code generation.
    -Mipa=arg
        Instructs the IPA to remove arguments replaced by -Mipa=ptr,const.

    -Mipa=const
        Enables propagation of constants across procedure calls.

    -Mipa=fast
        Equivalent to -Mipa=const,globals,localarg,ptr,vestigial.

    -Mipa=globals
        Instructs the IPA to optimize references to globals that are not
        used in procedure calls.

    -Mipa=localarg
        Externalizes local variables for use with -Mipa=arg.

    -Mipa=ptr
        Instructs the IPA to perform pointer disambiguation across
        procedure calls.

    -Mipa=vestigial
        Instructs the IPA to eliminate functions that are not called.

-mp
    Enables OpenMP.

-Mnoframe
    Eliminates the operations that set up a true stack frame pointer for
    functions.

-Mnosmart
    Does not run the Smart assembly rewrite tool, which performs
    post-compilation linear assembly scheduling and optimization.

-Mscalarsse
    Uses the SSE (Streaming SIMD Extensions, where SIMD stands for Single
    Instruction Multiple Data) and SSE2 instructions to perform the
    operations coded. This assumes the user has an assembler capable of
    interpreting SSE/SSE2 instructions, as in later versions of Linux.
    This implies -Mflushz.

-Munroll
    Invokes the loop unroller. This also raises the optimization level
    to 2 if it is set lower than 2. Suboptions:

    c:m
        Completely unrolls loops with a constant loop count less than or
        equal to m, a supplied constant. If m is not supplied, it
        defaults to 4.

    n:u
        Unrolls u times a loop that is not completely unrolled or that
        has a non-constant loop count. If u is not supplied, the unroller
        computes how many times a candidate loop is unrolled.

-Mvect=sse
    Instructs the vectorizer to search for loops and, where possible, to
    use SSE or SSE2 and prefetch instructions (depending on which
    processor is targeted).

Portability options for CPU2000:
-------------------------------

176.gcc:
    -Dalloca=_alloca
        Maps alloca to _alloca so that the built-in, optimized alloca is
        used.
    -F10000000
        176.gcc uses alloca, and this option tells the linker to reserve
        10000000 bytes (about 10 MB) of stack.
        The default amount of stack is not enough, and without this
        option 176.gcc crashes with a run-time error.

178.galgel:
    -Mfixed
        Fixed-format F90 source code.

186.crafty:
    -DNT_i386
        Specifies a Windows NT system on an Intel processor, which makes
        186.crafty use "long long" as the 64-bit integer type it needs.

253.perlbmk:
    -DSPEC_CPU2000_NTOS
        Ensures that the code changes for porting to Windows are
        included.
    -DPERLDLL
        On Windows, we need a single perl.exe instead of a perl.exe plus
        a perl.dll. This pre-define ensures that the changes needed to
        build a single, UNIX-style executable are included, without the
        indirect calls that can cause a 10% performance degradation. This
        keeps the Windows executable as close as possible to the Unix
        one.
    -MT
        Uses the static multi-threaded library; otherwise the benchmark
        does not compile.

254.gap:
    -DSYS_HAS_CALLOC_PROTO
    -DSYS_HAS_MALLOC_PROTO
        These two pre-defines indicate that prototypes for malloc and
        calloc exist.
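The -Munroll suboptions described in the flag list (c:m for complete unrolling, n:u for unrolling by a factor u) can be pictured at the source level. The sketch below is a hand-written C illustration under that assumption, not the code PGI actually emits; the function names are hypothetical. It shows a rolled loop, the same loop unrolled four times with a remainder loop (the n:u case with u = 4), and a constant three-iteration loop unrolled away entirely (the c:m case with m >= 3).

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: one element per iteration. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical hand-unrolled version (u = 4) of a loop whose trip count
   is not a compile-time constant. The main loop handles four elements
   per iteration; the remainder loop handles the last n % 4. */
long sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)          /* remainder iterations */
        s += a[i];
    return s;
}

/* Complete unrolling: a loop with the constant count 3 becomes
   straight-line code with no loop control at all. */
long sum3_complete(const int a[3])
{
    return (long)a[0] + a[1] + a[2];
}
```

Both loop versions compute the same result for every n; the unrolled one simply executes fewer loop-control branches per element, which is the benefit the unroller is after.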
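The 176.gcc entry relies on two related facts: alloca memory comes from the stack, and on Windows the Microsoft C runtime spells the function _alloca. The following is a minimal, portable sketch of that pattern, assuming <malloc.h> on Windows and <alloca.h> elsewhere; the is_palindrome helper is hypothetical and only serves to show a stack-allocated scratch buffer.

```c
#include <assert.h>
#include <string.h>

#if defined(_WIN32)
#include <malloc.h>            /* Microsoft CRT declares _alloca here */
#ifndef alloca
#define alloca _alloca         /* same mapping that -Dalloca=_alloca requests */
#endif
#else
#include <alloca.h>            /* glibc, macOS and most Unix-like systems */
#endif

/* Hypothetical helper: returns 1 if s reads the same backwards.
   The reversed copy is allocated with alloca(), so it lives in this
   function's stack frame and is released automatically on return.
   Programs that rely heavily on alloca(), such as 176.gcc, need a
   stack reserve large enough for their deepest allocations. */
int is_palindrome(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);
    char *rev = alloca(n + 1);  /* n characters plus the terminator */
    for (size_t i = 0; i < n; i++)
        rev[i] = s[n - 1 - i];
    rev[n] = '\0';
    return strcmp(s, rev) == 0;
}
```

Unlike malloc, alloca has no failure return to check: if the request exceeds the remaining stack, the program typically crashes, which matches the run-time error described for 176.gcc without a larger reserve.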
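The 186.crafty entry depends on having a full 64-bit integer type. On Windows, long is only 32 bits wide (on both 32-bit and 64-bit Windows), so a board of 64 squares needs long long. The sketch below is hypothetical: it shows the kind of preprocessor switch that -DNT_i386 controls, but the type name bitboard, the non-NT fallback and both helper functions are illustrative assumptions, not crafty's actual source.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical type selection: with -DNT_i386 the 64-bit board type is
   "unsigned long long"; the fallback branch is an assumption. */
#if defined(NT_i386)
typedef unsigned long long bitboard;
#else
typedef uint64_t bitboard;
#endif

/* Sets the bit for one square (0..63); square 63 needs bit 63, which a
   32-bit type cannot hold. */
bitboard set_square(bitboard b, int square)
{
    assert(square >= 0 && square < (int)(sizeof(bitboard) * CHAR_BIT));
    return b | ((bitboard)1 << square);
}

/* Counts occupied squares by clearing the lowest set bit each pass. */
int popcount64(bitboard b)
{
    int count = 0;
    while (b) {
        b &= b - 1;
        count++;
    }
    return count;
}
```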