SGI Flags Disclosure

The following are the compiler switches/options used by SGI for the recent SPEC CPU2000 submissions.

Portability Flags:

-DUSG
Specifies that the operating system is USG compliant.

-Dalloca=__builtin_alloca
Replaces occurrences of alloca() with __builtin_alloca.

-DMIPS
Specifies that this is a MIPS microprocessor.

-DHOST_WORDS_BIG_ENDIAN
Specifies that this is a big-endian host.

-DSGI
Compile for an SGI system.

-DSPEC_CPU2000_SGI
Compile for an SGI system.

-DI_FCNTL
Tells the program to include .

-DSYS_IS_USG
Specifies that the operating system is USG compliant.

-DSYS_HAS_TIME_PROTO
Do not explicitly declare time().

-DSYS_HAS_SIGNAL_PROTO
Do not explicitly #include

-DSYS_HAS_IOCTL_PROTO
Do not explicitly declare ioctl().

-DSYS_HAS_ANSI
System is ANSI compliant.

-DSYS_HAS_CALLOC_PROTO
Do not explicitly declare calloc().

-DHAVE_SIGNED_CHAR
System supports a "signed char" type.

-fixedform
Compiler flag that tells the f90 compiler to use fixed format (the F77 72-column format) instead of F90 free format.

Optimization Flags:

-bigp_off
Disables the use of large pages within your program. This is the default for all optimization levels except -Ofast.

-fb_create
Specifies that an instrumented executable program is to be generated. Such an executable is suitable for producing one or more .Counts files for feedback compilation. This is off by default.

-fb_opt
Specifies the path to the instrumented executable program previously generated using -fb_create. Using the path, the compiler finds the .Counts file that should be used to guide feedback compilation. The specified instrumented binary, along with the .Counts file it produced, is used to generate a compiler feedback file, which then directs optimization of the program. This optimization is off by default.

-CG:ld_latency=n
Specifies the assumed latency of load instructions, in processor cycles, used to determine optimal instruction scheduling. The default setting is 5.
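The -DHOST_WORDS_BIG_ENDIAN flag only helps if it matches the machine the code actually runs on. The C sketch below shows the kind of code such a macro typically guards; load_be32 and host_is_big_endian are hypothetical helper names chosen for illustration, not functions taken from any SPEC benchmark.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit big-endian value from a byte buffer.
 * When the build defines HOST_WORDS_BIG_ENDIAN, the bytes are already in
 * host order and can be copied directly; otherwise they are assembled by
 * shifting, which works on any host. */
static uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
#ifdef HOST_WORDS_BIG_ENDIAN
    uint32_t v;
    memcpy(&v, p, sizeof v);  /* host order is big-endian: no swap needed */
    return v;
#else
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
#endif
}

/* Hypothetical runtime probe: returns 1 on a big-endian host and 0 on a
 * little-endian one, by inspecting the first byte stored for the value 1. */
static int host_is_big_endian(void)
{
    const uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 0;
}
```

Defining the macro on a little-endian host would make load_be32 return the bytes in reversed order, so comparing the macro against a runtime probe like host_is_big_endian() is a cheap build-time safeguard.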
-INLINE:aggressive=on
Tells the compiler to be more aggressive about inlining. The default is -INLINE:aggressive=off.

-IPA[:...]
IPA option group: controls the inter-procedural analyses and transformations performed. Giving just the group name without any options (i.e., -IPA) invokes IPA with the default settings. -IPA is off by default unless -Ofast is specified.

-IPA:aggr_cprop[=(on/off)]
Enables/disables aggressive interprocedural constant propagation, which attempts to avoid passing constant parameters by replacing the corresponding formal parameters with the constant values. This optimization is off by default.

-IPA:callee_limit=(n)
Functions whose size exceeds this limit will never be automatically inlined by the compiler. The default is 2000.

-IPA:clone=on
Allows IPA to clone procedures while inlining. This is off by default.

-IPA:common_pad_size=n
Specifies the amount by which to pad common block array dimensions. By default, the compiler automatically chooses the amount of padding to improve cache behavior for common block array accesses.

-IPA:inline[=(on/off)]
Controls whether the compiler performs inter-file subprogram inlining during main IPA processing. This defaults to on if -IPA or -Ofast is specified; otherwise it is off.

-IPA:linear=on
Sets linearization of array references; the setting can be ON or OFF. When inlining Fortran subroutines, IPA tries to map formal array parameters to the shape of the actual parameter, but it is not always able to do so. With linearization ON, when IPA cannot map the parameter it linearizes the array reference; with it OFF, IPA does not inline such call sites, because they may cause performance problems. The default is OFF.

-IPA:maxdepth=n
Directs IPA not to attempt to inline functions at a depth of more than n in the call graph, where functions that make no calls are at depth 0, those that call only depth-0 functions are at depth 1, and so on. Inlining remains subject to overriding limits on code expansion. See also forcedepth, space, and plimit.

-IPA:min_hot=(n)
When feedback information is available, a call site to a procedure must be invoked with a frequency that exceeds the threshold specified by n before the procedure will be inlined at that call site.

-IPA:multi_clone=n
Specifies the maximum number of clones that can be created from a single procedure. By default, this value is 0. Cloning can enable more interprocedural optimization, but it may also significantly increase code size.

-IPA:node_bloat=n
When used in conjunction with -IPA:multi_clone, specifies the maximum percentage growth of the total number of procedures relative to the original program.

-IPA:pad=off
Disables the automatic padding of common block arrays. The default is on when -IPA is specified; otherwise it is off.

-IPA:plimit=(n)
Inline calls to a procedure until the procedure has grown to a size of n.

-IPA:small_pu=(n)
A procedure with a size smaller than n is not subject to the plimit restriction.

-IPA:space=(n)
Inline until a program expansion of n% is reached. The default is 100.

-IPA:use_intrinsic[=(ON|OFF)]
Enables/disables loading the intrinsic version of standard library functions. The default is off.

-LANG:exceptions=(on/off)
Enables or disables exception handling constructs in the language. Generally, code with and without exception handling cannot be mixed; specifically, the scopes crossed between throwing and catching an exception must all have been compiled with exceptions=ON. The default is ON.

-lfastm
Links the executable with libfastm.so, which provides faster, lower-precision versions of various routines from libm.so.

-lmalloc
Links the executable with libmalloc.so, which provides a high-performance version of malloc().

-lscs
Links the executable with libscs.so, the SGI/Cray Scientific Library (SCSL), which contains high-performance BLAS1, BLAS2, BLAS3, LAPACK, and FFT routines.
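Interprocedural constant propagation (-IPA:aggr_cprop) and procedure cloning (-IPA:clone, -IPA:multi_clone) can be pictured at the source level as below. This is a hand-written sketch, assuming every call in question passes alpha = 1.0; IPA itself works on the compiler's intermediate representation, and all function names here are hypothetical.

```c
#include <assert.h>

/* Original procedure: generic in both the data and the scale factor. */
static double scaled_sum(const double *x, const double *y, int n, double alpha)
{
    assert(n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += alpha * x[i] + y[i];
    return s;
}

/* Hand-made clone specialized for alpha == 1.0: the constant formal
 * parameter is gone and the multiply folds away. */
static double scaled_sum_alpha1(const double *x, const double *y, int n)
{
    assert(n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i] + y[i];
    return s;
}

/* Call site as written in the source... */
static double caller_original(const double *x, const double *y, int n)
{
    return scaled_sum(x, y, n, 1.0);
}

/* ...and after the constant is propagated and the call redirected. */
static double caller_after_ipa(const double *x, const double *y, int n)
{
    return scaled_sum_alpha1(x, y, n);
}
```

Because 1.0 * x is exactly x in IEEE arithmetic, the specialized copy returns identical results. The gain is one fewer parameter and one fewer multiply per iteration; the cost is an extra copy of the procedure, which is the kind of growth -IPA:node_bloat bounds.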
-LNO:ap=(0/1/2)
Controls automatic parallelization: 0 - no parallelization, 1 - normal parallelization, 2 - parallelize loops regardless of trip count. The default is 1.

-LNO:auto_dist=true
On Origin systems, uses a heuristic to distribute local and global arrays that are accessed in parallel. The heuristic is based on the access patterns of the named arrays; access patterns of arrays used as dummy arguments are ignored. The default is off.

-LNO:blocking[=(on/off)]
Enables/disables the cache blocking transformation. The default is on at -O3 or higher.

-LNO:cs2=(n)
Specifies the size of the second-level cache (e.g., 4m equals 4 megabytes). The default is 4m.

-LNO:fission=(n/on/off)
Performs loop fission. n: 0 - off, 1 - conservative, 2 - aggressive. The default is 1.

-LNO:fusion=(n/on/off)
Performs loop fusion. n: 0 - off, 1 - conservative, 2 - aggressive. The default is 1.

-LNO:interchange=(on/off)
Performs loop interchange. This is on when -O3 or higher is specified; otherwise it is off.

-LNO:local_pad_size=n
Specifies the amount by which to pad local array dimensions. By default, the compiler automatically chooses the amount of padding to improve cache behavior for local array accesses.

-LNO:opt=n
Controls the LNO optimization level. n can be one of the following:
0 - Disables nearly all loop nest optimization.
1 - Performs full loop nest transformations. This is the default.

-LNO:ou=n
Indicates that all outer loops for which unrolling is legal should be unrolled by n, where n is a positive integer. The compiler unrolls loops by this amount or not at all.

-LNO:ou_prod_max=n
Indicates that the product of unrolling of the various outer loops in a given loop nest is not to exceed n, where n is a positive integer. The default is 16.

-LNO:outer_unroll_max,ou_max=(n)
Indicates that the compiler may unroll outer loops in a loop nest by as many as n per loop, but no more. The default is 4.

-LNO:pf2=(on/off)
Enables/disables prefetch for the second-level cache. The default is on if -O3 or higher is specified; otherwise it is off.

-LNO:prefetch[=(0|1|2)]
Specifies the level of prefetching:
0 - Prefetch disabled.
1 - Prefetch enabled but conservative. This is the default.
2 - Prefetch enabled and aggressive.

-LNO:prefetch_ahead=[n]
Prefetch n cache line(s) ahead. The default is 2.

-LNO:pwr2[=(on/off)]
If enabled, when the leading dimension of an array is a power of two, the compiler makes an extra effort to make the inner loop stride one. The default is on.

-mips4
Generates code using the full MIPS IV instruction set, which is supported on R10000, R5000, and R8000 systems, and searches for mips4 libraries/objects at link time.

-n32
Uses the high-performance 32-bit MIPS ABI (n32).

-O or -O2
Turns on extensive optimization. The optimizations at this level are generally conservative, in the sense that they (1) are virtually always beneficial, (2) provide improvements commensurate with the compile time spent to achieve them, and (3) avoid changes that affect such things as floating-point accuracy.

-O3
Turns on aggressive optimization. The optimizations at this level are distinguished from -O2 by their aggressiveness, generally seeking the highest-quality generated code even if it requires extensive compile time. They may include optimizations that are generally beneficial but occasionally hurt performance. This sets -LNO:opt=1 -OPT:ro=2 and turns on some additional optimizations.

-Ofast[=ipxx]
Uses optimizations selected to maximize performance for the given SGI target platform IPxx. The optimizations may differ for the various platforms, and always enable the full instruction set of the target platform (e.g., -mips4 for an R10000). Although the optimizations are generally safe, they may affect floating-point accuracy due to rearrangement of computations. This effectively turns on the following optimizations: -O3 -IPA -OPT:ro=3:Olimit=0:div_split=on:alias=typed -TARG:platform=ipxx -bigp_on.
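What cache blocking (-LNO:blocking) does to a loop nest can be illustrated by hand on a matrix multiply; the blocked version below also uses the i-k-j loop order that loop interchange (-LNO:interchange) can produce, so the innermost loop runs with stride one. This is a hand-written C sketch, not compiler output: N, TILE and the function names are arbitrary choices for illustration, and LNO presumably derives its own tile sizes from the cache parameters it is given (see -LNO:cs2 and -TARG:platform).

```c
#include <assert.h>
#include <stddef.h>

#define N    64  /* matrix order, chosen for illustration */
#define TILE 16  /* hypothetical tile size; LNO picks its own */

/* Straightforward i-j-k product, C = A * B (row-major). The inner loop
 * walks down a column of B, i.e. with stride N. */
static void matmul_naive(double A[N][N], double B[N][N], double C[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}

/* Hand-blocked version: the loops are tiled so each TILE x TILE block of
 * B and C is reused while it is still in cache, and the innermost loop
 * walks rows of B and C with stride one. */
static void matmul_blocked(double A[N][N], double B[N][N], double C[N][N])
{
    assert(N % TILE == 0);  /* keeps the sketch free of remainder loops */
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            C[i][j] = 0.0;

    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t kk = 0; kk < N; kk += TILE)
            for (size_t jj = 0; jj < N; jj += TILE)
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t k = kk; k < kk + TILE; k++) {
                        double a = A[i][k];
                        for (size_t j = jj; j < jj + TILE; j++)
                            C[i][j] += a * B[k][j];
                    }
}

/* Self-check: each C[i][j] still accumulates its products in k order,
 * and small integer inputs keep every value exact, so both versions
 * must agree bit for bit. Returns 1 on agreement. */
static int blocked_matches_naive(void)
{
    static double A[N][N], B[N][N], C1[N][N], C2[N][N];
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            A[i][j] = (double)((i + j) % 7);
            B[i][j] = (double)((i * j) % 5);
        }
    matmul_naive(A, B, C1);
    matmul_blocked(A, B, C2);
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            if (C1[i][j] != C2[i][j])
                return 0;
    return 1;
}
```

The transformation changes only the order in which memory is touched, not the result, which is why LNO can enable it by default at -O3 without affecting floating-point accuracy.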
-Ofast=ip27
Equivalent to the following optimizations: -O3 -IPA -OPT:ro=3:Olimit=0:div_split=on:alias=typed:unroll_times_max=8 -TARG:platform=ip27 -bigp_on.

-OPT:alias=
Specifies the pointer aliasing model to be used. By specifying one or more of the following models, the compiler is able to make assumptions throughout the compilation:
typed - Assume that the code adheres to the ANSI/ISO C standard, which states that two pointers of different types cannot point to the same location in memory. This is on by default when -Ofast is specified.
restrict - Specifies that distinct pointers are assumed to point to distinct, non-overlapping objects. This is off by default.
disjoint - Specifies that any two pointer expressions are assumed to point to distinct, non-overlapping objects. This is off by default.

-OPT:div_split=(true/false)
Enables/disables changing x/y into x*(recip(y)). This is on when -Ofast is specified; otherwise it is off by default.

-OPT:fast_bit_intrinsics[=(on/off)]
Disables/enables the check for the bit count being within range for Fortran bit intrinsics (e.g., BTEST, ISHFT). This is off by default.

-OPT:Olimit=(n)
Disables optimization when the size of a program unit is greater than n. When n is 0, program unit size is ignored and optimization will not be disabled due to the compile-time limit. The default is 0 when -Ofast is specified; otherwise it is 2000.

-OPT:goto=(off/on)
Disables/enables the conversion of GOTOs into higher-level structures like FOR loops. The default is on for -O2 or higher.

-OPT:IEEE_arith=(n)
Specifies the level of conformance to IEEE 754 floating-point roundoff/overflow behavior. At level 3, all mathematically valid transformations are allowed. The default is 1.

-OPT:ro=(n)
Specifies the level of acceptable deviation from source-order floating-point roundoff and overflow behavior. At level 3, any mathematically valid transformation is enabled. The default is 0.

-OPT:unroll_times_max=(n)
Unrolls inner loops by a maximum of n. The default is 4.
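The rewrite that -OPT:div_split enables replaces one correctly rounded division with two rounded operations, and that is enough to change results in the last bit. The C sketch below shows the arithmetic shape of the rewrite only; the function names are hypothetical, and the code MIPSpro actually emits (for instance, whether it uses a hardware reciprocal instruction) may differ.

```c
#include <assert.h>

/* x / y as written: a single, correctly rounded IEEE-754 division. */
static double divide_as_written(double x, double y)
{
    return x / y;
}

/* The shape of the rewrite that -OPT:div_split=on permits: x * (1/y).
 * Two operations mean two roundings, so the result may differ from
 * x / y in the last bit. */
static double divide_split(double x, double y)
{
    assert(y != 0.0);        /* this sketch only considers nonzero divisors */
    double r = 1.0 / y;      /* first rounding: the reciprocal */
    return x * r;            /* second rounding: the product */
}
```

With x = y = 49, the division gives exactly 1.0, while 49 * (1/49) lands just below 1.0. When y is a power of two, the reciprocal is exact and both forms agree. Since -Ofast turns div_split on, this is one concrete way it "may affect floating point accuracy".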
-OPT:unroll_size=(n)
Sets the ceiling on the number of instructions in an unrolled inner loop. If n = 0, the ceiling is disregarded.

-OPT:unroll_analysis=[on/off]
Enables/disables unrolling of inner loops based on an analysis of resource usage and processor specifics. The default is on.

-pfa
Turns on the MIPSpro Auto Parallelizing Option, which enables the compiler to automatically discover parallelism in the source code. This is off by default.

-TARG:platform[=ipxx]
Identifies the target SGI platform for compilation, choosing various internal parameters (such as cache sizes) appropriately. The default is ip25.

-TARG:platform=ip27
Turns on the following: -TARG:madd=ON:isa=mips4:processor=r10000

-TARG:madd=flag
Enables or disables transformations to use multiply/add instructions. flag can be either ON or OFF. These instructions perform a multiply and an add with a single roundoff. They are therefore more accurate than the usual discrete operations, and may cause results to not match baselines from other targets. Use this option to determine whether observed differences are due to madds. The default is -TARG:madd=ON for a MIPS IV target; it is ignored for others.

-TARG:isa=value
Identifies the target instruction set architecture for compilation, that is, the set of instructions that are generated. value can be mips3 or mips4. Specify -TARG:isa=mips3 for code that must run on R4000 processors. This option is equivalent to specifying -mips3 or -mips4 (see those options for defaults).

-TARG:processor=type
Selects the processor for which to schedule code. type can be r4000, r5000, r8000, or r10000. The chosen processor must support the ISA specified (or implied by the ABI).

-TENV:X=(0..5)
Specifies the level of enabled exceptions that will be assumed for purposes of performing speculative code motion (default level 1 at -O0 through -O2, 2 at -O3). In general, an instruction will not be speculated (i.e., moved above a branch by the optimizer) unless any exceptions it might cause are disabled by this option.
At level 0, no speculative code motion may be performed.
At level 1, safe speculative code motion may be performed, with IEEE-754 underflow and inexact exceptions disabled.
At level 2, all IEEE-754 exceptions are disabled except divide by zero.
At level 3, all IEEE-754 exceptions are disabled, including divide by zero.
At level 4, memory exceptions may be disabled or ignored.

-Wl,-x
Passes the -x option to the linker. With this flag set, the linker does not preserve local (non-global) symbols in the output symbol table; it enters external and static symbols only. This option conserves space in the output file. This is off by default.

The following are descriptions of the system tunable parameters used to enable large pages.

systune -i
Systune is a tool that enables users to examine and configure tunable kernel parameters. -i puts systune in interactive mode.

percent_totalmem_4m_pages=n
A system tunable parameter that can be set from within systune. It tells IRIX the maximum percentage of memory that can be allocated to 4MB pages.

nlpages_4m=n
A system tunable parameter that can be set from within systune. It tells IRIX to statically allocate n 4MB pages at system boot time.

PAGESIZE_DATA
This environment variable tells IRIX what size pages to give your applications. Legal values are 16, 256, 1024, 4096, and 16384, representing the size in kilobytes of the pages to be used. This only works for applications compiled either with -Ofast or with -bigp_on.
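Why larger pages help can be seen with a little arithmetic: the fewer pages a data set spans, the fewer address translations the TLB generally has to cover. The C sketch below, using hypothetical helper names, checks a size against the PAGESIZE_DATA values listed above and counts the pages needed to cover a data segment. For a 64 MB segment, that is 4096 pages at 16 KB versus 16 pages at 4 MB.

```c
#include <assert.h>
#include <stddef.h>

/* Page sizes, in kilobytes, listed above as legal PAGESIZE_DATA values. */
static const unsigned legal_pagesize_kb[] = { 16, 256, 1024, 4096, 16384 };

/* Hypothetical helper: returns 1 if kb is one of the listed values. */
static int is_legal_pagesize_kb(unsigned kb)
{
    for (size_t i = 0; i < sizeof legal_pagesize_kb / sizeof legal_pagesize_kb[0]; i++)
        if (legal_pagesize_kb[i] == kb)
            return 1;
    return 0;
}

/* Hypothetical helper: number of pages of size page_kb needed to cover
 * data_kb kilobytes, rounded up because a partly used page must still
 * be mapped. */
static unsigned long pages_needed(unsigned long data_kb, unsigned page_kb)
{
    assert(is_legal_pagesize_kb(page_kb));
    return (data_kb + page_kb - 1) / page_kb;
}
```

The same data that needs thousands of small pages fits in a handful of 4 MB pages, which is also why the pool of large pages reserved through nlpages_4m and percent_totalmem_4m_pages has to be sized for the applications that request them.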