SGI Flags Disclosure

The following are the compiler switches/options used by SGI for the 
recent SPEC CPU2000 submissions.

Portability Flags:

-DUSG                         Specifies that the operating system is USG compliant.
-Dalloca=__builtin_alloca     Replace occurrences of alloca() with __builtin_alloca.
-DMIPS                        Specifies that this is a MIPS microprocessor.
-DHOST_WORDS_BIG_ENDIAN       Specifies that this is a big-endian host.
-DSGI                         Compile for an SGI system.
-DSPEC_CPU2000_SGI            Compile for an SGI system.
-DI_FCNTL                     Tells program to include <fcntl.h>.
-DSYS_IS_USG                  Specifies that the operating system is USG compliant.
-DSYS_HAS_TIME_PROTO          Do not explicitly declare  time().
-DSYS_HAS_SIGNAL_PROTO        Do not explicitly #include <signal.h>
-DSYS_HAS_IOCTL_PROTO         Do not explicitly declare  ioctl().
-DSYS_HAS_ANSI                System is ANSI compliant.
-DSYS_HAS_CALLOC_PROTO        Do not explicitly declare  calloc().
-DHAVE_SIGNED_CHAR            System supports a "signed char" type.
-fixedform                    Compiler flag, tells f90 compiler to use fixed format
                              (F77 72 column format), instead of F90 free format.
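
A -D flag only defines a preprocessor macro; the benchmark source decides
what to do with it.  The benchmark sources are not reproduced here, so the
following is only a sketch of the usual pattern, with hypothetical function
names, using -DHOST_WORDS_BIG_ENDIAN as the example.

```c
/* Compile-time view: returns 1 only if the build passed
   -DHOST_WORDS_BIG_ENDIAN on the command line. */
int configured_big_endian(void)
{
#ifdef HOST_WORDS_BIG_ENDIAN
    return 1;
#else
    return 0;
#endif
}

/* Run-time check, for comparison: look at the first byte of a known
   value.  On a big-endian host the low-order byte is stored last. */
int detected_big_endian(void)
{
    const unsigned int probe = 1u;
    return *(const unsigned char *)&probe == 0;
}
```

A portability flag is appropriate when the two functions would agree on the
target system; on a big-endian MIPS host both return 1 once the macro is
defined.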

Optimization Flags:

-bigp_off
                Disables the use of large pages within your program.
                This is the default for all optimization levels except
                -Ofast.

-fb_create <full path to the executable program>
                Used to specify that an instrumented executable program
                is to be generated. Such an executable is suitable for
                producing one or more .Counts files for feedback
                compilation.  This is off by default.
 
-fb_opt <full path to the executable program>
                Used to specify the path to the instrumented executable
                program previously generated using -fb_create.  Using the
                path, the compiler will find the .Counts file that should
                be used to guide feedback compilation. The specified 
                instrumented binary along with the .Counts file it produced
                will be used to generate a compiler feedback file, 
                which will then be used to direct optimization of the
                program.  This optimization is off by default.

-CG:ld_latency=<n>
                Specifies the assumed latency of load instructions in processor 
                cycles to be used to determine optimal instruction scheduling.
                The default setting is 5.

-INLINE:aggressive=on
                Tells the compiler to be more aggressive about inlining.  The
                default is -INLINE:aggressive=off.

-IPA[:...]
                IPA option group:  control the inter-procedural analyses and
                transformations performed.  Note that giving just the group name
                without any options, i.e.  -IPA, will invoke IPA with the default
                settings.  -IPA is off by default unless -Ofast is specified.

-IPA:aggr_cprop[=(on/off)]
                Enable/disable aggressive interprocedural constant
                propagation.  Attempt to avoid passing constant parameters,
                replacing the corresponding formal parameters by the constant
                values.  This optimization is off by default.
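
                As a rough illustration of the transformation (the function
                names are hypothetical and the "after" version is written by
                hand, not taken from compiler output), consider a helper whose
                second argument is the constant 2 at every call site:

```c
/* Hypothetical helper: every call site below passes 2 for factor. */
static int scale(int x, int factor)
{
    return x * factor;
}

/* As written in the source: the constant is passed on each call. */
int sum_scaled(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += scale(v[i], 2);
    return total;
}

/* After constant propagation: the formal parameter has been replaced
   by the constant, so nothing needs to be passed for it. */
static int scale_by_2(int x)
{
    return x * 2;
}

int sum_scaled_propagated(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += scale_by_2(v[i]);
    return total;
}
```

                Both versions compute the same sum; the second simply has one
                argument fewer to set up on each call.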

-IPA:callee_limit=(n)
                Functions whose size exceeds this limit will never
                be automatically inlined by the compiler.  The default
                is 2000.

-IPA:clone=on
                Allows IPA to clone procedures while inlining.  This is off by
                default.

-IPA:common_pad_size=n
                Specifies the amount by which to pad common block array
                dimensions.  By default, the compiler automatically chooses
                the amount of padding to improve cache behavior for common
                block array accesses.


-IPA:inline[=(on/off)]
                Controls whether compiler performs inter-file subprogram inlining
                during main IPA processing.  This defaults to on if -IPA or -Ofast
                was specified, otherwise it is off.

-IPA:linear=on
                Sets linearization of array references.  The setting can be ON
                or OFF.  When inlining Fortran subroutines, IPA tries to map
                formal array parameters to the shape of the actual
                parameter.  It may not always be able to map it.  In
                the case that it cannot map the parameter, it linearizes the
                array reference. By default, it will not inline such
                callsites because they may cause performance problems.  The
                default is OFF.

-IPA:maxdepth=n
                Directs IPA to not attempt to inline functions at a depth of
                more than n in the callgraph, where functions which make no
                calls are at depth 0, those which call only depth 0
                functions are at depth 1, and so on.  Inlining remains
                subject to overriding limits on code expansion.  See also
                forcedepth, space, and plimit.
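
                The depth definition can be made concrete with a small
                sketch.  The call graph below is hypothetical (four made-up
                functions) and must be acyclic for the recursion to end:

```c
#define NFUNCS 4

/* Hypothetical call graph: calls[f][g] != 0 means function f calls g.
   0 = main, 1 = solve, 2 = dot, 3 = norm. */
static const int calls[NFUNCS][NFUNCS] = {
    /* main  */ {0, 1, 0, 1},
    /* solve */ {0, 0, 1, 0},
    /* dot   */ {0, 0, 0, 0},
    /* norm  */ {0, 0, 0, 0},
};

/* Depth as defined above: 0 for a function that makes no calls,
   otherwise one more than the deepest function it calls. */
int call_depth(int f)
{
    int depth = 0;
    for (int g = 0; g < NFUNCS; g++) {
        if (calls[f][g]) {
            int d = call_depth(g) + 1;
            if (d > depth)
                depth = d;
        }
    }
    return depth;
}
```

                Here dot and norm are at depth 0, solve is at depth 1, and
                main is at depth 2, so main lies beyond a limit of
                -IPA:maxdepth=1.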

-IPA:min_hot=(n)
                When feedback information is available, a call site to a 
                procedure must be invoked with a frequency that exceeds
                the threshold specified by  n  before the procedure
                will be inlined at that call site.

-IPA:multi_clone=n
                Specifies the maximum number of clones that can be created
                from a single procedure.  By default, this value is 0.
                Aggressive cloning may create more opportunities for
                interprocedural optimization, but it also may significantly
                increase the code size.

-IPA:node_bloat=n
                When used in conjunction with IPA:multi_clone, this
                specifies the maximum percentage growth of the total number
                of procedures relative to the original program.  

-IPA:pad=off
                Disables the automatic padding of common block arrays.  Default
                is on when -IPA is specified, otherwise it is off.

-IPA:plimit=(n)
                Inline calls to a procedure until the procedure has grown to
                a size of  n.

-IPA:small_pu=(n)        
                A procedure with size smaller than  n  is not subjected to the
                plimit restriction.

-IPA:space=(n)  
                Inline until a program expansion of  n%  is reached.  This defaults
                to 100.

-IPA:use_intrinsic[=(ON|OFF)]
                Enable/disable loading the intrinsic version of standard library
                functions.  The default is off.

-LANG:exceptions=(on/off)
                Enables or disables exception handling constructs in the
                language.  Generally, code with and without exception
                handling cannot be mixed.  Specifically, the scopes
                crossed between throwing and catching an exception must
                all have been compiled with exceptions=ON.  Default is ON.

-lfastm
                Causes the executable to be linked using libfastm.so, which
                provides faster, lower-precision versions of various routines
                from libm.so.

-lmalloc        Causes the executable to be linked using libmalloc.so, which
                has a high performance version of malloc().

-lscs           Causes the executable to be linked using libscs.so.  libscs
                is the SGI/Cray Scientific Library (SCSL) which contains 
                the following high performance routines: BLAS1, BLAS2, BLAS3,
                LAPACK, and FFTs.

-LNO:ap=(0/1/2)
                Controls automatic parallelization: 0 - no parallelization, 
                1 - normal parallelization, 2 - parallelize loops regardless of
                trip count.  The default is 1.

-LNO:auto_dist=true
                On Origin systems, use a heuristic to distribute local and
                global arrays that are accessed in parallel.  The heuristic
                is based on access patterns of the named arrays; access
                patterns of arrays used as dummy arguments are ignored. 
                The default is off.

-LNO:blocking[=(on/off)]
                Enable/disable the cache blocking transformation.  The default
                is on at -O3 or higher.
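
                A hand-written sketch of what cache blocking does to a loop
                nest, using a matrix transpose.  The sizes N and TILE are
                illustrative assumptions; the compiler picks its own tile
                sizes from the cache parameters it is given.

```c
#define N 64      /* matrix dimension (illustrative) */
#define TILE 16   /* tile edge (illustrative); N must be a multiple of it */

/* Unblocked: dst is written with stride N across the whole matrix. */
void transpose_plain(const double *src, double *dst)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            dst[j * N + i] = src[i * N + j];
}

/* Blocked: the same iterations, regrouped so that each TILE x TILE
   piece is finished while its data is still likely to be in cache. */
void transpose_blocked(const double *src, double *dst)
{
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = ii; i < ii + TILE; i++)
                for (int j = jj; j < jj + TILE; j++)
                    dst[j * N + i] = src[i * N + j];
}

/* Returns 1 when both versions produce identical output. */
int blocked_matches_plain(void)
{
    static double src[N * N], a[N * N], b[N * N];
    for (int k = 0; k < N * N; k++)
        src[k] = (double)k;
    transpose_plain(src, a);
    transpose_blocked(src, b);
    for (int k = 0; k < N * N; k++)
        if (a[k] != b[k])
            return 0;
    return 1;
}
```

                Only the order of the iterations changes; the result is the
                same.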

-LNO:cs2=(n)
                Specify size of second level cache (e.g. 4m equals 4 megabytes).
                Default is 4m.

-LNO:fission=(n/on/off)
                Perform loop fission, n: 0 - off, 1 - conservative, 2 - aggressive.
                The default is 1.


-LNO:fusion=(n/on/off)
                Perform loop fusion, n: 0 - off, 1 - conservative, 2 - aggressive.
                The default is 1.
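
                Fusion and fission are inverse transformations.  A minimal
                hand-written sketch with hypothetical function names: fusion
                turns the first form into the second, and fission turns the
                second back into the first.

```c
/* Two separate loops: each makes a full pass over a[]. */
void scale_then_shift(int *a, int n, int k, int c)
{
    for (int i = 0; i < n; i++)
        a[i] = a[i] * k;
    for (int i = 0; i < n; i++)
        a[i] = a[i] + c;
}

/* Fused: one pass, so each element is loaded and stored once. */
void scale_and_shift_fused(int *a, int n, int k, int c)
{
    for (int i = 0; i < n; i++) {
        a[i] = a[i] * k;
        a[i] = a[i] + c;
    }
}

/* Returns 1 when both forms leave the array in the same state. */
int fused_matches_separate(void)
{
    int x[4] = {1, 2, 3, 4};
    int y[4] = {1, 2, 3, 4};
    scale_then_shift(x, 4, 2, 1);
    scale_and_shift_fused(y, 4, 2, 1);
    for (int i = 0; i < 4; i++)
        if (x[i] != y[i])
            return 0;
    return x[0] == 3 && x[3] == 9;
}
```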

-LNO:interchange=(on/off)
                Perform loop interchange.  This is on when -O3 or higher is specified,
                otherwise it is off.
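
                A minimal sketch of loop interchange on a row-major C array
                (the dimensions and function names are illustrative):

```c
#define ROWS 2
#define COLS 3

/* Before: the inner loop walks down a column, so consecutive
   accesses are COLS elements apart in memory. */
long sum_before_interchange(const int *m)
{
    long s = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            s += m[i * COLS + j];
    return s;
}

/* After: the loops are swapped, so the inner loop is stride one. */
long sum_after_interchange(const int *m)
{
    long s = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += m[i * COLS + j];
    return s;
}
```

                Both functions visit the same elements and return the same
                sum; only the memory access pattern differs.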

-LNO:local_pad_size=n
                Specifies the amount by which to pad local array
                dimensions.  By default, the compiler automatically
                chooses the amount of padding to improve cache behavior
                for local array accesses.

-LNO:opt=<n>
                Controls the LNO optimization level.  n can be one of the
                following:

                 0   Disables nearly all loop nest optimization.
                 1   Performs full loop nest transformations.  This is the
                     default.

-LNO:ou=<n>
                Indicates that all outer loops for which unrolling is legal
                should be unrolled by <n>, where <n> is a positive integer.
                The compiler unrolls loops by this amount or not at all.
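
                Outer loop unrolling is often described as unroll-and-jam.
                The following hand-written sketch unrolls the outer loop of a
                matrix-vector product by 2; the sizes and names are
                illustrative assumptions, not compiler output.

```c
#define M 5   /* rows; odd on purpose so the cleanup loop runs */
#define K 3   /* columns */

void matvec(const int *a, const int *x, int *y)
{
    for (int i = 0; i < M; i++) {
        y[i] = 0;
        for (int j = 0; j < K; j++)
            y[i] += a[i * K + j] * x[j];
    }
}

/* Outer loop unrolled by 2 and the two inner loops merged. */
void matvec_outer_unrolled(const int *a, const int *x, int *y)
{
    int i;
    for (i = 0; i + 1 < M; i += 2) {
        int y0 = 0, y1 = 0;
        for (int j = 0; j < K; j++) {
            y0 += a[i * K + j] * x[j];        /* x[j] is loaded once */
            y1 += a[(i + 1) * K + j] * x[j];  /* and used for two rows */
        }
        y[i] = y0;
        y[i + 1] = y1;
    }
    for (; i < M; i++) {                      /* leftover row */
        y[i] = 0;
        for (int j = 0; j < K; j++)
            y[i] += a[i * K + j] * x[j];
    }
}

/* Returns 1 when both versions produce the same vector. */
int outer_unrolled_matches(void)
{
    int a[M * K], x[K], y1[M], y2[M];
    for (int k = 0; k < M * K; k++)
        a[k] = k + 1;
    for (int j = 0; j < K; j++)
        x[j] = j - 1;
    matvec(a, x, y1);
    matvec_outer_unrolled(a, x, y2);
    for (int i = 0; i < M; i++)
        if (y1[i] != y2[i])
            return 0;
    return 1;
}
```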

-LNO:ou_prod_max=n
                Indicates that the product of unrolling of the various outer
                loops in a given loop nest is not to exceed n, where n is a
                positive integer.  The default is 16.

-LNO:outer_unroll_max,ou_max=(n)
                Outer_unroll_max indicates that the compiler may unroll outer
                loops in a loop nest by as many as  n  per loop, but no more. 
                The default is 4.

-LNO:pf2=(on/off)
                Enable/disable prefetch for second level cache.  The default is
                on if -O3 or higher is specified, otherwise the default is off.

-LNO:prefetch[=(0|1|2)]
                Specify level of prefetching.
                     0 = Prefetch disabled.
                     1 = Prefetch enabled but conservative, the default.
                     2 = Prefetch enabled and aggressive.

-LNO:prefetch_ahead=[n]
                Prefetch  n  cache line(s) ahead.  The default is 2.

-LNO:pwr2[=(on/off)]
                If enabled, when the leading dimension of an array is a power of two,
                the compiler makes an extra effort to make the inner loop stride one.
                The default is on.

-mips4
                Generate code using the full MIPS IV instruction set which is
                supported on R10000, R5000 and R8000 systems, and search for mips4
                libraries/objects at link-time.  

-n32            Use the high-performance 32-bit MIPS ABI.

-O or -O2
                Turn on extensive optimization.  The optimizations at this level are
                generally conservative, in the sense that they (1) are virtually
                always beneficial, (2) provide improvements commensurate to the
                compile time spent to achieve them, and (3) avoid changes which
                affect such things as floating point accuracy.

-O3             
                Turn on aggressive optimization.  The optimizations at this level
                are distinguished from -O2 by their aggressiveness, generally
                seeking highest-quality generated code even if it requires extensive
                compile time.  They may include optimizations which are generally
                beneficial but occasionally hurt performance.  This sets -LNO:opt=1
                -OPT:ro=2 and turns on some additional optimizations.

-Ofast[=ipxx]   
                Use optimizations selected to maximize performance for the
                given SGI target platform IPxx.  The optimizations may differ
                for the various platforms, and will always enable the full
                instruction set of the target platform (e.g. -mips4 for
                an R10000).  Although the optimizations are generally safe,
                they may affect floating point accuracy due to rearrangement
                of computations.  This effectively turns on the following
                optimizations: -O3 -IPA -OPT:ro=3:Olimit=0:div_split=on:alias=typed
                -TARG:platform=<ipxx> -bigp_on.

-Ofast=ip27

                This flag is equivalent to the following optimizations: -O3 -IPA 
                -OPT:ro=3:Olimit=0:div_split=on:alias=typed:unroll_times_max=8
                -TARG:platform=ip27 -bigp_on.

-OPT:alias=<name>
                Specifies the pointer aliasing model to be used.  By
                specifying one or more of the following for <name>, the
                compiler is able to make assumptions throughout the compilation:
                typed        Assume that the code adheres to the ANSI/ISO C
                             standard which states that two pointers of different
                             types cannot point to the same location in memory.
                             This is on by default when -Ofast is specified.

                restrict     Specifies that distinct pointers are assumed
                             to point to distinct, non-overlapping objects.
                             This is off by default.

                disjoint     Specifies that any two pointer expressions are
                             assumed to point to distinct, non-overlapping objects.
                             This is off by default.
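
                A short sketch of what these assumptions permit.  The function
                names are hypothetical, and the C99 restrict qualifier is used
                only as a source-level analogue: the -OPT:alias options apply
                the assumption to the whole compilation without annotations.

```c
/* typed: an int * and a float * are assumed never to refer to the
   same object, so the value 1 may be returned without reloading *ip
   after the store through fp. */
int store_int_then_float(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}

/* restrict: dst and src are promised not to overlap, so loads from
   src may be scheduled ahead of earlier stores to dst. */
void add_into(int *restrict dst, const int *restrict src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Demo with genuinely distinct objects, where every aliasing model
   gives the same answer.  Returns 1 on the expected result. */
int alias_demo(void)
{
    int i = 0;
    float f = 0.0f;
    int dst[3] = {1, 2, 3};
    const int src[3] = {10, 20, 30};
    int r = store_int_then_float(&i, &f);
    add_into(dst, src, 3);
    return r == 1 && dst[0] == 11 && dst[1] == 22 && dst[2] == 33;
}
```

                Code that breaks the promise (for example, passing overlapping
                arrays to add_into) may behave differently once the
                corresponding model is enabled.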

-OPT:div_split=(true/false)
                Enable/disable changing x/y into x*(recip(y)).  This is on when -Ofast
                is specified, otherwise it is off by default.
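
                The accuracy trade-off can be seen in a small sketch (the
                function names are hypothetical).  A direct division is
                rounded once; the split form rounds the reciprocal and then
                rounds the product, so the two may differ in the last bit.

```c
/* One correctly rounded division. */
double divide_direct(double x, double y)
{
    return x / y;
}

/* The split form: reciprocal first, then a multiply. */
double divide_split(double x, double y)
{
    double r = 1.0 / y;   /* rounded once */
    return x * r;         /* rounded again */
}

/* Difference between the two forms, relative to the direct quotient.
   Assumes the quotient is nonzero. */
double split_relative_error(double x, double y)
{
    double q = divide_direct(x, y);
    double d = divide_split(x, y) - q;
    if (d < 0)
        d = -d;
    if (q < 0)
        q = -q;
    return d / q;
}
```

                When y is a power of two the reciprocal is exact and both
                forms agree; otherwise any difference stays within a few
                units in the last place.  The split form pays off when the
                same y divides many values, since the reciprocal is then
                computed once.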

-OPT:fast_bit_intrinsics[=(on/off)]
                Disable/enable the check for the bit count being within
                range for Fortran bit intrinsics (e.g., BTEST, ISHFT).  This 
                is off by default.

-OPT:Olimit=(n)
                Disable optimization when size of program unit is > n. When n
                is 0, program unit size is ignored and optimization process
                will not be disabled due to compile time limit.  The default is
                0 when -Ofast is specified, otherwise the default is 2000.

-OPT:goto=(off/on)
                Disable/enable the conversion of GOTOs into higher level
                structures like FOR loops.  The default is on for -O2 or higher.
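
                A minimal sketch of the kind of rewrite involved, with
                hypothetical function names.  Once the GOTO form is recognized
                as a counted loop, loop optimizations can be applied to it.

```c
/* Before: a counting loop written with labels and GOTOs. */
int sum_to_n_goto(int n)
{
    int i = 1, s = 0;
top:
    if (i > n)
        goto done;
    s += i;
    i++;
    goto top;
done:
    return s;
}

/* After: the same control flow as a structured FOR loop. */
int sum_to_n_for(int n)
{
    int s = 0;
    for (int i = 1; i <= n; i++)
        s += i;
    return s;
}
```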

-OPT:IEEE_arith=(n)
                Specify level of conformance to IEEE 754 floating point
                roundoff/overflow behavior. At level 3, all mathematically
                valid transformations are allowed.  The default is 1.

-OPT:ro=(n)
                Specify the level of acceptable deviation from source order
                floating point roundoff and overflow behavior.  At level 3,
                any mathematically valid transformation is enabled.  The
                default is 0.
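
                Floating point addition is not associative, which is why
                reordering counts as a deviation.  In the sketch below (names
                and data are illustrative) the second function adds the same
                values in a different order, as a compiler might when it
                splits a reduction into partial sums.

```c
static const double demo_values[4] = {1e30, 1.0, -1e30, 1.0};

/* Source order: ((v[0] + v[1]) + v[2]) + v[3]. */
double sum_source_order(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Reassociated: even and odd elements are summed separately and
   combined at the end.  Assumes n is even. */
double sum_reassociated(const double *v, int n)
{
    double even = 0.0, odd = 0.0;
    for (int i = 0; i + 1 < n; i += 2) {
        even += v[i];
        odd += v[i + 1];
    }
    return even + odd;
}
```

                With demo_values, the source-order sum is 1.0 because the
                first 1.0 is absorbed by 1e30, while the reassociated sum is
                2.0.  Both are valid evaluations of the same mathematical
                expression.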


-OPT:unroll_times_max=(n)
                Unroll inner loops by a maximum of  n.  The default is 4.
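
                For reference, a hand-written sketch of an inner loop unrolled
                4 times, with a cleanup loop for the leftover iterations (the
                function names are hypothetical):

```c
long sum_plain(const int *v, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unrolled by 4: one loop test and branch per four additions. */
long sum_unrolled_by_4(const int *v, int n)
{
    long s = 0;
    int i;
    for (i = 0; i + 3 < n; i += 4) {
        s += v[i];
        s += v[i + 1];
        s += v[i + 2];
        s += v[i + 3];
    }
    for (; i < n; i++)   /* at most three leftover elements */
        s += v[i];
    return s;
}
```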

-OPT:unroll_size=(n)
                Sets the ceiling of maximum number of instructions for an
                unrolled inner loop. If n = 0, the ceiling is disregarded.

-OPT:unroll_analysis=[on/off]
                Enable/disable unrolling of inner loops by analysing resource
                and processor specifics.  The default is on.

-pfa
                Turns on the MIPSpro Auto Parallelizing Option, which 
                enables the compiler to automatically discover parallelism
                in the source code.  This is off by default.

-TARG:platform[=ipxx]
                Identify the target SGI platform for compilation, choosing
                various internal parameters (such as cache sizes)
                appropriately.  The default is ip25.

-TARG:platform=ip27 

                Turns on the following: -TARG:madd=ON:isa=mips4:processor=r10000

-TARG:madd=flag 

                Enable or disable transformations to use multiply/add 
                instructions.  Flag can be either ON or OFF.  These 
                instructions perform a multiply and an add with a single round-
                off.  They are, therefore, more accurate than the usual discrete 
                operations, and may cause results to not match baselines from 
                other targets.  Use this option to determine whether observed 
                differences are due to madds. The default is -TARG:madd=ON for 
                a MIPS IV target; it is ignored for others.
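
                The effect of the single round-off can be reproduced in
                portable C.  This sketch does not use a hardware madd; it
                emulates one for float operands by forming the product in
                double, where it is exact.  The function names are
                hypothetical, and the emulation is only claimed to be exact
                for inputs like those in the note that follows.

```c
/* Separate multiply and add: the product is rounded to float before
   the addition.  volatile keeps the compiler from fusing the two. */
float multiply_then_add(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Emulated madd: the product of two floats fits exactly in a double,
   so rounding to float happens only once, at the end. */
float multiply_add_single_rounding(float a, float b, float c)
{
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}
```

                With a = b = 1.000244140625 (1 + 2^-12) and
                c = -1.00048828125 (-(1 + 2^-11)), the separate form returns
                0 because the low bit of the product is rounded away, while
                the single-rounding form returns 2^-24.  This is the kind of
                mismatch against other targets that the option helps to
                diagnose.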

-TARG:isa=value 

                Identifies the target instruction set architecture for 
                compilation, such as the set of instructions that are generated. 
                value can be mips3 or mips4.  Specify -TARG:isa=mips3 for code 
                that must run on R4000 processors.  This option is equivalent 
                to specifying -mips3 or -mips4 (see those options for defaults).

-TARG:processor=type

                Select the processor for which to schedule code.  type can be 
                either r4000, r5000, r8000, or r10000.  The chosen processor 
                must support the ISA specified (or implied by the ABI).

-TENV:X=(0..4)
                Specify the level of enabled exceptions that will be assumed
                for purposes of performing speculative code motion (default
                level 1 at -O0..-O2, 2 at -O3).  In general, an instruction
                will not be speculated (i.e. moved above a branch by the
                optimizer) unless any exceptions it might cause are disabled
                by this option.  At level 0, no speculative code motion may
                be performed.  At level 1, safe speculative code motion may
                be performed, with IEEE-754 underflow and inexact exceptions
                disabled.  At level 2, all IEEE-754 exceptions are disabled
                except divide by zero.  At level 3, all IEEE-754 exceptions
                are disabled including divide by zero.  At level 4, memory
                exceptions may be disabled or ignored.

-Wl,-x
                Passes the -x option to the linker.  With this flag set, the
                linker will not preserve local (non-global) symbols in the output
                symbol table.  The linker enters external and static symbols
                only.  This option conserves space in the output file.  This is
                off by default.

The following are descriptions of the system tunable parameters used
to enable large pages.

systune -i
                Systune is a tool that enables a user to examine and configure
                tunable kernel parameters.  -i puts systune in interactive
                mode.

percent_totalmem_4m_pages=n
               A system tunable parameter that can be set from within
               systune.  It tells IRIX the maximum percentage of memory
               that can be allocated to 4MB pages.

nlpages_4m=n
               A system tunable parameter that can be set from within
               systune.  It tells IRIX to statically allocate n 4MB pages
               at system boot time.

PAGESIZE_DATA <n>
               This environment variable tells IRIX what size pages to give
               your applications.  Legal values for <n> are 16, 256, 1024, 4096,
               and 16384 representing the size in kilobytes of the pages to be
               used.  This only works for applications compiled either with -Ofast
               or with -bigp_on.