| Sun SPEC CPU2000 Flag Descriptions
Sun Studio 11, Sun Studio 10, Sun Studio 9 and Sun Studio 8
Solaris 10, Solaris 9 9/04, Solaris 9 4/04, Solaris 9 12/03, Solaris 9 8/03, Solaris 9 4/03
 Last updated: 13-Oct-2005
 Note: This flags file is alphabetized by command 
or switch name, 
without regard to upper/lower case, without regard to the presence
or absence of a leading "-", and without regard to the software
component that uses the command or switch.   The component is mentioned in 
(parentheses) immediately after the name of the command or switch.  
 It is hoped that this order of presentation will make it easier to look
up commands or switches even if the reader does not already know what
software component they belong to.
 
 -Abcopy (optimizer) 
Increase the probability that the compiler will perform 
memcpy/memset transformations.
 -Addint:ignore_parallel (optimizer)
Ignore parallelization factors in loop
interchange heuristics.
 -Addint:sf=<n> (optimizer) 
When considering whether to interchange loops, 
set memory store operation weight to n.
A higher value of n indicates a greater performance
cost for stores.
 -Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>][:mi][:recursion=1] (optimizer)
Control the optimizer's loop inliner:
   cp=<n> The minimum call site frequency counter required
       for a routine to be considered for inlining.
   cs=<n> Set inline callee size limit to n.  The unit
       roughly corresponds to the number of instructions.
   inc=<n> The inliner is allowed to increase the
       size of the program by up to n%.
   irs=<n> Allow routines to increase by up to n.  The
       unit roughly corresponds to the number of instructions.
   mi Perform maximum inlining (without considering code
       size increase).
   recursion=1 Allow routines that are called recursively to still be
       eligible for inlining.
 -Aivsub3 (optimizer)
Increase the probability that loop induction variables will be replaced,
so that some extraneous code can be eliminated from loops.
 -Aloop_dist:ignore_parallel (optimizer)
Ignore parallelization factors in loop
distribution heuristics.
 -Amemopt:arrayloc (optimizer)
Reconstruct array subscripts during memory allocation merging and
data layout program transformation.
 -Apf:llist=<n>:noinnerllist (optimizer) 
Do speculative prefetching for linked-list data structures:
 llist=<n> perform prefetching n iterations ahead
 noinnerllist do not attempt prefetching for innermost loops.
 -Apf:pdl=1 (optimizer) 
Do prefetching for one-level indirect memory references.
 -array_pad_rows,<n> (Fortran)
Enable padding of arrays by n.
 -Ashort_ldst (optimizer) 
Convert multiple short memory operations into single long
memory operations.
 -Atile:skewp[:b<n>] (optimizer) 
Perform loop tiling which is enabled by loop skewing.  Loop skewing is a 
transformation that transforms a non-fully interchangeable loop nest
to a fully interchangeable loop nest.  The optional b<n>
sets the tiling block size to n.
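As a rough illustration of why skewing can make a loop nest interchangeable (a prerequisite for tiling), the sketch below uses a made-up recurrence in which a[i][j] depends on a[i-1][j+1]. The recurrence, the function names, and the array sizes are assumptions chosen for the example; they are not taken from the compiler or from any benchmark.

```python
def run_original(n_rows, n_cols):
    """Original nest: i outer, j inner; a[i][j] depends on a[i-1][j+1]."""
    a = [[0] * n_cols for _ in range(n_rows)]
    for i in range(1, n_rows):
        for j in range(n_cols - 1):
            a[i][j] = a[i - 1][j + 1] + 1
    return a


def run_naive_interchange(n_rows, n_cols):
    """Loops swapped directly: a[i-1][j+1] is read before it is written."""
    a = [[0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols - 1):
        for i in range(1, n_rows):
            a[i][j] = a[i - 1][j + 1] + 1
    return a


def run_skewed_interchange(n_rows, n_cols):
    """Skew with jp = j + i, then make jp the outer loop.

    After skewing, the value read by iteration (i, jp) was produced by
    iteration (i - 1, jp), so either loop order respects the dependence.
    """
    a = [[0] * n_cols for _ in range(n_rows)]
    for jp in range(1, n_rows + n_cols - 2):
        for i in range(1, n_rows):
            j = jp - i  # recover the original column index
            if 0 <= j < n_cols - 1:
                a[i][j] = a[i - 1][j + 1] + 1
    return a
```

In this sketch the skewed and interchanged nest reproduces the original results, while the direct interchange does not.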
 -Aujam:inner=g (optimizer) 
Increase the probability that small-trip-count inner loops will
be fully unrolled.
 autoup=<n> (Unix)
When the file system flush daemon fsflush runs, it
will write to disk all modified file buffers that are more than
n seconds old.
 cc (C compiler)
Invoke the Sun ONE Studio 8 Compiler C
 CC (C++ compiler)
Invoke the Sun ONE Studio 8 Compiler C++
 cpu_bringup_set=<n> (Unix /etc/system)
Specifies which processors will be enabled at boot time. 
<n> represents a bitmap of the 
processors that will be brought online.
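A minimal sketch of how such a bitmap might be computed, assuming that bit k of the value corresponds to processor id k. That bit-to-processor mapping and the helper name bringup_mask are assumptions made for illustration; check the system documentation before relying on them.

```python
def bringup_mask(cpu_ids):
    """Return an integer bitmap with bit k set for each processor id k."""
    mask = 0
    for cpu in cpu_ids:
        mask |= 1 << cpu
    return mask


# Example: a bitmap covering processors 0 through 3, shown in hexadecimal.
four_cpu_mask = hex(bringup_mask([0, 1, 2, 3]))
```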
 -crit (optimizer) 
Enable optimization of critical control paths.
 -dalign (C, C++, Fortran)
Assume data is naturally aligned.
 -Dalloca=__builtin_alloca (Portability: SPEC Tools)
Portability switch, used for 176.gcc:  allow use of compiler's internal 
builtin alloca.
 -depend (Fortran)
Synonym for -xdepend.
 -DHOST_WORDS_BIG_ENDIAN (Portability: SPEC Tools)
Portability switch, used for 176.gcc: controls how bytes are numbered within 
a word.
 disablecomponent (System Management Services)
This command is used prior to booting the system for a 1-cpu test.
The tester uses disablecomponent to add all other CPUs 
to the "blacklist",
which is a list of components that cannot be used at boot time.
 -D__MATHERR_ERRNO_DONTCARE (C)
Allows the compiler to assume that your code does not rely on setting
of the errno variable.
 -DSPEC_CPU2000_SOLARIS (Portability: SPEC Tools)
Portability switch, used for 253.perlbmk: selects header files and
code paths compatible with Solaris.
 -DFMAX_IS_DOUBLE (Portability: SPEC Tools)
Portability switch, used for 252.eon: fixes typedef issue.
 -DSUN (Portability: SPEC Tools)
Portability switch, used for 186.crafty: selects header files and code paths
compatible with Solaris.
 -DSYS_HAS_CALLOC_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.
 -DSYS_HAS_IOCTL_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.
 -DSYS_HAS_SIGNAL_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.
 -DSYS_HAS_TIME_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.
 -DSYS_IS_USG (Portability: SPEC Tools)
Portability switch, used for 254.gap: selects code compatible with 
USG-based systems.
 -e (Portability, Fortran)
Portability switch, used for 178.galgel: allows source lines to be 
up to 132 characters long.
 f90 (Fortran compiler)
Invoke the Sun ONE Studio 8 Compiler Fortran 90
 -fast (C)
A convenience option, this switch selects the following switches that
are defined elsewhere in this page:
      -D__MATHERR_ERRNO_DONTCARE 
     -dalign
     -fns 
     -fsimple=2 
     -fsingle 
     -ftrap=%none 
     -xalias_level=basic 
     -xbuiltin=%all 
     -xdepend 
     -xlibmil 
     -xO5 
     -xprefetch=auto,explicit 
     -xtarget=native
 -fast (C++)
A convenience option, this switch selects the following switches that
are defined elsewhere in this page:
      -dalign
     -fns
     -fsimple=2 
     -ftrap=%none 
     -xbuiltin=%all 
     -xlibmil 
     -xlibmopt 
     -xO5 
     -xtarget=native
 -fast (Fortran)
A convenience option, this switch selects the following switches that
are defined elsewhere in this page:
      -dalign 
     -depend
     -fns
     -fsimple=2 
     -ftrap=common 
     -xlibmil 
     -xlibmopt 
     -xO5 
     -xpad=local 
     -xprefetch=auto,explicit 
     -xtarget=native
     -xvector=yes
 -fixed (Portability, Fortran)
Portability switch, used for 178.galgel: assume fixed-format source input.
 -fns (C, C++, Fortran)
Selects faster (but nonstandard) handling of floating point 
arithmetic exceptions and gradual underflow.
 -fsimple=<n> (C, C++, Fortran)
Controls simplifying assumptions for floating point arithmetic:
 
   -fsimple=0 permits no simplifying assumptions. 
      Preserves strict IEEE 754 conformance.
   
   -fsimple=1 allows the optimizer to assume:
   
      The IEEE 754 default rounding/trapping modes do not change
         after process initialization.
      Computations producing no visible result other than potential
         floating-point exceptions may be deleted.
      Computations with Infinity or NaNs as operands need not
         propagate NaNs to their results. For example, x*0 may be replaced
         by 0.
      Computations do not depend on sign of zero.
   -fsimple=2 permits more aggressive floating point 
      optimizations that may cause
      programs to produce different numeric results due to changes in
      rounding. Even with -fsimple=2, the optimizer 
      still is not permitted to introduce a floating point exception 
      in a program that otherwise produces none.
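The x*0 example can be made concrete with ordinary IEEE 754 double arithmetic. The sketch below uses Python floats only to show what strict evaluation produces and what the simplified form would return instead; the function names are illustrative and nothing here invokes the Sun compilers.

```python
import math

inf = float("inf")
nan = float("nan")


def strict_product(x):
    """Evaluate x*0 as IEEE 754 arithmetic defines it."""
    return x * 0.0


def simplified_product(x):
    """What the result becomes if x*0 is replaced by the constant 0."""
    return 0.0


# Strict evaluation yields NaN for an infinite or NaN operand, and a
# negative zero for a negative finite operand.
inf_case_is_nan = math.isnan(strict_product(inf))
nan_case_is_nan = math.isnan(strict_product(nan))
neg_case_sign = math.copysign(1.0, strict_product(-1.0))
```

A program that tests for NaN or inspects the sign of zero could therefore behave differently once the simplification is applied, which is why -fsimple=0 forbids it.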
 -fsingle (C)
Evaluate float expressions as single precision.
 -ftrap=common (C, C++, Fortran)
Sets the IEEE 754 trapping mode to common exceptions (invalid, division
by zero, and overflow).
 -ftrap=%none (C, C++, Fortran)
Turns off all IEEE 754 trapping modes.
 LD_LIBRARY_PATH=<directories> (linker)
LD_LIBRARY_PATH controls the search order for both the compile-time
and run-time linkers.  Usually, it can be defaulted; but testers may
sometimes choose to explicitly set it (as documented in the notes in the 
submission), in order to ensure that the correct versions of libraries
are picked up.
 LD_PRELOAD=mpss.so.1 (Unix)
Allow use of the mpss.so.1 shared object, which provides a means
by which preferred stack and/or heap page sizes can be selected.
 -library=iostream (Portability, C++)
Portability switch, used for 252.eon: allow use of the classic iostream 
library.
 -ll2amm (linker)
Include a library containing chip specific memory routines.
 -lm (linker)
Include the math library.
 -lmopt (linker)
Include the optimized math library.  This option usually generates
faster code, but may produce slightly different results.  Usually
these results will differ only in the last bit.
 MPSSHEAP=<n> (Unix)
Specify the preferred page size for heap.  The specified page size is
applied to all created processes.
 MPSSSTACK=<n> (Unix)
Specify the preferred page size for stack.  The specified page size is
applied to all created processes.
 -noex (C++)
Do not allow C++ exceptions.  A throw specification on a function is 
accepted but ignored; the compiler does not generate exception code.
 -O (Fortran)
A synonym for -xO3.
 PARALLEL=<n> (Unix)
Specify the requested number of processors for running programs
that have been compiled with -xautopar.
 priocntl -e -c RT -p 15 -t 20 (Unix)
Requests that the benchmarks be run at high priority, 
specifically in the Real Time scheduling category.  -p 
n indicates the priority, by default a number in the 
range of 0 to 59; -t n indicates the time 
quantum given to a process (if not preempted by a higher 
priority process), in units of milliseconds.
 -Qdepgraph-early_cross_call=1 (code generator)
There are several scheduling passes in the compiler.  This option
allows early passes to move instructions across call instructions.
 -Qeps:do_spec_load=1 (code generator)
Allow the enhanced pipeline scheduler (EPS) to use speculative
(non-faulting) loads.
 -Qeps:enabled=1 (code generator)
Use enhanced pipeline scheduling (EPS) and selective scheduling
algorithms for instruction scheduling.
 -Qeps:rp_filtering_margin=<n> (code generator)
The number of live variables allowed at any given point is n more 
than the number of physical registers. Setting n to a significantly 
large number (e.g., 100) will disable register pressure heuristics 
in EPS.
 -Qeps:ws=<n> (code generator)
Set the EPS window size, that is, the number of instructions it will
consider across all paths when trying to find independent instructions
to schedule a parallel group.  Larger values may result in better 
run time, at the cost of increased compile time.
 -Qgsched-T<n> (code generator)
Sets the aggressiveness of the trace formation, where n 
is 4, 5, or 6.  The higher the value of n, the lower 
the branch probability needed to include a basic block in a trace.
 -Qicache-chbab=1 (code generator)
Turn on optimization to reduce branch after branch penalty: nops
will be inserted to prevent one branch from occupying the delay slot of
another branch.
 -Qipa:valueprediction (code generator)
Use profile feedback data to predict values and attempt to
generate faster code along these control paths, even at the
expense of possibly slower code along paths leading to different
values. Correct code is generated for all paths.
 -Qiselect-funcalign=<n> (code generator)
Do function entry alignment at n-byte boundaries.
 -Qiselect-sw_pf_tbl_th=<n> (code generator)
Peels the most frequent test branches/cases off a switch until
the branch probability reaches less than 1/n. This is effective
only when profile feedback is used.
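One plausible reading of this rule can be sketched as follows: order the cases by profiled probability and peel them while each remains at least 1/n. The function peel_cases, the dictionary input, and the exact stopping rule are assumptions made for illustration; the compiler's actual heuristic is not documented here and may differ.

```python
def peel_cases(case_probabilities, n):
    """Return the case labels that would be peeled under the assumed rule.

    case_probabilities maps a case label to its profiled probability.
    Cases are taken in order of decreasing probability until one falls
    below the threshold 1/n.
    """
    threshold = 1.0 / n
    ordered = sorted(case_probabilities.items(),
                     key=lambda item: item[1], reverse=True)
    peeled = []
    for label, probability in ordered:
        if probability < threshold:
            break
        peeled.append(label)
    return peeled
```

With hypothetical probabilities of 0.6, 0.25, 0.1 and 0.05 and n set to 5, the threshold is 0.2 and only the first two cases would be peeled.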
 -Qlp[=<n>][-av=<n>][-pt=weak][-t=<n>][-fa=<n>][-fl=<n>] (code generator)
Control irregular loop prefetching:
   lp=<n> Turns the module on (1) or off (0) (default is on for F90;
       off for C/C++)
   -av=<n> Sets the prefetch look ahead distance, in bytes.  Default is 256.
   -pt=weak Use weak prefetches in the general loop prefetch.
   -t=<n> Sets the number of attempts at prefetching.  If not
       specified, t=2 if -xprefetch_level=3 has been
       set; otherwise, defaults to t=1.
   -fa=<n> 1=Force user settings to override internally computed values.
   -fl=<n> 1=Force the optimization to be turned on for all languages.
 -Qms_pipe+alldoall (code generator)
Specifies that all loops can be pipelined without needing to
be concerned about loop-carried dependencies.
 -Qms_pipe+intdivusefp (code generator)
In pipelined loops, use floating point divide instructions
for signed integer division.
 -Qms_pipe+prefolim=<n> (code generator)
Set the number of outstanding prefetches in pipelined loops to n.
 -Qms_pipe+unoovf (code generator)
Assert (to the pipeliner) that unsigned int computations will not overflow.
 -Qms_pipe-pref_prolog (code generator)
Turn off prefetching in the prolog of modulo scheduled loops.
 -Qms_pipe-prefst (code generator)
Turn off prefetching for stores in the pipeliner.
 -Qms_pipe-prefstrong=0 (code generator)
Turn off the use of strong prefetches in modulo scheduled loops.
 -Qoption cg -switch[,-switch...]  (C++, Fortran)
Send the listed switch(es) to the code generator.  See the definitions
of the individual switches elsewhere in this page (alphabetically 
ordered).
 -Qoption f90comp -switch[,-switch...]  (Fortran)
Send the listed switch(es) to the Fortran 90 front end.  See the definitions
of the individual switches elsewhere in this page (alphabetically 
ordered).
 -Qoption iropt -switch[,-switch...]  (C++, Fortran)
Send the listed switch(es) to the global optimizer.  See the definitions
of the individual switches elsewhere in this page (alphabetically 
ordered).
 -Qpeep-Sh0 (code generator)
Reduce the probability that the compiler will hoist sethi instructions 
out of loops.
 RM_SOURCES = lapak.f90 (SPEC tools)
This option allows building the benchmark 178.galgel without its
copy of the lapak sources; instead, the lapak entry points in
the sunperf library are used.
 rm -rf ./feedback.profile ./SunWS_cache (Unix)
Remove any profile feedback information from previous runs.
 STACKSIZE=<n> (Unix)
Set the size of the stack (temporary storage area) for each slave
thread of a multithreaded program.
 -stackvar (Fortran)
Allocate routine local variables on the stack.
 submit=echo 'pbind -b...' > dobmk; sh dobmk (SPEC tools, Unix)
When running multiple copies of benchmarks, the SPEC config file feature 
submit is sometimes used to cause individual jobs to be
bound to specific processors:
 
   submit= causes the SPEC tools to use this line 
       when submitting jobs.
   echo ...> dobmk causes the generated commands 
       to be written to a file, namely dobmk.
   pbind -b causes this copy's processes to be bound to 
       the CPU specified by the expression that follows it.  See the 
       config file used in the submission for the exact syntax, which
       tends to be cumbersome because of the need to carefully quote
       parts of the expression.  When all expressions are evaluated,
       each CPU ends up with exactly one copy of each benchmark.
       The pbind expression may include:
       
       $SPECUSERNUM: the SPEC tools-assigned number for
           this copy of the benchmark.
       expr: calculate simple arithmetic expressions.
           For example, the effect of binding jobs to a
           (quote-resolved) expression such as:
               expr ( ( $SPECUSERNUM / 4 ) * 8 + ( $SPECUSERNUM % 4 ) )
           would be to send the jobs to processors whose numbers are:
               0,1,2,3, 8,9,10,11, 16,17,18,19 ...
       psrinfo: find out what processors are available.
       grep on-line: search the psrinfo
           output for information regarding on-line CPUs.
       awk...print \$1: pick out the
           line corresponding to this copy of the benchmark
           and use the CPU number mentioned at the start of this line.
   sh dobmk actually runs the benchmark.
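The arithmetic in the expr example can be checked with a short sketch. The function name target_cpu is hypothetical, and integer division stands in for the truncating division that expr performs on non-negative values.

```python
def target_cpu(spec_user_num):
    """Map a copy number to a processor number: (n / 4) * 8 + (n % 4)."""
    return (spec_user_num // 4) * 8 + (spec_user_num % 4)


# Processor numbers for the first twelve copies.
first_twelve = [target_cpu(n) for n in range(12)]
print(first_twelve)
# [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19]
```

Each group of four consecutive copies lands on four consecutive processors, and the next group starts eight processor numbers later.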
 tune_t_fsflushr=<n> (Unix)
Controls the number of seconds between runs of the file system
flush daemon, fsflush.
 ulimit -s unlimited (Unix)
Allow stack size to grow without limit.
 -W2,-switch[,-switch...] (C)
Send the listed switch(es) to the global optimizer.  See the definitions
of the individual switches elsewhere in this page (alphabetically 
ordered).
 -Wc,-switch[,-switch...] (C)
Send the listed switch(es) to the code generator.  See the definitions
of the individual switches elsewhere in this page (alphabetically 
ordered).
 -xalias_level=[basic|std|strong] (C)
Allows the compiler to perform type-based alias analysis at the
specified alias level:
 
   basic assume that memory references
        that involve different C basic types do not alias each
        other.
   std assume aliasing rules described in
       the ISO 1999 C standard.
   strong in addition to the restrictions
        at the std level, assume that pointers of
        type char * are used only to access an object of
        type char; and assume that there are no interior pointers.
 -xalias_level=compatible (C++)
Allows the compiler to assume that layout-incompatible types
are not aliased.
 -xarch=v8plusb (C, C++, Fortran)
Allow the compiler to use instructions from architecture level v8plusb
(UltraSPARC III, 32-bit mode).
 -xautopar (C, Fortran)
Turn on automatic parallelization for multiple processors.
 -xbuiltin=%all (C, C++)
Substitute intrinsic functions or inline system functions where 
profitable for performance.
 -xchip=ultra3 (C, C++, Fortran)
Specify that the target processor will be an UltraSPARC-III.
 -xdepend (C, Fortran)
Analyze loops for inter-iteration data dependencies, and do loop
restructuring.
 -xinline= (C, C++, Fortran)
Turn off inlining.
 -xipo[=2] (C, C++, Fortran)
Perform optimizations across all object files in the link step:
 0=off
 1=on
 2=performs whole-program detection and analysis.
At -xipo=2, the compiler performs inter-procedural 
aliasing analysis as well as optimization of memory 
allocation and layout to improve cache performance.
 -xlibmil (C, C++, Fortran)
Use inline expansion for math library, libm.
 -xlibmopt (C++, Fortran)
Select the optimized math library.
 -xlic_lib=sunperf (C, C++, Fortran)
Link with Sun supplied licensed sunperf library.
 -xlinkopt (C, C++, Fortran)
Perform link-time optimizations, such as branch optimization
and cache coloring.
 -xmemalign=4s (C, C++, Fortran)
Set maximum assumed data alignment to be at a 4 byte boundary and
raise signal SIGBUS in the case of misaligned data accesses.
 -xO<n> (C, C++, Fortran)
Specify optimization level n:
 
   -xO1 Does only basic local optimizations (peephole).
   -xO2 Does basic local and
       global optimizations, such as induction variable
       elimination, common subexpression elimination, constant
       propagation, register allocation, and basic block merging.

   -xO3 Adds global
       optimizations at the function level, loop unrolling,
       and software pipelining.

   -xO4 Adds automatic
       inlining of functions in the same file.

   -xO5 Uses optimization
       algorithms that may take significantly more compilation
       time or that do not have as high a probability of improving
       execution time, such as speculative code motion.
 -xpad=common[:<n>] (Fortran)
If multiple same-sized arrays are placed in common, 
insert padding between them for better use of cache.
n specifies the amount of padding to apply,
in units that are the same size as the array elements.
If no parameter is
specified then the compiler selects one automatically.
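A simplified model of the effect, assuming the arrays are laid out back to back and the padding is inserted after each array, is sketched below. The function common_block_offsets and the layout model are hypothetical; the compiler's real placement rules are not described in this file.

```python
def common_block_offsets(array_lengths, pad_elements, element_size):
    """Return the starting byte offset of each array in the assumed layout.

    array_lengths: number of elements in each array, in declaration order.
    pad_elements:  padding inserted after each array, in array elements.
    element_size:  size of one array element, in bytes.
    """
    offsets = []
    position = 0  # running position, measured in elements
    for length in array_lengths:
        offsets.append(position * element_size)
        position += length + pad_elements
    return offsets


unpadded = common_block_offsets([1024, 1024, 1024], 0, 8)
padded = common_block_offsets([1024, 1024, 1024], 4, 8)
```

In the unpadded case the three 8-byte arrays start exactly 8192 bytes apart, a spacing that may cause corresponding elements to compete for the same cache locations; with padding the starting offsets are staggered.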
 -xpad=local (Fortran)
Pad local variables, for better use of cache.
 -xpagesize=<n> (C, Fortran)
Set the preferred page size for running the program.
 -xpagesize_stack=<n> (C, Fortran)
Set the preferred stack page size for running the program.
 -xprefetch=auto,explicit (C, C++, Fortran)
Allow generation of prefetch instructions.  
-xprefetch=yes and
-xprefetch are 
synonyms for -xprefetch=auto,explicit.
 -xprefetch=latx:<n> (C, C++, Fortran)
Adjust the compiler's assumptions about prefetch latency by
the specified factor.  Typically values in the range of 
0.5 to 2.0 will be useful.  A lower number might indicate
that data will usually be cache resident; a higher number
might indicate a relatively larger gap between the processor
speed and the memory speed (compared to the assumptions built
into the compiler).
 -xprefetch=no%auto (C, C++, Fortran)
Turn off prefetch instruction generation.
 -xprefetch_level=<n> (C, C++, Fortran)
Control the level of searching that the compiler does for prefetch
opportunities by setting n to 1, 2, or 3, where higher
numbers mean to do more searching.  The default for Fortran is 2. The 
default for C and C++ is 1.
 -xprofile=collect:./feedback (C, C++, Fortran)
Collect profile data for feedback-directed optimization, and store it in
a subdirectory of the current directory, named ./feedback.
 -xprofile=use:./feedback (C, C++, Fortran)
Use data collected for profile feedback.  Look for it in 
a subdirectory of the current directory, named ./feedback.
 -xreduction (C, Fortran)
Analyze loops for reductions such as dot products, maximum and
minimum finding.
 -xrestrict (C)
Treat pointer-valued function parameters as restricted pointers.
 -xsafe=mem (C, C++, Fortran)
Enables the use of non-faulting loads when used in conjunction
with -xarch=v8plus. Assumes that no memory 
based traps will occur.
 -xtarget=native (C, C++, Fortran)
Selects options appropriate for the system where the compile is
taking place, including architecture, chip, and cache sizes.  (These
can also be controlled separately, via -xarch, -xchip, and -xcache, 
respectively.)
 -xunroll=<n> (C, C++, Fortran)
Enable unrolling loops n times where possible.
 -xvector (C, C++, Fortran)
Allow the compiler to transform math library calls within loops 
into calls to the vector math library.   Specifying
-xvector is equivalent to -xvector=yes.
 |