FLAG DESCRIPTIONS
                     SUN C AND FORTRAN WS6

-dalign         Assume double-type data is double aligned
-[no]depend     [Disable] Enable all dependence based transformations
-dn             Specify static binding
-fns            Select non-standard floating point mode
-fsimple[=n]    Allows the optimizer to make simplifying assumptions
          concerning floating-point arithmetic. If n is present,
          it must be 0, 1, or 2.

          The defaults are:
          o  With no -fsimple[=n], the compiler uses -fsimple=0.
          o  With only -fsimple, no =n, the compiler uses
             -fsimple=1.

          -fsimple=0
          Permits no simplifying assumptions. Preserves strict
          IEEE 754 conformance.

          -fsimple=1
          Allows conservative simplifications. The resulting code
          does not strictly conform to IEEE 754, but numeric
          results of most programs are unchanged.

          With -fsimple=1, the optimizer can assume the following:
          o  The IEEE 754 default rounding/trapping modes do not
             change after process initialization.
          o  Computations producing no visible result other than
             potential floating-point exceptions may be deleted.
          o  Computations with Infinity or NaNs as operands need
             not propagate NaNs to their results. For example, x*0
             may be replaced by 0.
          o  Computations do not depend on the sign of zero.

          With -fsimple=1, the optimizer is not allowed to optimize
          completely without regard to roundoff or exceptions. In
          particular, a floating-point computation cannot be
          replaced by one that produces different results with
          rounding modes held constant at run time. -fast implies
          -fsimple=1.
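
The x*0 and sign-of-zero assumptions above can be made concrete with a
small C sketch. The function names are hypothetical and chosen only for
this illustration; the code shows what strict IEEE 754 arithmetic
returns, which is what a compiler gives up when it folds x*0 to 0.

```c
#include <math.h>

/* Under strict IEEE 754 rules (as with -fsimple=0), x * 0.0 is not always
   +0.0: an infinite or NaN operand yields NaN, and a negative finite
   operand yields -0.0, so a strict compiler cannot fold this to 0. */
double times_zero(double x)
{
    return x * 0.0;
}

/* Returns 1 when the product is +0.0 with a clear sign bit, that is,
   when replacing x * 0.0 by 0 would not be observable for this x. */
int folding_is_invisible(double x)
{
    double p = times_zero(x);
    return !isnan(p) && !signbit(p);
}
```

For ordinary positive finite inputs the folding is invisible; for
negative, infinite, or NaN inputs it is not, which is why the
simplification is only allowed from -fsimple=1 upward.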

          -fsimple=2
          Permits aggressive floating-point optimizations that
          may cause many programs to produce different numeric
          results due to changes in rounding. For example, permits
          the optimizer to replace all computations of x/y in a
          given loop with x*z, where x/y is guaranteed to be
          evaluated at least once in the loop, z=1/y, and the
          values of y and z are known to have constant values
          during execution of the loop.
          Even with -fsimple=2, the optimizer still is not
          permitted to introduce a floating-point exception in a
          program that otherwise produces none.
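
The x/y to x*z replacement described for -fsimple=2 can be sketched by
writing both forms by hand. This is only an illustration of the
transformation, not the code the compiler emits; the function names
scale_div and scale_recip are assumptions made for the example.

```c
#include <stddef.h>

/* Form the compiler must keep under -fsimple=0 or -fsimple=1:
   one correctly rounded division per element. */
void scale_div(double *out, const double *x, size_t n, double y)
{
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] / y;
}

/* Form that -fsimple=2 permits: the loop-invariant reciprocal is
   computed once and each division becomes a multiplication. There are
   now two roundings per element instead of one, so the last bit of a
   result can differ from scale_div. */
void scale_recip(double *out, const double *x, size_t n, double y)
{
    double z = 1.0 / y;
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] * z;
}
```

When y is a power of two the reciprocal is exact and both forms agree;
for most other divisors the results may differ in the last place.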

-libmil         Use inline expansion templates for libm
-s              Strip symbol table from the executable file
-xlibmopt       Chooses the math library that is optimized for speed
                at the expense of some accuracy.


-xO4            Generate optimized code. See the -xO4 entry below.
-xO5            Generate optimized code. See the -xO5 entry below.
-pad            Pad local variables or common blocks, or both, for
                efficient use of the cache
-[x]pad=local[:<n>]
                Pad local variables only, for better use of cache. n specifies
                the amount of padding to apply. If no parameter is specified
                then the compiler selects one automatically.
-[x]pad=common[:<n>]
                Pad common block variables, for better use of cache. n
                specifies the amount of padding to apply. If no parameter is
                specified then the compiler selects one automatically.
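
Why padding can help the cache is easiest to see with a toy model. The
sketch below assumes a hypothetical direct-mapped cache of 16 KB with
32-byte lines and 8-byte doubles; these parameters and the function
names are assumptions for the example, not properties of any
particular chip or of the compiler's padding algorithm.

```c
#include <stddef.h>

/* Hypothetical direct-mapped cache: 16 KB with 32-byte lines. */
#define LINE_BYTES  32u
#define CACHE_BYTES (16u * 1024u)

/* Cache line slot that a byte offset maps to. */
unsigned cache_slot(size_t offset)
{
    return (unsigned)((offset % CACHE_BYTES) / LINE_BYTES);
}

/* Offset of element i of the second of two back-to-back double arrays
   of n elements each, with pad_bytes of padding between them. */
size_t second_array_offset(size_t n, size_t pad_bytes, size_t i)
{
    return n * sizeof(double) + pad_bytes + i * sizeof(double);
}
```

With n = 2048 and no padding, element i of both arrays lands in the
same slot and the two arrays evict each other on every access; one
line of padding moves the second array to the neighboring slot.
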
-unroll=<n>     Suggestion to optimizer to unroll loops n times
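
As a rough picture of what unrolling by 4 means, the following sketch
unrolls a summation loop by hand. The function names are hypothetical,
and the optimizer's actual output for -unroll=4 may look different.

```c
#include <stddef.h>

/* Straightforward loop: one add and one loop test per element. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same loop unrolled four times by hand. A cleanup loop handles
   the leftover n % 4 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)
        s += a[i];
    return s;
}
```
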
-xarch=<a>      Limit the set of instructions the compiler may use to
                (generic,v7,v8a,v8,v8plus,v8plusa,v9,v9a,v9b)
-xcache=<c>     Define the cache properties for use by the optimizer
-xchip=<c>      Define the instruction scheduling properties for use by
                the optimizer
-xcrossfile     enable cross-file inlining.
-xprofile=use   Use data collected for profile feedback
-xprofile=collect
                Collect profile data for feedback directed optimizations.
-xparallel      Use parallel processing to improve performance
-xreduction     Parallelize loops containing reductions
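
A reduction is a loop that folds many values into one, such as a sum.
The sketch below shows, in plain serial C, the kind of split into
independent partial results that parallelizing a reduction relies on.
It is an assumption-laden illustration with hypothetical function
names, not the code generated for -xreduction.

```c
#include <stddef.h>

/* Serial reduction: every iteration depends on the previous value of s. */
double sum_serial(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Reduction split into two independent partial sums, as two workers
   might compute them, then combined. The additions are reassociated,
   so floating-point results can differ slightly from the serial loop. */
double sum_two_partials(const double *a, size_t n)
{
    size_t half = n / 2;
    double lo = 0.0, hi = 0.0;
    for (size_t i = 0; i < half; i++)
        lo += a[i];
    for (size_t i = half; i < n; i++)
        hi += a[i];
    return lo + hi;
}
```
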
-xsafe=mem      Enables the use of non-faulting loads when used in conjunction
                with -xarch=v8plus. Assumes that no memory-based traps
                will occur.
-fast           Fast execution. Select the combination of compilation
                options that optimizes for speed of execution without
                excessive compilation time. This is a convenience option,
                and it chooses:

                o The -native best machine characteristics option
                  (-xarch=native, -xchip=native, -xcache=native)

                o Optimization level: -O5

                o A set of inline expansion templates (-libmil)

                o The -fsimple=2 option

                o The -dalign option (SPARC only)

                o The -xlibmopt option (SPARC only)
                
                o The -xdepend option (FORTRAN only)

                o Options to turn off all trapping (-fns -ftrap=%none)

-xO5            Besides what -xO4 does, enables speculative code motion.

-xO4            Besides what -xO3 does, this option does
                automatic inlining of functions in the same
                file. The code usually runs faster, but for
                some code, -xO4 makes it run more slowly. -g
                suppresses automatic inlining. In general,
                -xO4 results in larger code.

-xO3            Performs like -xO2, but also optimizes references
                or definitions for external variables.
                Loop unrolling and software pipelining are
                also performed. The -xO3 level does not trace
                the effects of pointer assignments. When compiling
                either device drivers or programs
                that modify external variables from within
                signal handlers, you may need to use the
                volatile type qualifier to protect the object
                from optimization. In general, the -xO3
                level results in increased code size.
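
The signal-handler case mentioned above is the classic use of the
volatile qualifier. A minimal sketch, using only standard C signal
facilities and hypothetical names (got_signal, on_signal,
wait_for_signal_demo):

```c
#include <signal.h>

/* Written by the handler, read by normal code. Without volatile, an
   optimizer working at a level like -xO3 could keep the value in a
   register and never notice the handler's store. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Installs the handler, raises SIGINT in this process, and spins until
   the flag changes. Returns the flag value (1 once the handler ran),
   or -1 if the handler could not be installed. */
int wait_for_signal_demo(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);
    while (!got_signal)
        ;  /* each test re-reads got_signal because it is volatile */
    return (int)got_signal;
}
```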


-xO2            Does basic local and global optimization.
                This is induction variable elimination, local
                and global common subexpression elimination,
                algebraic simplification, copy propagation,
                constant propagation, loop-invariant optimization,
                register allocation, basic block
                merging, tail recursion elimination, dead
                code elimination, tail call elimination and
                complex expression expansion.

                The -xO2 level does not assign global, external,
                or indirect references or definitions to
                registers. It treats these references and
                definitions as if they were declared "volatile."
                In general, the -xO2 level results in
                minimum code size.
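
Of the transformations listed for -xO2, tail recursion elimination is
easy to show by hand. The sketch below pairs a tail-recursive function
with the loop an optimizer could turn it into; the function names are
hypothetical and the pairing is an illustration, not compiler output.

```c
/* Tail-recursive Euclid: the recursive call is the last action taken. */
unsigned gcd_recursive(unsigned a, unsigned b)
{
    if (b == 0)
        return a;
    return gcd_recursive(b, a % b);
}

/* What tail recursion elimination effectively produces: the call
   becomes a jump back to the top with updated arguments, so no stack
   frames accumulate. */
unsigned gcd_loop(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}
```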

-xO1            Does basic local optimization (peephole).

-xvector        enable vectorization of loops with calls to math routines

-xprefetch      enable generation of prefetch instructions

-xstackvar      allocate routine local variables on stack (Fortran)

-xrestrict[=f1,...,f2,%all, %none]
                Treat pointer-valued function parameters as restricted
                pointers. This command-line option can be used on its
                own, but is best used with optimization.
                The default is %none. Specifying -xrestrict is
                equivalent to specifying -xrestrict=%all.
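
What a restricted pointer promises can be shown with the C99 restrict
keyword, which expresses the same no-overlap guarantee in source form.
This sketch assumes a C99 compiler; the function name vec_add is
hypothetical, and the keyword spelling may not be accepted by the
compiler release described here.

```c
#include <stddef.h>

/* The restrict qualifiers promise that dst, a and b never overlap,
   which is the promise -xrestrict=%all makes for pointer parameters
   without any source change. With it, the optimizer may load a[i] and
   b[i] ahead of earlier stores to dst. */
void vec_add(double *restrict dst, const double *restrict a,
             const double *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Calling such a function with overlapping arrays breaks the promise and
the result is undefined, so the option is only safe when the program
really never passes aliased pointers.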

-xregs=syst
                Allows use of the system reserved registers %g6 and %g7,
                and %g5 if not already allowed by the -xarch= value.

-Xc             Compile assuming strict ANSI C conformance
-Xa             Compile assuming ANSI C conformance, allow K & R extensions
                (default mode)
-Xt             Compile assuming K & R conformance, allow ANSI C


-Qoption <phase> <flags>
                Pass flags along to compiler phase:
                cg              Code generator
                f77pass1        Fortran first pass
                iropt           Internal representation optimizer
-W<phase>,<flags>
                Pass flags along to compiler phase:
                2               Second pass
                c               code generator

-Qoption cg -Qms_pipe+nfll=<n>
                specifies n as the latency of non-floating point load
                instructions.
-Qoption cg -Qms_pipe-off
                Turn off the software pipeliner.
-Qoption iropt -O4+ansi_alias
                Assume (more restrictive) ANSI C semantics for pointer aliasing
-Qoption iropt -O4+scalarrep
                disable scalar replacement optimization
-Qoption iropt -O4+algassoc
                enable floating point reassociation
-Qoption iropt -O4+unroll
                enable aggressive loop unrolling.
-Qoption iropt -Si<n>
                Sets n as the limit of general integer virtual registers
                for register allocation optimization. Default is 30.
-Qoption iropt -O4+bcopy
                enable vectorization of copy and memset loops
-Qoption iropt -O4+data_access
                enable optimizations based on data access patterns
-Qoption iropt -reroll=1
                enable automatic loop rerolling of completely unrolled loop
                nests
-Qoption iropt -O4+invccexp
                See -W2,-O4+invccexp below
-Qoption iropt -O4+pde
                See -W2,-O4+pde below
-Qoption iropt -whole
                See -W2,-whole below
-W2,-whole      do whole program optimizations
-W2,-fsimple=2  perform aggressive floating point simplification and
                optimizations.
-W2,-Mp<n>      Procedures with entry counts equal or greater than n
                become candidates for inlining.
-W2,-Mt<n>      The maximum size of a routine body eligible for inlining
                is limited to n triples.
-W2,-Mr<n>      maximum code increase due to inlining is limited to n triples
-W2,-Ma<n>      enable inlining of routines with frame size up to n
-W2,-Mm<n>      maximum module increase limit for inlining
-W2,-O4+pde     enable aggressive dead code elimination
-W2,-O4+cond_elim
                enable aggressive optimizations of conditional branches
-W2,-O4+bopt    enable aggressive optimizations of all branches
-W2,-O4+bmerge
                enable branch merge optimizations
-W2,-O4+invccexp
                enable hoisting of invariant branches
-W2,-ANSI_S     use ANSI semantics for routines with hidden control flow
                (e.g. setjmp)
-W2,-ldstr      enable hoisting of load and store instructions
-W2,-crit       enable optimization of critical control paths
-W2,-O4+ipa     perform interprocedural optimizations
-W2,-O4+heap    keep track of malloc-like memory allocation calls

-Wc,-Qicache-L1-bsize=4-bbits=7
                Do L1 instruction cache alignment. The -L1 selects
                loop boundaries. The -bsize=4 selects an alignment
                boundary of 16 bytes. The -bbits=7 selects the bad
                alignments: not the last three of the 4 instructions
                per 16 bytes. This is really one option and cannot
                be split into parts.

                The default is to not do any I-cache alignment.

-Wc,-Qiselect-funcalign=<n>
                do function entry alignment at n-byte boundaries.
-Wc,-Qms_pipe+unoovf
                do software pipelining for loops with unsigned counters
-Wc,-Qiselect-sw_pf_tbl_th=<n>
                Peels the most frequent test branches/cases off a switch until
                the branch probability reaches less than 1/n. This is
                effective only when profile feedback is used.
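
Peeling a frequent case off a switch can be pictured as follows. The
sketch assumes a profile showed one case (op == 2) to dominate; the
function names, case values, and costs are all hypothetical, and the
code generator's real output is not shown here.

```c
/* Original dispatch: every opcode goes through the switch. */
int cost_switch(int op)
{
    switch (op) {
    case 0:  return 1;
    case 1:  return 2;
    case 2:  return 4;
    default: return 8;
    }
}

/* Peeled form, assuming a profile showed op == 2 to be by far the most
   frequent case: it is tested first and skips the switch entirely. */
int cost_peeled(int op)
{
    if (op == 2)
        return 4;
    switch (op) {
    case 0:  return 1;
    case 1:  return 2;
    default: return 8;
    }
}
```

Both functions return the same value for every input; only the order
in which the cases are tested changes.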
-Wc,-Qdepgraph-early_cross_call=1
                Enable early cross-call instruction scheduling.

-lfast          Link in the fast system libraries.


                              Kernel Parameters
                              -----------------

consistent_coloring
                Consistent Coloring controls the page coloring policy.
                It can be set to one of the following:
                      0: (default) dynamic (uses various vaddr bits)
                      1: static (virtual=paddr)
                      2: bin hopping