FLAG DESCRIPTIONS

SUN C, C++ AND FORTRAN Forte[tm] Developer 6 update 2,

and Forte[tm] Developer 7 Early Access

12/4/01

Flag

Description

-D

Set definition for preprocessor.

-dalign

Assume double-type data is double aligned.

-dn

Specify static binding.

-e

Accept extended (132 character) input source lines (FORTRAN).

-fast

This is a convenience option for selecting a set of optimizations for performance, and it chooses:

o The -native best machine characteristics option (-xarch=native, -xchip=native, -xcache=native)

o Optimization level: -xO5

o A set of inline expansion templates (-libmil)

o The -fsimple=2 option

o The -dalign option

o The -xalias_level=basic option (C only)

o The -xlibmopt option

o The -xdepend option (FORTRAN only)

o The -xprefetch option (FORTRAN only)

o Options to turn off all trapping (-fns -ftrap=%none)

-fixed

Accept fixed-format input source files (FORTRAN).

-fns

Select non-standard floating point mode.

This flag causes the nonstandard floating point mode to be enabled when a program begins execution. By default, the nonstandard floating point mode will not be enabled automatically.

On some SPARC systems, the nonstandard floating point mode disables "gradual underflow", causing tiny results to be flushed to zero rather than producing subnormal numbers. It also causes subnormal operands to be silently replaced by zero. On those SPARC systems that do not support gradual underflow and subnormal numbers in hardware, use of this option can significantly improve the performance of some programs.

Warning: When nonstandard mode is enabled, floating point arithmetic may produce results that do not conform to the requirements of the IEEE 754 standard. See the Numerical Computation Guide for more information.
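
The effect of gradual underflow can be seen in a small C sketch. It is only an illustration of the IEEE 754 behavior described above, not of the option itself; the function names are hypothetical. In standard mode the halved value is a nonzero subnormal; with flush-to-zero behavior, as described for -fns, it would be 0.

```c
#include <float.h>
#include <math.h>

/* Halve the smallest normal double. With IEEE 754 gradual underflow the
   result is a nonzero subnormal number; in a flush-to-zero mode it
   would be replaced by zero. */
double half_min_normal(void)
{
    volatile double tiny = DBL_MIN;  /* volatile keeps the division at run time */
    return tiny / 2.0;
}

/* Returns 1 if x is a subnormal number, 0 otherwise. */
int is_subnormal(double x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}
```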

-fsimple=0

Permits no simplifying assumptions. Preserves strict IEEE 754 conformance.

-fsimple=1

With -fsimple=1, the optimizer can assume the following:

o The IEEE 754 default rounding/trapping modes do not change after process initialization.

o Computations producing no visible result other than potential floating-point exceptions may be deleted.

o Computations with Infinity or NaNs as operands need not propagate NaNs to their results. For example, x*0 may be replaced by 0.

o Computations do not depend on sign of zero.
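
The following C sketch shows why replacing x*0 by 0 is an assumption rather than an identity. The function name is hypothetical, and the code shows strict IEEE 754 behavior (as with -fsimple=0), not what the optimizer emits.

```c
#include <math.h>

/* Multiply by zero at run time. The volatile operand keeps the compiler
   from folding the product at compile time. */
double times_zero(double x)
{
    volatile double zero = 0.0;
    return x * zero;
}

/* Returns 1 if x is negative zero, 0 otherwise. */
int is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}
```

Under strict IEEE 754 rules, times_zero(INFINITY) and times_zero(NAN) are NaN, and times_zero(-1.0) is negative zero. A program that relies on any of these results should not be compiled with -fsimple=1 or higher.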

-fsimple=2

Permits aggressive floating point optimizations that may cause programs to produce different numeric results due to changes in rounding. Even with -fsimple=2, the optimizer still is not permitted to introduce a floating point exception in a program that otherwise produces none.
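
As an illustration of how a change in rounding can alter a result, the C sketch below evaluates the same three-term sum in two orders. Reassociation is used here only as an assumed example of an aggressive optimization; this description does not list the transformations -fsimple=2 actually performs. The function names are hypothetical.

```c
/* (a + b) + c, with the intermediate sum rounded to double. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* a + (b + c), with the intermediate sum rounded to double. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With a = 1e20, b = -1e20 and c = 1.0, sum_left yields 1.0 while sum_right yields 0.0, because 1.0 is lost when it is added to -1e20 first.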

-fsimple[=n]

Allows the compiler to make simplifying assumptions concerning floating-point arithmetic.

-ftrap=t

Sets the IEEE 754 trapping mode in effect at startup.

t is a comma-separated list that consists of one or more of the following: %all, %none, common, [no%]invalid, [no%]overflow, [no%]underflow, [no%]division, [no%]inexact.

The default is -ftrap=%none.

This option sets the IEEE 754 trapping modes that are established at program initialization. Processing is left-to-right. The common exceptions, by definition, are invalid, division by zero, and overflow.

o %none, the default, turns off all trapping modes.

Do not use this option for programs that depend on IEEE standard exception handling; you can get different numerical results, premature program termination, or unexpected SIGFPE signals.
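
The left-to-right processing of the -ftrap list can be sketched in C as follows. This is a model of the rules stated above, not the compiler's own parser; the TRAP_* names and parse_ftrap are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum {
    TRAP_INVALID   = 1 << 0,
    TRAP_OVERFLOW  = 1 << 1,
    TRAP_UNDERFLOW = 1 << 2,
    TRAP_DIVISION  = 1 << 3,
    TRAP_INEXACT   = 1 << 4,
    /* The common exceptions are invalid, division by zero, and overflow. */
    TRAP_COMMON    = TRAP_INVALID | TRAP_DIVISION | TRAP_OVERFLOW,
    TRAP_ALL       = TRAP_COMMON | TRAP_UNDERFLOW | TRAP_INEXACT
};

/* Map one exception name of the given length to its bit, or 0 if unknown. */
static int trap_bit(const char *name, size_t len)
{
    static const struct { const char *name; int bit; } table[] = {
        { "invalid",   TRAP_INVALID },
        { "overflow",  TRAP_OVERFLOW },
        { "underflow", TRAP_UNDERFLOW },
        { "division",  TRAP_DIVISION },
        { "inexact",   TRAP_INEXACT }
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strlen(table[i].name) == len && strncmp(name, table[i].name, len) == 0)
            return table[i].bit;
    return 0;
}

/* Apply a comma-separated -ftrap list from left to right.
   Returns the resulting mask, or -1 if an item is not recognized. */
int parse_ftrap(const char *list)
{
    int mask = 0;               /* start from the default, %none */
    const char *p = list;

    for (;;) {
        size_t len = strcspn(p, ",");
        int bit;

        if (len == 4 && strncmp(p, "%all", 4) == 0) {
            mask = TRAP_ALL;
        } else if (len == 5 && strncmp(p, "%none", 5) == 0) {
            mask = 0;
        } else if (len == 6 && strncmp(p, "common", 6) == 0) {
            mask |= TRAP_COMMON;
        } else if (len > 3 && strncmp(p, "no%", 3) == 0) {
            if ((bit = trap_bit(p + 3, len - 3)) == 0)
                return -1;
            mask &= ~bit;       /* no%x turns trapping on x off */
        } else {
            if ((bit = trap_bit(p, len)) == 0)
                return -1;
            mask |= bit;
        }
        if (p[len] == '\0')
            return mask;
        p += len + 1;           /* skip the comma */
    }
}
```

For example, parse_ftrap("%all,no%inexact") enables every trap except inexact, while parse_ftrap("%all,%none") ends with all traps off because the later item overrides the earlier one.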

-libmil

Use inline expansion templates for libm.

-library=iostream

Use the "classic" (pre-1998 C++ standard) iostream library.

Before the 1998 C++ standard there was one iostream library, now often called "classic" iostreams. The C++ standard defines a different, but similar, iostream library, which we call "standard" iostreams. To get classic iostreams in standard (default) mode, use the option "-library=iostream".

-ll2amm

Library containing chip specific memory routines.

-lm

Link with math library

-lmopt

This chooses the math library that is optimized for speed

-lprism32

Library to enable ISM (4MB page) usage.

-lsunperf

Link with the Sun Performance Library (netlib and SIAM routines)

-native

Select native machine characteristics for optimization.

-Qicache-chbab=1

See -Wc,-Qicache-chbab=1

-Qoption <phase> <flags>

Pass flags along to compiler phase:

f90comp Fortran first pass

iropt Global optimizer

cg Code generator

-Qoption cg <flags>

See -Wc,<flags> below.

-Qoption cg -Qlp=1-av=<nav> -t=<nt>-fa=1-fl=1

See -Wc,-Qlp=1-av=<nav> -t=<nt>-fa=1-fl=1

-Qoption f90comp -array_pad_rows,<n>

Enable padding of f90 arrays by n.

-Qoption f90comp -expansion

Enable f90 array expansion.

-Qoption f90comp -O3

This reduces the optimization level of the f90 front/middle end to O3. The effect is to turn off loop cloning and unrolling (note that it has no effect on cg's loop unrolling).

-Qoption iropt <flags>

See -W2,<flags> below.

-Qoption iropt -Adata_access

Enable optimizations based on data access patterns.

-Qoption iropt -Addint:sf=9

Set memory store operation weight for loop interchange to 9

-Qoption iropt -Amemopt

See -W2,-Amemopt

-Qoption iropt -Ma<n>

See -W2,-Ma<n>

-Qoption iropt -Mm<n>

See -W2,-Mm<n>

-Qoption iropt -MR

Do not inline calls when parameters are arrays and actual array dimensions and formal array dimensions are mismatched

-Qoption iropt -Mr<n>

See -W2,-Mr<n>

-Qoption iropt -O4+scalarrep

disable scalar replacement optimization

-Qoption iropt -Rscalarrep,-MR

Same as -Qoption iropt -Rscalarrep plus -Qoption iropt -MR

-Qoption iropt -whole

See -W2,-whole

-stackvar

Allocate routine local variables on stack (FORTRAN).

-W<phase>,<flags>

Pass flags along to compiler phase (2=optimizer, c=code generator)

-W2,-Abopt

Enable aggressive optimizations of all branches.

-W2,-Adata_access

Enable optimizations based on data access patterns.

-W2,-Aheap

Allows the compiler to recognize malloc-like memory allocation functions.

-W2,-Ainline

Perform IPA-based inlining.

-W2,-Aivel:duplicate_loops

More aggressive strength reduction by replicating loops.

-W2,-Amemopt

Memory access optimization. This does whole-program mode inter-procedural memory access analysis, merges memory allocations, and performs cache conscious data layout program transformations.

-W2,-Amemopt:arrayloc

Reconstruct array subscripts during memory allocation merging and data layout program transformation

-W2,-Ashort_ldst

Convert multiple short memory operations into single long memory operations.

-W2,-Aunroll

Enables outer-loop unrolling.

-W2,-crit

Enable optimization of critical control paths

-W2,-Ma<n>

Enable inlining of routines with frame size up to n.

-W2,-Mm<n>

Maximum module increase limit for inlining.

-W2,-Mp<n>

Procedures with entry counts equal to or greater than n become candidates for inlining.

-W2,-Mr<n>

Maximum code increase due to inlining is limited to n triples.

-W2,-Ms<n>

Maximum level of recursive inlining.

-W2,-Mt<n>

The maximum size of a routine body eligible for inlining is limited to n triples.

-W2,-O4+restrict_g

Assume that different global pointer variables point to their own memory locations.

-W2,-reroll=1

Turns on loop rerolling.

-W2,-whole

Do whole program optimizations.

-Wc,-Qdepgraph-early_cross_call=1

Enable early cross-call instruction scheduling.

-Wc,-Qeps:do_spec_load=1

Allow generating speculative load during EPS.

-Wc,-Qeps:enabled=1

Use enhanced pipeline scheduling (EPS) and selective scheduling algorithms for instruction scheduling.

-Wc,-Qeps:rp_filtering_margin=100

Turn off register pressure heuristic in EPS.

-Wc,-Qgsched-T4

Sets the aggressiveness of the trace formation.

-Wc,-Qgsched-trace_late=1

Turns on the late trace scheduler.

-Wc,-Qgsched-trace_spec_load=1

Turns on the conversion of loads to non-faulting loads inside the trace.

-Wc,-Qicache-chbab=1

Turn on optimization to reduce branch after branch penalty

-Wc,-Qinline_memcpy=<n>

Inline calls to memcpy with n bytes or fewer being copied

-Wc,-Qipa:valueprediction

Use profile feedback data to predict values and attempt to generate faster code along these control paths, even at the expense of possibly slower code along paths leading to different values. Correct code is generated for both paths.

-Wc,-Qiselect-funcalign=<n>

Do function entry alignment at n-byte boundaries.

-Wc,-Qiselect-sw_pf_tbl_th=<n>

Peels the most frequent test branches/cases off a switch until the branch probability reaches less than 1/n. This is effective only when profile feedback is used
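
The idea of peeling a frequent case off a switch can be sketched by hand in C. This is only a source-level picture of the transformation, under the assumption that profile feedback shows case 1 to be the most frequent; the function names and case values are hypothetical, and the code generator works on its own internal representation.

```c
/* Original form: every value goes through the switch dispatch. */
int cost_switch(int op)
{
    switch (op) {
    case 0:  return 1;
    case 1:  return 4;
    case 2:  return 9;
    default: return -1;
    }
}

/* Peeled form: the case assumed to be most frequent is tested first
   with a single compare; all other values fall through to the switch. */
int cost_peeled(int op)
{
    if (op == 1)
        return 4;
    switch (op) {
    case 0:  return 1;
    case 2:  return 9;
    default: return -1;
    }
}
```

Both functions return the same value for every input; only the order of the tests differs.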

-Wc,-Qlp=1-av=<nav> -t=<nt>-fa=1-fl=1

Control irregular loop prefetching.

lp lp=1 turns on the module (default is on for F90; off for C/C++)

fa fa=1 forces user settings to override internally computed values.

fl fl=1 forces the optimization to be turned on for all languages.

t Make <nt> attempts at prefetching.

av Sets the prefetch look ahead to <nav>.

-Wc,-Qms_pipe+intdivusefp

Use fp divide for signed integer division
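
A source-level sketch of the idea, assuming 32-bit signed operands: every 32-bit integer is exactly representable as a double, and truncating the rounded double quotient gives the same result as C integer division. The function name is hypothetical, and the code generator's actual instruction sequence is not described here.

```c
#include <stdint.h>

/* Signed 32-bit division carried out with a floating-point divide.
   The caller must ensure b != 0 and must avoid INT32_MIN / -1,
   exactly as for ordinary integer division. */
int32_t div_via_fp(int32_t a, int32_t b)
{
    /* The conversion back to an integer truncates toward zero,
       matching the rounding direction of C integer division. */
    return (int32_t)((double)a / (double)b);
}
```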

-Wc,-Qms_pipe-pref

Turn off prefetching within modulo scheduling

-Wc,-Qpeep-Sh0

Disables the max live base registers algorithm for sethi hoisting.

-Xa

Assume ANSI C conformance, allow K & R extensions. (default mode)

-xalias_level=<a>

Allows compiler to perform type-based alias analysis at the given alias level.

basic assume ISO C9X aliasing rules for basic types only.

std assume ISO C9X aliasing rules.

strong assume all pointers are type safe (strongly typed).
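
What type-based alias analysis permits can be sketched in C as follows. The function names are hypothetical, and the second function assumes a 32-bit IEEE float; it shows one way to inspect an object's bits that stays within the aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Under type-based aliasing rules the compiler may assume that *i and *f
   are different objects, because int and float are different basic types.
   It may therefore return 1 without reloading *i after the store to *f. */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;
}

/* Reinterpret the bits of a float without accessing it through a
   pointer of another type. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Code that passes the same address to store_both as both an int pointer and a float pointer breaks the assumption and may behave differently once alias analysis is enabled.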

-xarch=<a>

Limit the set of instructions the compiler may use to generic, generic64, native, native64, v7, v8a, v8, v8plus, v8plusa, v8plusb, v9, v9a, v9b. Typical settings include:

UltraSPARC-II, 32-bit mode: v8plusa

UltraSPARC-II, 64-bit mode: v9a

UltraSPARC-III, 32-bit mode: v8plusb

UltraSPARC-III, 64-bit mode: v9b

For more information, see the Fortran User's Guide at docs.sun.com or

ftp://192.18.99.138/806-7988/806-7988.pdf

-Xc

Assume strict ANSI C conformance.

-xcache=<c>

Defines the cache properties for use by the optimizer.

c must be one of the following:

o native (set parameters for the host environment)

o s1/l1/a1

o s1/l1/a1:s2/l2/a2

o s1/l1/a1:s2/l2/a2:s3/l3/a3

The si/li/ai are defined as follows:

si The size of the data cache at level i, in kilobytes.

li The line size of the data cache at level i, in bytes.

ai The associativity of the data cache at level i.
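
The si/li/ai notation can be read mechanically, as the following C sketch shows. It handles only the numeric forms listed above (not native); the structure and function names are hypothetical, and the value used in the note below is illustrative rather than a recommended setting.

```c
#include <stdio.h>

struct cache_level {
    int size_kb;     /* si: size of the data cache, in kilobytes */
    int line_bytes;  /* li: line size, in bytes */
    int assoc;       /* ai: associativity */
};

/* Parse "s1/l1/a1[:s2/l2/a2[:s3/l3/a3]]" into out[0..2].
   Returns the number of levels parsed (1 to 3), or -1 if the
   string is malformed or describes more than three levels. */
int parse_xcache(const char *spec, struct cache_level out[3])
{
    int levels = 0;
    const char *p = spec;

    while (levels < 3) {
        struct cache_level *c = &out[levels];
        int used = 0;

        /* %n records how many characters this level consumed. */
        if (sscanf(p, "%d/%d/%d%n",
                   &c->size_kb, &c->line_bytes, &c->assoc, &used) != 3)
            return -1;
        levels++;
        p += used;
        if (*p == '\0')
            return levels;
        if (*p != ':')
            return -1;
        p++;                 /* skip the level separator */
    }
    return -1;
}
```

For example, the string 16/32/1:4096/64/1 parses as two levels: a 16 KB cache with 32-byte lines and associativity 1, and a 4096 KB cache with 64-byte lines and associativity 1.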

-xchip=<c>

Specifies the target processor for use by the optimizer. c must be one of: generic, generic64, native, native64, old, super, super2, micro, micro2, hyper, hyper2, powerup, ultra, ultra2, ultra2i, ultra3, 386, 486, pentium, pentium_pro, 603, 604.

-xcrossfile

Enable cross-file inlining.

-xdepend

Analyze loops for data dependencies.

-xF

Allow function reordering by the WorkShop Performance Analyzer

-xinline=

Turn off inlining

-xipo=n

Performs optimizations across all object files in the link step: 0=off, 1=on, 2=performs whole-program detection and analysis

-xlibmopt

This chooses the math library that is optimized for speed.

-xO1

Does basic local optimization (peephole).

-xO2

Besides what xO1 does, it performs more local and global optimizations.

-xO3

Besides what xO2 does, it optimizes references or definitions for external variables. Loop unrolling and software pipelining are also performed.

-xO4

xO3 plus function inlining.

-xO5

Besides what xO4 does, it enables speculative code motion.

-xpad=common[:<n>]

Pad common block variables, for better use of cache. n specifies the amount of padding to apply. If no parameter is specified then the compiler selects one automatically.

-xpad=local[:<n>]

Pad local variables only, for better use of cache. n specifies the amount of padding to apply. If no parameter is specified then the compiler selects one automatically.

-xparallel

Use parallel processing to improve performance.

-xprefetch

Enable prefetch instructions on those architectures that support prefetch, such as UltraSPARC II (-xarch=v8plus, v8plusa, v8plusb, v9, v9a, or v9b)

auto

Enable automatic generation of prefetch instructions

no%auto

Disable automatic generation of prefetch instructions

explicit

Enable explicit prefetch macros

no%explicit

Disable explicit prefetch macros

yes

-xprefetch=yes is the same as -xprefetch=auto,explicit

no

-xprefetch=no is the same as -xprefetch=no%auto,no%explicit

Defaults

If -xprefetch is not specified, -xprefetch=no%auto,explicit is assumed.

If only -xprefetch is specified, -xprefetch=auto,explicit is assumed.

-xprofile=collect

Collect profile data for feedback directed optimizations.

-xprofile=use

Use data collected for profile feedback.

-xreduction

Parallelize loops containing reductions.

-xregs=syst

Allows use of the system reserved registers %g6 and %g7, and %g5 if not already allowed by -xarch value.

-xrestrict[=f1,...,fn,%all,%none]

Treat pointer-valued function parameters as restricted pointers. The default is %none. Specifying -xrestrict is equivalent to specifying -xrestrict=%all.
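
The promise that a restricted pointer makes can also be written in the source with the C99 restrict qualifier, as in the sketch below. This shows the keyword form for a single function, not the flag; the function name is hypothetical.

```c
#include <stddef.h>

/* dst and src are declared restrict: the caller promises that the two
   arrays do not overlap, so the compiler may load from src and store to
   dst in any order, for example when unrolling or pipelining the loop. */
void scale_add(double *restrict dst, const double *restrict src,
               double k, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Calling such a function with overlapping arrays breaks the promise, and the same holds for any function compiled with -xrestrict whose pointer arguments overlap.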

-xsafe=mem

Enables the use of non-faulting loads when used in conjunction with -xarch=v8plus. Assumes that no memory based traps will occur.

-xsfpconst

Represents unsuffixed floating-point constants as single precision
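
The difference between a double and a single precision constant can be seen in the C sketch below. The f suffix is used here to stand in for what the option does to an unsuffixed constant; the function names are hypothetical.

```c
/* By default an unsuffixed constant such as 0.1 has type double. */
double tenth_as_double(void)
{
    return 0.1;
}

/* With the f suffix the constant is rounded to single precision first,
   which is how -xsfpconst treats an unsuffixed constant. */
double tenth_as_single(void)
{
    return 0.1f;
}
```

The two functions return different values, because 0.1 rounded to single precision and widened again is not the same number as 0.1 rounded to double precision.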

-Xt

Assume K & R conformance, allow ANSI C.

-xtarget=native

Same as -native

Kernel Parameters

Flag

Description

shmsys:shminfo_shmmin

Minimum size of System V shared memory segment that can be created.

shmsys:shminfo_shmmax

Maximum size of System V shared memory segment that can be created. This parameter is an upper limit that is checked before the system sees if it actually has the physical resources to create the requested memory segment.

shmsys:shminfo_shmmni

System wide limit on number of shared memory segments that can be created.

shmsys:shminfo_shmseg

Limit on the number of shared memory segments that any one process can create.

Environment Variables

Flag

Description

LD_LIBRARY_PATH=<p>

Specify the locations to resolve dynamic link dependencies

By default, the runtime linker looks in only one standard place for dependencies: /usr/lib for 32-bit dependencies, or /usr/lib/64 for 64-bit dependencies. Any dependency specified as a simple filename is prefixed with this default directory name and the resulting pathname is used to locate the actual file.

If you have more than one release of the compilers installed, the environment variable LD_LIBRARY_PATH may be set to a colon-separated list of directories to enable dependencies to be resolved by any of them. Typical usages could include, depending on the location where you chose to install the compilers:

export LD_LIBRARY_PATH=/opt/SUNWspro/<release>/lib:$LD_LIBRARY_PATH

export LD_LIBRARY_PATH=/opt/SUNWspro/<release>/lib/v8plusb:$LD_LIBRARY_PATH

Note: <release> is specific to each release of Sun WorkShop software.

The latter example specifies the dynamic libraries specific to the v8plusb architecture (See -xarch). For more information see the C++ User's Guide at docs.sun.com or

ftp://192.18.99.138/806-7991/806-7991.pdf

PRISM_HEAP=<n>

Set the heap size limit for large pages

PRISM_MODE=2

Large page mode: Attempt to put text, data and heap all into large pages.