Flag description file for Sun compiled SPECcpu2000 binaries using the
Sun Studio 11 Compiler and for the Solaris 10 OS. This file is for flags
used with the Opteron based systems.

----------------------------------------------------------------------------
Sun Studio 11 compiler flags
----------------------------------------------------------------------------

Portability Flags:

-DSPEC_CPU2000_LP64
    Compile using LP64 programming model.
-DFMAX_IS_DOUBLE
    Specifies whether FMAX is double or float. Used in 252.eon.
-DSYS_HAS_ANSI
    System is ANSI compliant. Used in 254.gap.
-DUNIX
    Compile for a Unix system. Use portability settings like host
    endianness, OS type, and ANSI language extensions to be compatible
    with Unix systems.
-DUSE_STRERROR
-Dalloca=__builtin_alloca (Portability: SPEC Tools)
    Portability switch, used for 176.gcc: allow use of compiler's
    internal builtin alloca.
-DSPEC_CPU2000_SOLARIS_X86 (Portability: SPEC Tools)
    Portability switch, used for 253.perlbmk: selects header files and
    code paths compatible with Solaris.
-DHOST_WORDS_LITTLE_ENDIAN
    Portability switch, used for 176.gcc: Host system is little-endian.
-DLITTLE_ENDIAN_ARCH
    Portability switch, used for 186.crafty: Host architecture is
    little-endian.
-DSYS_HAS_CALLOC_PROTO (Portability: SPEC Tools)
    Do not supply a prototype for calloc(). Portability switch, used for
    254.gap: allows use of the designated prototype.
-DSYS_HAS_MALLOC_PROTO (Portability: SPEC Tools)
    Do not supply a prototype for malloc(). Portability switch, used for
    254.gap: allows use of the designated prototype.
-DSYS_HAS_IOCTL_PROTO (Portability: SPEC Tools)
    Portability switch, used for 254.gap: allows use of the designated
    prototype.
-DSYS_HAS_SIGNAL_PROTO (Portability: SPEC Tools)
    Portability switch, used for 254.gap: allows use of the designated
    prototype.
-DSYS_HAS_TIME_PROTO (Portability: SPEC Tools)
    Portability switch, used for 254.gap: allows use of the designated
    prototype.
-DSYS_IS_USG (Portability: SPEC Tools)
    Portability switch, used for 254.gap: selects code compatible with
    USG-based systems.
-DHAS_LONGLONG (Portability: SPEC Tools)
    Portability switch, used for 186.crafty: allows use of the designated
    prototype.
-DHAS_STDIO_PROTO (Portability: SPEC Tools)
    Portability switch, used for 254.gap: allows use of the designated
    prototype.
-DSYS_HAS_READ_PROTO (Portability: SPEC Tools)
    Portability switch, used for 254.gap: allows use of the designated
    prototype.
-DSYS_HAS_STRING_PROTO (Portability: SPEC Tools)
    Portability switch, used for 254.gap: allows use of the designated
    prototype.
-e
    Accept extended (132 character) input source lines (FORTRAN).
-fixed
    Accept fixed-format input source files (FORTRAN).
-Xc
    Portability flag. Strictly conformant ISO C, without K&R C
    compatibility extensions.
-Xt
    Portability flag. This option uses ISO C plus K&R C compatibility
    extensions without semantic changes required by ISO C.

Optimization Flags:

-D
    Set definition for preprocessor.
-Ainline[:cp=n][:cs=n][:inc=n][:irs=n][:mi][:recursion=1] (optimizer)
    Control the optimizer's loop inliner:
    cp=n - The minimum call site frequency counter in order to consider
        a routine for inlining.
    cs=n - Set inline callee size limit to n. The unit roughly
        corresponds to the number of instructions.
    inc=n - The inliner is allowed to increase the size of the program
        by up to n%.
    irs=n - Allow routines to increase by up to n. The unit roughly
        corresponds to the number of instructions.
    rs=n - The inliner only considers routines smaller than n pseudo
        instructions as possible inline candidates.
    mi - Perform maximum inlining (without considering code size
        increase).
    recursion=1 - Allow routines that are called recursively to still be
        eligible for inlining.
-Wd,-iropt-prof
    Use iropt in the profile phase of the compiler. iropt is the global
    optimizer.
-qoption CC -iropt-prof
    Use iropt in the profile phase of the compiler. iropt is the global
    optimizer.
-Qoption ube -xcallee=no
    Do not assume callee-save registers are saved.
-Qoption ube -xcallee=yes
    -xcallee=yes is the default.
-Qoption iropt -Rloop_dist
    Do not perform loop distribution transformations.
-W2,-Arestrict_g
    Assumes global pointers are not aliased (restricted).
-Abcopy (optimizer)
    Increase the probability that the compiler will perform
    memcpy/memset transformations.
-Ashort_ldst (optimizer)
    Convert multiple short memory operations into single long memory
    operations.
-Ashort_ldst:ldld
    Convert multiple short memory loads into single long load
    operations.
-Atile:skewp[:b] (optimizer)
    Perform loop tiling which is enabled by loop skewing. Loop skewing
    is a transformation that transforms a non-fully interchangeable loop
    nest to a fully interchangeable loop nest. The optional b sets the
    tiling block size to n.
-dalign
    Selects generation of faster double word load/store instructions,
    and alignment of double and quad data on their natural boundaries in
    common blocks.
-depend=yes
    Selects dependence analysis to better optimize DO loops.
-fast
    This is a convenience option for selecting a set of optimizations
    for performance and it chooses the following switches that are
    defined elsewhere in this page:
    (C)
        -fns -fsimple=2 -fsingle -ftrap=%none -nofstore
        -xalias_level=basic -xbuiltin=%all -xdepend -xlibmil -xlibmopt
        -xO5 -xregs=frameptr -xtarget=native
    (Fortran)
        -xtarget=native -xO5 -xlibmil -fsimple=2 -dalign -xlibmopt
        -depend=yes -fns -ftrap=common -pad=local -xvector=yes
        -xprefetch=yes -xprefetch_level=2 -nofstore
-fns
    Select non-standard floating point mode. This flag causes the
    nonstandard floating point mode to be enabled when a program begins
    execution. By default, the nonstandard floating point mode will not
    be enabled automatically.
    Warning: When nonstandard mode is enabled, floating point arithmetic
    may produce results that do not conform to the requirements of the
    IEEE 754 standard.
    See the Numerical Computation Guide for more information (see
    docs.sun.com).
-fsimple=1
    Select floating-point optimization preferences. Allow conservative
    simplifications. The resulting code does not strictly conform to
    IEEE 754, but numeric results of most programs are unchanged.
    With -fsimple=1, the optimizer can assume the following:
    * IEEE 754 default rounding/trapping modes do not change after
      process initialization.
    * Computations producing no visible result other than potential
      floating point exceptions might be deleted.
    * Computations with Infinity or NaNs as operands need not propagate
      NaNs to their results; e.g., x*0 might be replaced by 0.
    * Computations do not depend on sign of zero.
    With -fsimple=1, the optimizer is not allowed to optimize completely
    without regard to roundoff or exceptions. In particular, a
    floating-point computation cannot be replaced by one that produces
    different results with rounding modes held constant at run time.
-fsimple=2
    Selects aggressive floating-point optimizations. This option might
    be unsuited for programs requiring strict IEEE 754 standards
    compliance.
-fsingle
    (-Xt and -Xs modes only) Causes the compiler to evaluate float
    expressions as single precision, rather than double precision. (This
    option has no effect if the compiler is used in either -Xa or -Xc
    modes, as float expressions are already evaluated as single
    precision.)
-ftrap=t
    Sets the IEEE 754 trapping mode in effect at startup. t is a
    comma-separated list that consists of one or more of the following:
    %all, %none, common, [no%]invalid, [no%]overflow, [no%]underflow,
    [no%]division, [no%]inexact. The default is -ftrap=%none.
    This option sets the IEEE 754 trapping modes that are established at
    program initialization. Processing is left-to-right.
    common - invalid, division by zero, and overflow.
    %none - the default, turns off all trapping modes.
    Do not use this option for programs that depend on IEEE standard
    exception handling; you can get different numerical results,
    premature program termination, or unexpected SIGFPE signals.
-lbsdmalloc
    General purpose memory allocation package that supports the routines
    malloc, free and realloc. They maintain a table of free blocks for
    efficient allocation and coalescing of free storage. When there is
    no suitable space already free, the allocation routines call sbrk(2)
    to get more memory from the system. Additional information can be
    obtained from the bsdmalloc man page and the following section from
    the ld man page:
-lm
    Link with math library.
-lmopt
    This chooses the math library that is optimized for speed.
-M mapfile
    Reads mapfile as a text file of directives to ld. This option can be
    specified multiple times. If mapfile is a directory, then all
    regular files, as defined by stat(2), within the directory are
    processed. See Linker and Libraries Guide for a description of
    mapfiles. Example mapfiles are provided in /usr/lib/ld. See FILES.
-M /usr/lib/ld/map.bssalign
    Linker mapfile that enables the creation of a 'bss' segment, and
    aligns the segment at 4Mb. This effectively provides an appropriate
    alignment for large page mapping of the heap, and thus can be useful
    when building dynamic executables. See ppgsz(1).
-nofstore
    Cancels forcing expressions to have the precision of the result.
-pad=local
    Local padding to improve use of cache.
-stackvar
    Force all local variables to be allocated on the stack. Allocates
    all the local variables and arrays in routines onto the memory stack
    unless otherwise specified. This option makes these variables
    automatic rather than static and provides more freedom to the
    optimizer when parallelizing loops with calls to subprograms.
-xalias_level[=level]
    where level is one of: any, basic, weak, layout, strict, std,
    strong. It allows the compiler to perform type-based alias analysis
    at the given alias level (C).
    If you do not supply a level with -xalias_level, the compiler
    assumes -xalias_level=any.
    any - The compiler assumes that all memory references can alias at
        this level. There is no type-based alias analysis.
    basic - assume ISO C9X aliasing rules for basic types only.
    std - assume ISO C9X aliasing rules.
    strong - assume all pointers are type safe (strongly typed).
-xarch=isa
    This option limits the code generated by the compiler to the
    instructions of the specified instruction set architecture.
    generic - This is the default. This option generates 32-bit
        applications.
    sse2 - Adds the SSE2 instruction set.
    amd64 - Compile 64-bit Solaris x86 applications.
    native - This is the default for the -fast option. The compiler
        chooses the appropriate setting for the current system processor
        it is running on and generates 32-bit applications.
-xbuiltin=%all
    Substitute intrinsic functions or inline system functions where
    profitable for performance.
-xcrossfile[=n]
    Enable optimization and inlining across source files, n={0|1}. The
    default is -xcrossfile=0 which specifies that no cross file
    optimizations are performed. -xcrossfile is equivalent to
    -xcrossfile=1.
    Normally, the scope of the compiler's analysis is limited to each
    separate file on the command line. With -xcrossfile, the compiler
    analyzes all the files named on the command line as if they had been
    concatenated into a single source file.
-xdepend
    Analyze loops for data dependencies.
-xipo[=n]
    Enable optimization and inlining across source files, n={0|1|2}. At
    -xipo=2, the compiler performs interprocedural aliasing analysis as
    well as optimization of memory allocation and layout to improve
    cache performance.
-xlibmil
    Selects inlining of certain math library routines.
-xlibmopt
    Selects linking the optimized math library.
-xlic_lib=sunperf
    Link in the Sun supplied performance libraries.
-O (Fortran)
    Use of -O (which implies -O3).
-xO[n]
    Synonym for -O[n].
-xO1
    Does basic local optimization (peephole).
-xO2
    xO1 and more local and global optimizations.
-xO3
    Besides what xO2 does, it optimizes references or definitions for
    external variables. Loop unrolling and software pipelining are also
    performed.
-xO4
    xO3 plus function inlining.
-xO5
    Besides what xO4 does, it enables speculative code motion.
-xprefetch_level[=n]
    Controls the aggressiveness of the -xprefetch=auto option
    (n={1|2|3}). -xprefetch_level=1 enables automatic generation of
    prefetch instructions. -xprefetch_level=2 enables additional
    generation beyond level 1 and -xprefetch_level=3 enables additional
    generation beyond level 2.
-xprefetch[=val[,val]]
    Enable prefetch instructions on those architectures that support
    prefetch.
    auto - Enable automatic generation of prefetch instructions.
    no%auto - Disable automatic generation of prefetch instructions.
    explicit - Enable explicit prefetch macros.
    no%explicit - Disable explicit prefetch macros.
    yes - -xprefetch=yes is the same as -xprefetch=auto,explicit.
    no - -xprefetch=no is the same as -xprefetch=no%auto,no%explicit.
    Defaults: If -xprefetch is not specified, -xprefetch=no%auto,explicit
    is assumed. If only -xprefetch is specified, -xprefetch=auto,explicit
    is assumed.
-xprofile
    Use the profile feature, shorthand used for the process below.
-xprofile=p
    Collect data for a profile or use a profile to optimize.
    p={{collect,use}[:name],tcov}
    collect[:name] - Collects and saves execution frequency for later
        use by the optimizer with -xprofile=use. The compiler generates
        code to measure statement execution-frequency.
    use[:name] - Uses execution frequency data to optimize
        strategically. The name is the name of the executable that is
        being analyzed.
-xregs=
    Specify the usage of optional registers.
-xregs=r[,r...]
    Specify the usage of registers for the generated code. r is a
    comma-separated list of one or more of the following: [no%]appl,
    [no%]float, [no%]frameptr.
    [no%]frameptr (x86 only): [Does not] Allow the compiler to use the
    frame-pointer register (%ebp on IA32, %rbp on AMD64) as an
    unallocated callee-saves register. Using this register as an
    unallocated callee-saves register may improve program run time.
    However, it also reduces the capacity of some tools, such as the
    Performance Analyzer and dtrace, to inspect and follow the stack.
    This stack inspection capability is important for system performance
    measurement and tuning. Therefore, using this optimization may
    improve local program performance at the expense of global system
    performance.
-xrestrict
    Treat pointer-valued function parameters as restricted pointers.
-xtarget=native
    Selects options appropriate for the system where the compile is
    taking place, including architecture, chip, and cache sizes.
-xvector
    Enable automatic generation of calls to the vector library
    functions. Specifying -xvector is equivalent to -xvector=yes. It
    permits the compiler to transform math library calls within DO loops
    into single calls to the equivalent vector math routines when such
    transformations are possible. This could result in a performance
    improvement for loops with large loop counts.
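    The shape of the -xvector transformation can be pictured with a
    small conceptual sketch. It is written in Python purely for
    illustration and does not show what the compiler emits; vsin is a
    hypothetical stand-in for a vector math routine, not the name of an
    actual library function.

```python
import math


def sines_scalar(xs):
    # Untransformed loop: one math-library call per iteration.
    out = []
    for x in xs:
        out.append(math.sin(x))
    return out


def vsin(xs):
    # Hypothetical vector routine: a single call handles the whole array.
    return [math.sin(x) for x in xs]


def sines_vectorized(xs):
    # Transformed loop: the per-element calls collapse into one call.
    return vsin(xs)
```

    Both functions return the same values; the benefit described above
    would come from paying the call overhead once rather than once per
    element, which matters most for loops with large loop counts.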
-xvector=simd
    Automatic generation of the vector SIMD instructions.
-Qoption
    Pass option list to the compiler phase (Fortran, C++):
    f90comp - Fortran first pass
    iropt - Global optimizer
    cg - Code generator
-Qoption iropt -Aujam:inner=g
    Increase the probability that small-trip-count inner loops will be
    fully unrolled.
-Qoption ube_ipa -inl_alt (Fortran x86)
    Invokes Interprocedural analyzer (x86).
-W2,-switch[,-switch...] (C)
    Send the listed switch(es) to the global optimizer. See the
    definitions of the individual switches elsewhere in this page.
-xpad=common[:n] (Fortran)
    If multiple same-sized arrays are placed in common, insert padding
    between them for better use of cache. n specifies the amount of
    padding to apply, in units that are the same size as the array
    elements. If no parameter is specified then the compiler selects one
    automatically.
-xpagesize= (C, Fortran)
    Set the preferred page size for running the program.
-xpagesize_heap= (C, Fortran)
    Set the preferred heap page size for running the program.
-xpagesize_stack= (C, Fortran)
    Set the preferred stack page size for running the program.
-xprofile=p
    Collect or optimize with runtime profiling data.
    p must be collect[:nm], use[:nm], or tcov. At runtime a program
    compiled with -xprofile=collect:nm will create the subdirectory
    nm.profile to hold the runtime feedback information. nm is an
    optional name.
-xprofile=collect
    Collect profile data for feedback directed optimizations.
-xprofile=use
    Use data collected for profile feedback.
ulimit -s unlimited
    Set size of stack segment to unlimited.
submit=echo 'pbind -b...' > dobmk; sh dobmk (SPEC tools, Unix)
    When running multiple copies of benchmarks, the SPEC config file
    feature submit is sometimes used to cause individual jobs to be
    bound to specific processors:
    * submit= causes the SPEC tools to use this line when submitting
      jobs.
    * echo ...> dobmk causes the generated commands to be written to a
      file, namely dobmk.
    * pbind -b causes this copy's processes to be bound to the CPU
      specified by the expression that follows it. See the config file
      used in the submission for the exact syntax, which tends to be
      cumbersome because of the need to carefully quote parts of the
      expression. When all expressions are evaluated, each CPU ends up
      with exactly one copy of each benchmark. The pbind expression may
      include:
      o $SPECUSERNUM: the SPEC tools-assigned number for this copy of
        the benchmark.
      o expr: Calculate simple arithmetic expressions. For example, the
        effect of binding jobs to a (quote-resolved) expression such as:
            expr ( $SPECUSERNUM / 4 ) * 8 + ( $SPECUSERNUM % 4 )
        would be to send the jobs to processors whose numbers are:
            0,1,2,3, 8,9,10,11, 16,17,18,19 ...
      o psrinfo: find out what processors are available.
      o grep on-line: search the psrinfo output for information
        regarding on-line cpus.
      o awk...print \$1: Pick out the line corresponding to this copy of
        the benchmark and use the CPU number mentioned at the start of
        this line.
    * sh dobmk actually runs the benchmark.
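    The processor numbering produced by the example expr expression can
    be checked with a short sketch. It is written in Python for
    illustration only and is not part of any SPEC config file; the
    function names are hypothetical, and it assumes that $SPECUSERNUM
    starts at 0, which is what the processor list above implies.

```python
def cpu_for_copy(specusernum):
    # Mirrors: expr ( $SPECUSERNUM / 4 ) * 8 + ( $SPECUSERNUM % 4 )
    # expr uses integer division, matched here by // for non-negative input.
    return (specusernum // 4) * 8 + (specusernum % 4)


def cpus_for_copies(count):
    # CPU numbers for copies 0 .. count-1 (assumes numbering starts at 0).
    return [cpu_for_copy(n) for n in range(count)]


if __name__ == "__main__":
    print(cpus_for_copies(12))
    # prints [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19]
```

    Each group of four consecutive copies lands on four consecutive
    CPUs, and the next group starts eight CPU numbers later, so CPUs
    4-7, 12-15, and so on are skipped.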
Kernel Parameters (/etc/system):

autoup=n (Unix)
    When the file system flush daemon fsflush runs, it will write to
    disk all modified file buffers that are more than n seconds old.
tune_t_fsflushr= (Unix)
    Controls the number of seconds between runs of the file system flush daemon,