{\rtf1\mac\ansicpg10000\cocoartf824\cocoasubrtf410 {\fonttbl\f0\fswiss\fcharset77 Helvetica;} {\colortbl;\red255\green255\blue255;} \margl1440\margr1440\vieww16520\viewh19720\viewkind0 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural\pardirnatural \f0\fs24 \cf0 Rackable Systems, Inc.\ SPEC CPU2000 Compiler flag descriptions for:\ \ PathScale EKOPath(TM) Compiler Suite (Fortran, C and C++ compilers)\ \ \ \ \ Portability Flags:\ \ -DSPEC_CPU2000_LP64 Compile using LP64 programming model. \ -DLINUX_i386 Linux Intel system, use "long long" as\ 64-bit variable. \ -DHAS_ERRLIST Prog env provides specification for\ "sys_errlist[]".\ -DSPEC_CPU2000_NEED_BOOL Use SPEC provided definition of the boolean type.\ -DSPEC_CPU2000_LINUX_I386 Compile for an I386 system running Linux.\ -DPSEC_CPU2000_GLIBC22 Compatibility with 2.2 & later versions of glibc.\ -DSYS_IS_USG Specifies that the operating system is\ USG compliant. \ -DSYS_HAS_TIME_PROTO Do not explicitly declare time().\ -DSYS_HAS_IOCTL_PROTO Do not explicitly declare ioctl().\ -DSYS_HAS_CALLOC_PROTO Do not explicitly declare calloc().\ -fixedform Tells the f90 compiler to use fixed format\ (F77 72 column format), instead of F90 free format. \ \ \ Optimization Flags:\ \ Some suboptions either enable or disable the feature. To enable a feature, \ either specify only the suboption name or specify =1, =ON, or =TRUE. To disable \ a feature, specify =0, =OFF, or =FALSE. These values are\ insensitive to case: 'on' & 'ON' mean the same thing. Below, ON & OFF indicate \ the enabling or disabling of a feature.\ \ -CG[:...] \ Code Generation option group: control the optimizations \ and transformations of the instruction-level code generator.\ \ -CG:cflow=(ON|OFF)\ A value of OFF disables control flow optimization in the code \ generation. Default is ON. \ \ -CG:gcm=(ON|OFF)\ Specifying OFF disables the instruction-level global code \ motion optimization phase. 
The default is ON.\ \ -CG:load_exe=n \ Specifies the threshold for subsuming a memory load operation into \ the operand of an arithmetic instruction. The value of 0 turns \ off this subsumption optimization. The default is 1, when this \ subsumption is performed only when the result of the load has only \ one use. This subsumption is not performed if the number of times \ the result of the load is used exceeds the value n, a non-negative \ integer.\ \ -CG:local_fwd_sched=(ON|OFF)\ Changes the instruction scheduling algorithm to work forward \ instead of backward for the instructions in each basic block.\ The default is OFF.\ \ -CG:movnti=N\ Convert ordinary stores to non-temporal stores when writing memory\ blocks of size larger than N KB. When N is set to 0, this \ transformation is avoided. The default value is 120 (KB).\ \ -CG:p2align=(ON|OFF)\ Align loop heads to 64-byte boundaries. The default is\ OFF.\ \ -CG:p2align_freq=n\ Aligns branch targets based on execution frequency. This option\ is meaningful only under feedback-directed compilation. The\ default value n=0 turns off the alignment optimization. Any \ other value specifies the frequency threshold at or above which \ this alignment will be performed by the compiler.\ \ -CG:prefetch=(ON|OFF) \ Turning this OFF suppresses any generation of prefetch instructions \ in the code generator. This has the same effect as -LNO:prefetch=0.\ The default is ON which implies using default prefetch algorithms.\ \ -CG:prefetchnta=(ON|OFF) \ Prefetch when data is non-temporal at all levels of the cache\ hierarchy. This is for data streaming situations in which the\ data will not need to be re-used soon. The default is OFF.\ \ -fb_create \ Used to specify that an instrumented executable program\ is to be generated. Such an executable is suitable for\ producing feedback data files with the specified prefix\ for use in feedback-directed compilation (FDO). The commonly \ used prefix is "fbdata". 
This is OFF by default.\ \ -fb_opt \ Used to specify feedback-directed compilation (FDO) by extracting\ feedback data from files with the specified prefix, which were\ previously generated using -fb_create. The commonly used prefix\ is "fbdata". This optimization is off by default.\ \ -fno-exceptions\ Tells the compiler that the program does not use exception\ handling, so it can perform more aggressive optimization in\ the code. The generation of exception handling constructs \ is also suppressed. Under this flag, code that uses exception\ handling cannot be guaranteed to work correctly. Note that\ the absence of exception handling constructs in a function does not\ mean that the function can be compiled with this flag. For\ exception handling to work properly, the scopes\ crossed between throwing and catching an exception must\ all have been compiled with exceptions on. \ \ -fno-math-errno \ Do not set errno after calling math functions that are executed\ with a single instruction, e.g., sqrt. A program that relies \ on IEEE exceptions for math error handling may want to use this \ flag for speed while maintaining IEEE arithmetic compatibility.\ This is implied by -Ofast. The default is -fmath-errno.\ \ -GRA:optimize_boundary=(ON|OFF)\ Allow the Global Register Allocator to allocate the same \ register to different variables in the same basic block. \ Default is OFF. \ \ -INLINE:aggressive=(ON|OFF)\ Tells the compiler to be more aggressive about inlining. The\ default is -INLINE:aggressive=OFF.\ \ -IPA[:...]\ IPA option group: control the inter-procedural analyses and\ transformations performed. Note that giving just the group name\ without any options, i.e., -IPA, will invoke the interprocedural\ analyzer. -IPA is off by default unless -Ofast is specified.\ \ -ipa Same as -IPA alone.\ \ -IPA:callee_limit=(n)\ Functions whose size exceeds this limit will never be\ automatically inlined by the compiler. 
The default is n=2000.\ \ -IPA:ctype=(ON|OFF) \ Turns on optimizations that speed up interfaces to the constructs \ defined in ctype.h by assuming that the program will not be run \ in a multi-threaded environment. The default is OFF.\ \ -IPA:field_reorder=(ON|OFF)\ Enables the re-ordering of fields in large structs based\ on their reference patterns in feedback compilation to\ minimize data cache misses. The default is OFF.\ \ -IPA:linear=(ON|OFF)\ Controls conversion of a multi-dimensional array to a single\ dimensional (linear) array that covers the same block of memory.\ When inlining Fortran subroutines, IPA tries to map formal \ array parameters to the shape of the actual parameter. In the \ case that it cannot map the parameter, it linearizes the array \ reference. By default, IPA will not inline such callsites \ because they may cause performance problems. The default is OFF. \ \ -IPA:min_hotness=N\ When feedback information is available, a call site to a \ procedure must be invoked with a count that exceeds the \ threshold specified by N before the procedure will be inlined \ at that call site. The default is 10.\ \ -IPA:plimit=(n)\ Inline calls to a procedure until the procedure has grown to\ a size of n. The default is 2500.\ \ -IPA:pu_reorder=(0|1|2)\ Controls the phase that optimizes the layout of the program \ units (functions) in the program.\ 0 = Disables procedure reordering (default)\ 1 = Reorder based on the frequency with which different \ procedures are invoked.\ 2 = Reorder based on caller-callee relationship.\ \ -IPA:small_pu=(n) \ A procedure with size smaller than n is not subjected to the\ plimit restriction. The default is n=30.\ \ -IPA:space=N\ Inline until a program expansion of N% is reached. For example, \ -IPA:space=20 limits code expansion due to inlining to\ approximately 20%. Default is no limit.\ \ -IPA:use_intrinsic[=(ON|OFF)]\ Enable/disable loading the intrinsic version of standard library\ functions. 
The default is OFF.\ \ -L/opt/acml2.7.0/pathscale64/lib -lacml\ The flags above are needed to use the PathScale compiler to link \ with the ACML (AMD Core Math Library) 2.7.0 library. The \ PathScale-compiled, 64-bit version of ACML gets installed \ at /opt/acml2.7.0/gnu64 by default. ACML is available as a free \ download from http://developer.amd.com/acml.aspx.\ \ -LNO:\ This option group specifies options and transformations performed\ on loop nests. The -LNO: option group is enabled only if the -O3\ option is also specified on the compiler command line.\ \ -LNO:blocking[=(ON|OFF)]\ Enable/disable the cache blocking transformation. The default\ is on at -O3 or higher.\ \ -LNO:fission=(0|1|2)\ This option controls loop fission. The value can be one of the \ following:\ \ 0 = Disables loop fission (default)\ \ 1 = Performs normal fission as necessary\ \ 2 = Specifies that fission be tried before fusion\ \ If -LNO:fission=1:fusion=1 or -LNO:fission=2:fusion=2 is \ specified, then fusion is performed.\ \ \ -LNO:full_unroll,fu=N\ Fully unroll innermost loops with trip_count <= N inside LNO. \ N can be any integer between 0 and 100. The default value for N \ is 5. Setting this flag to 0 disables full unrolling of small \ trip count loops inside LNO.\ \ -LNO:full_unroll_size=N\ Fully unroll innermost loops with unrolled loop size <= N inside \ LNO. N can be any integer between 0 and 10000. The conditions\ implied by the full_unroll option must also be satisfied for \ the loop to be fully unrolled. The default value for N is 1600.\ \ -LNO:full_unroll_outer=(ON|OFF)\ Control the full unrolling of loops with known trip count that \ do not contain a loop and are not contained in a loop. The \ conditions implied by both the full_unroll and the \ full_unroll_size options must be satisfied for the loop to be \ fully unrolled. 
The default is OFF.\ \ -LNO:fusion=n\ Perform loop fusion, n: 0 - off, 1 - conservative, 2 - aggressive.\ The default is 1.\ \ -LNO:prefetch[=(0|1|2|3)]\ Specify level of prefetching.\ 0 = Prefetch disabled.\ 1 = Prefetch is done only for arrays that are always \ referenced in each iteration of a loop, the default.\ 2 = Prefetch is done without the above restrictions.\ 3 = Most aggressive.\ \ -LNO:prefetch_ahead=n\ Prefetch n cache line(s) ahead. The default is 2.\ \ -LNO:simd=(0|1|2)\ This option enables or disables inner loop vectorization.\ \ 0 = Turn off the vectorizer.\ \ 1 = (Default) Vectorize only if the compiler can determine that\ there is no undesirable performance impact due to sub-optimal \ alignment. Vectorize only if vectorization does not introduce \ accuracy problems with floating-point operations.\ \ 2 = Vectorize without any constraints (most aggressive).\ \ -m32 \ Generates code according to the 32-bit ABI, also known as x86 \ or IA32.\ \ -m64 \ Compile for 64-bit ABI, also known as AMD64, x86_64, or IA32e. \ This is the default.\ \ -m3dnow Enable use of 3DNow instructions. The default is OFF.\ \ -mcpu=(opteron|athlon64|athlon64fx|em64t|pentium4|xeon|anyx86|auto)\ Compiler will optimize code for selected platform. auto means to \ optimize for the platform that the compiler is running on, which \ the compiler determines by reading /proc/cpuinfo. anyx86 means a \ generic 32-bit x86 processor without SSE2 support. The \ default is opteron.\ \ -msse2 \ Enable use of SSE2 instructions. This is the default under \ both -m64 and -m32.\ \ -mno-sse2 \ This flag is only applicable to -m32. -mno-sse2 is ignored \ under -m64 with a warning.\ \ -O or -O2\ Turn on extensive optimization. 
The optimizations at this level are\ generally conservative, in the sense that they (1) are virtually\ always beneficial, (2) provide improvements commensurate with the\ compile time spent to achieve them, and (3) avoid changes which\ affect such things as floating point accuracy.\ \ -O3 \ Turn on aggressive optimization. The optimizations at this level\ are distinguished from -O2 by their aggressiveness, generally\ seeking highest-quality generated code even if it requires extensive\ compile time. They may include optimizations which are generally\ beneficial but occasionally hurt performance. This includes but \ is not limited to turning on the Loop Nest Optimizer, -LNO:opt=1, \ and setting -OPT:ro=1:IEEE_arith=2:Olimit=9000.\ \ -Ofast Equivalent to "-O3 -ipa -OPT:Ofast -fno-math-errno". -OPT:Ofast is\ described below.\ \ -OPT:alias=\ Specifies the pointer aliasing model to be used. By\ specifying one or more of the following models, the\ compiler is able to make assumptions throughout the compilation:\ typed Assume that the code adheres to the ANSI/ISO C\ standard which states that two pointers of different\ types cannot point to the same location in memory.\ This is on by default when -Ofast is specified.\ \ restrict Specifies that distinct pointers are assumed\ to point to distinct, non-overlapping objects.\ This is off by default.\ \ disjoint Specifies that any two pointer expressions are\ assumed to point to distinct, non-overlapping objects.\ This is off by default.\ \ -OPT:div_split=(ON|OFF)\ Enable/disable changing x/y into x*(recip(y)). This is \ OFF by default but is enabled by -OPT:Ofast or \ -OPT:IEEE_arithmetic=3.\ \ -OPT:early_intrinsics=(ON|OFF)\ When ON, this option causes calls to intrinsics to be \ expanded to inline code early in the backend compilation.\ This may enable more vectorization opportunities if vector\ forms of the expanded operations exist. 
Default is OFF.\ \ -OPT:fast_complex=(ON|OFF)\ Setting fast_complex=ON enables fast calculations for values \ declared to be of type complex. When this is set to ON, \ complex absolute value (norm) and complex division use fast \ algorithms that are more likely to overflow or underflow than\ the standard algorithms. OFF is the default. fast_complex=ON \ is enabled if -OPT:roundoff=3 is in effect.\ \ -OPT:fast_nint=(ON|OFF)\ This option uses a hardware feature to implement NINT and ANINT \ (both single- and double-precision versions). Default is OFF but \ fast_nint=ON is enabled by default if -OPT:ro=3 is in effect.\ \ -OPT:goto=(OFF|ON)\ Disable/enable the conversion of GOTOs into higher level\ structures like FOR loops. The default is ON for -O2 or higher.\ \ -OPT:IEEE_arithmetic,IEEE_arith,IEEE_a=(n)\ Specifies the level of conformance to IEEE 754 floating-point\ roundoff/overflow behavior. n can be one of the following:\ \ 1 Adheres to IEEE accuracy. This is the default when \ optimization levels -O0, -O1 and -O2 are in effect.\ \ 2 May produce inexact results not conforming to IEEE 754.\ This is the default when -O3 is in effect.\ \ 3 All mathematically valid transformations are allowed. \ \ -OPT:IEEE_NaN_Inf=(ON|OFF)\ OFF specifies non-IEEE-754 results in operations that might \ have IEEE 754 NaN or infinity operands; this enables many\ optimizations which would be invalid for NaN or infinity\ operands. The default is ON.\ \ -OPT:transform_to_memlib=(ON|OFF)\ When ON, this option enables transformation of loop constructs \ to calls to memcpy or memset. Default is ON when the target \ processor is EM64T, OFF otherwise.\ \ -OPT:Ofast\ Use optimizations selected to maximize performance. \ Although the optimizations are generally safe,\ they may affect floating point accuracy due to rearrangement\ of computations. 
This effectively turns on the following\ optimizations: -OPT:ro=2:Olimit=0:div_split=ON:alias=typed.\ \ -OPT:Olimit=(n)\ Disable optimization when size of program unit is > n. When n\ is 0, program unit size is ignored and optimization process\ will not be disabled due to compile time limit. The default is\ 0 when -Ofast is specified, otherwise the default is 6000\ under -O2 and 9000 under -O3.\ \ -OPT:roundoff,ro=(n)\ Specifies the level of acceptable departure from source\ language floating-point, round-off, and overflow semantics. n\ can be one of the following:\ \ 0 Inhibits optimizations that might affect the\ floating-point behavior. This is the default when\ optimization levels -O0, -O1, and -O2 are in effect.\ \ 1 Allows simple transformations that might cause limited\ round-off or overflow differences. Compounding such\ transformations could have more extensive effects.\ This is the default level when -O3 is in effect.\ \ 2 Allows more extensive transformations, such as the\ reordering of reduction loops. This is the default \ level when -Ofast is specified.\ \ 3 Enables any mathematically valid transformation.\ \ -OPT:treeheight=(ON|OFF)\ The value ON turns on re-association in expressions to reduce \ the expressions' tree height. The default value is OFF.\ \ -OPT:unroll_analysis=(ON|OFF)\ The default value of ON lets the compiler analyze the\ content of the loop to determine the best unrolling\ parameters, instead of strictly adhering to the\ -OPT:unroll_times_max and -OPT:unroll_size parameters.\ \ -OPT:unroll_times_max,unroll_times=(n)\ Unroll inner loops by a maximum of n. The default is 4.\ \ -OPT:unroll_size=(n)\ Sets the ceiling of maximum number of instructions for an\ unrolled inner loop. 
If n = 0, the ceiling is disregarded.\ \ -static\ Suppresses dynamic linking at run-time for shared libraries; \ uses static linking instead.\ \ -TENV:X=(0|1|2|3|4)\ Specify the level of enabled exceptions that will be assumed\ for purposes of performing speculative code motion (default\ is 1 at all optimization levels). In general, an instruction\ will not be speculated (i.e. moved above a branch by the\ optimizer) unless any exceptions it might cause are disabled\ by this option. At level 0, no speculative code motion may\ be performed. At level 1, safe speculative code motion may\ be performed, with IEEE-754 underflow and inexact exceptions\ disabled. At level 2, all IEEE-754 exceptions are disabled\ except divide by zero. At level 3, all IEEE-754 exceptions\ are disabled including divide by zero. At level 4, memory\ exceptions may be disabled or ignored.\ \ -TENV:frame_pointer=(ON|OFF)\ Default is ON for C++ and OFF otherwise.\ Local variables in the function stack frame are addressed via \ the frame pointer register. Ordinarily, the compiler will \ replace this use of frame pointer by addressing local variables \ via the stack pointer when it determines that the stack pointer \ is fixed throughout the function invocation. This frees up the \ frame pointer for other purposes. Turning this flag on forces \ the compiler to use the frame pointer to address local variables. \ This flag defaults to on for C++ because the exception handling \ mechanism relies on the frame pointer register being used to \ address local variables. This flag can be turned off for C++ \ for programs that do not throw exceptions. \ \ -Wl,-x\ Passes the -x option to the linker. With this flag set, the\ linker will not preserve local (non-global) symbols in the output\ symbol table. The linker enters external and static symbols\ only. This option conserves space in the output file. 
This is\ OFF by default.\ \ -WOPT:aggstr=N\ This controls the aggressiveness of the strength reduction\ optimization performed by the scalar optimizer, in which induction\ expressions within a loop are replaced by temporaries that are \ incremented together with the loop variable. When strength\ reduction is overdone, the additional temporaries increase \ register pressure, resulting in excessive register spills that\ decrease performance. The value specified must be a non-negative \ integer, which specifies the maximum number of induction\ expressions that will be strength-reduced across an index variable \ increment. When set to 0, strength reduction is performed\ only for non-trivial induction expressions. The default is 11.\ \ -WOPT:if_conv=(0|1|2)\ Controls the optimization that translates simple IF statements \ to conditional move instructions in the target CPU. Setting to\ 0 suppresses this optimization. The value of 1 designates \ conservative if-conversion, in which the context around the IF\ statement is used in deciding whether to if-convert. The value \ of 2 enables aggressive if-conversion by causing it to be\ performed regardless of the context. The default is 1.\ \ -WOPT:mem_opnds=(ON|OFF) \ ON makes the scalar optimizer preserve any memory operands of \ arithmetic operations so as to help bring about subsumption of \ memory loads into the operands of arithmetic operations. Load \ subsumption is the combining of an arithmetic instruction and \ a memory load into one instruction. The default is OFF.\ \ -WOPT:retype_expr=(ON|OFF) \ ON enables the optimization in the compiler that converts 64-bit\ address computation to use 32-bit arithmetic as much as\ possible. The default is OFF.\ \ -WOPT:unroll=(0|1|2)\ Control the unrolling of innermost loops in the scalar optimizer.\ Setting to 0 suppresses this unroller. The default is 1, which \ makes the scalar optimizer unroll only loops that contain IF\ statements. 
Setting to 2 makes the unrolling also apply to \ loop bodies that are straight line code, which duplicates the\ unrolling done in the code generator, and is thus unnecessary.\ The default setting of 1 makes this unrolling complementary to\ what is done in the code generator. This unrolling is not \ affected by the unrolling options under the -OPT group.\ \ -WOPT:val=(0|1|2)\ Controls the number of times the value-numbering optimization is\ performed in the global optimizer, with the default being 1.\ This optimization tries to recognize expressions that will \ compute identical run-time values and changes the program to avoid \ re-computing them.\ \ \ \ taskset Utility\ ---------------\ NAME\ taskset - retrieve or set a process's CPU affinity\ \ SYNOPSIS\ taskset [options] [mask | list ] [pid | command\ [arg]...]\ \ DESCRIPTION\ taskset is used to set or retrieve the CPU affinity of\ a running process given its PID or to launch a new\ COMMAND with a given CPU affinity. CPU affinity is a\ scheduler property that "bonds" a process to a given\ set of CPUs on the system. The Linux scheduler will\ honor the given CPU affinity and the process will not\ run on any other CPUs. Note that the Linux scheduler\ also supports natural CPU affinity: the scheduler\ attempts to keep processes on the same CPU as long as\ practical for performance reasons. Therefore, forcing\ a specific CPU affinity is useful only in certain\ applications.\ \ The CPU affinity is represented as a bitmask, with the\ lowest order bit corresponding to the first logical CPU\ and the highest order bit corresponding to the last\ logical CPU. Not all CPUs may exist on a given system\ but a mask may specify more CPUs than are present. A\ retrieved mask will reflect only the bits that\ correspond to CPUs physically on the system. If an invalid\ mask is given (i.e., one that corresponds to no valid\ CPUs on the current system) an error is returned. The\ masks are typically given in hexadecimal. 
For example,\ \ 0x00000001\ is processor #0\ \ 0x00000003\ is processors #0 and #1\ \ 0xFFFFFFFF\ is all processors (#0 through #31)\ \ When taskset returns, it is guaranteed that the given\ program has been scheduled to a legal CPU.\ \ OPTIONS\ -c, --cpu-list\ specify a numerical list of processors instead\ of a bitmask. The list may contain multiple\ items, separated by commas, and ranges. For\ example, 0,5,7,9-11.\ \ \ }