Description of compiler flags for Intel C Compiler 5.0.1
--------------------------------------------------------

/O1
    Optimizes for speed, but disables some optimizations that increase
    code size for a small speed benefit. Includes inline expansion
    (except for intrinsic functions), global optimizations, and string
    pooling optimizations.

/O2
    Optimizes for speed (DEFAULT). The -O2 option includes the O1
    optimizations and, in addition, enables inlining of intrinsics and
    more speed optimizations.

/O3
    Builds on the -O1 and -O2 optimizations by enabling high-level
    optimization. This level does not guarantee higher performance
    unless loop and memory access transformations take place. In
    conjunction with -QaxK/-QxK and -QaxW/-QxW, this switch causes the
    compiler to perform more aggressive data dependency analysis than
    for -O2. This may result in longer compilation times.

/Oa[-]
    Assume [do not assume] no aliasing in the program.

/Ob{0|1|2}
    Controls the compiler's inline expansion.
    0: disables inlining.
    1: disables inlining unless /Qip or /Ob2 is specified.
    2: enables inlining of any function. However, the compiler decides
       which functions are inlined. This option enables interprocedural
       optimizations and has the same effect as specifying the /Qip
       option.

/Oi[-]
    Enable/disable inline expansion of intrinsic functions.

/Ow
    Assume no aliasing in the program, but assume aliasing across
    function calls. This switch tells the compiler that no aliasing
    occurs within function bodies but might occur across function
    calls. After each function call, pointer variables must be reloaded
    from memory.

/Qax<codes>
    Generates code specialized for the processor extensions specified
    by <codes> while also generating generic IA-32 code.
    <codes> includes one or more of the following characters:
    i  Pentium Pro and Pentium II processor instructions
    M  MMX(TM) instructions
    K  Streaming SIMD Extensions (implies i and M above)
    W  Pentium 4 processor with Streaming SIMD Extensions 2
       (implies i, M and K above)

/Qx<codes>
    Generates specialized code to run exclusively on processors
    supporting the extensions indicated by <codes>, as described above.
    -QxK and -QaxK ensure consistent floating point arithmetic.

/Qfp_port
    Round fp results at assignments and casts (some speed impact).

/Qip
    Enable single-file IP optimizations (within files, same as /Ob2).

/Qipo
    Multi-file IP optimizations that include:
    - inline function expansion
    - interprocedural constant propagation
    - dead code elimination
    - propagation of function characteristics
    - passing arguments in registers
    - loop-invariant code motion

/Qwp_ipo
    Enable multi-file IP optimizations (between files) and make the
    "whole program" assumption that all variables and functions seen in
    the compiled sources are referenced only within those sources; the
    user must guarantee that this assumption is safe.

/Qprefetch
    Accepted with a warning and ignored by the Intel C/C++ Compiler.

/Qprof_gen
    Instrument the program for profiling, for the first phase of
    two-phase profile guided optimization.

/Qprof_use
    Instructs the compiler to produce a profile-optimized executable
    and merges available dynamic information (.dyn) files into a
    pgopti.dpi file. If you perform multiple executions of the
    instrumented program, -Qprof_use merges the dynamic information
    files again and overwrites the previous pgopti.dpi file. Without
    any other options, the current directory is searched for .dyn
    files.

/Qrcd
    The -Qrcd option disables the change to truncation of the rounding
    mode for all floating point calculations, including floating
    point-to-integer conversions. Turning on this option can improve
    performance.

/GX
    Enables the full C++ Exception Handling unwind semantics.

/GR
    Enables C++ Runtime Type Information (RTTI).
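
The /Oa and /Ow entries above both ask the compiler to assume that pointers do not alias. As a rough illustration of what that assumption means for source code, the sketch below shows a function whose result depends on whether its two pointer arguments refer to the same object. The function names (add_twice, distinct_case, aliased_case) are hypothetical and chosen only for this example; nothing here is taken from the benchmark sources.

```c
/* Hypothetical helper: adds *src to *dst twice and returns the result.
   Under standard C rules, the second read of *src must observe the first
   store to *dst whenever both pointers refer to the same object. */
int add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;
    return *dst;
}

/* Two distinct objects: no aliasing, so the result is 1 + 10 + 10. */
int distinct_case(void)
{
    int a = 1, b = 10;
    return add_twice(&a, &b);
}

/* The same object passed twice: the pointers alias, so the value doubles
   on each step (1 -> 2 -> 4). */
int aliased_case(void)
{
    int c = 1;
    return add_twice(&c, &c);
}
```

With standard C semantics, distinct_case() returns 21 and aliased_case() returns 4. A compiler that has been told to assume no aliasing would be free to read *src only once, and the aliased call could then produce a different value, which is why the no-aliasing assumption is only safe for code that never makes calls like the second one.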

shlW32M.lib: MicroQuill SmartHeap Library 5.0, available from
http://www.microquill.com/

Description of compiler flags for Intel FORTRAN Compiler 5.0.1
--------------------------------------------------------------

/O1
    Optimizes for speed, but disables some optimizations that increase
    code size for a small speed benefit. Includes inline expansion
    (except for intrinsic functions), global optimizations, and string
    pooling optimizations.

/O2
    Optimizes for speed (DEFAULT). The -O2 option includes the O1
    optimizations and, in addition, enables inlining of intrinsics and
    more speed optimizations.

/O3
    Builds on the -O1 and -O2 optimizations by enabling high-level
    optimization. This level does not guarantee higher performance
    unless loop and memory access transformations take place. In
    conjunction with -QaxK/-QxK and -QaxW/-QxW, this switch causes the
    compiler to perform more aggressive data dependency analysis than
    for -O2. This may result in longer compilation times.

/Oa[-]
    Assume [do not assume] no aliasing in the program.

/Oi[-]
    Enable/disable inline expansion of intrinsic functions.

/Qauto
    Causes all variables to be allocated on the stack, rather than in
    local static storage.

/Qax<codes>
    Generates code specialized for the processor extensions specified
    by <codes> while also generating generic IA-32 code.
    <codes> includes one or more of the following characters:
    i  Pentium Pro and Pentium II processor instructions
    M  MMX(TM) instructions
    K  Streaming SIMD Extensions (implies i and M above)
    W  Pentium 4 processor with Streaming SIMD Extensions 2
       (implies i, M and K above)

/Qx<codes>
    Generates specialized code to run exclusively on processors
    supporting the extensions indicated by <codes>, as described above.

/Qfp_port
    Round fp results at assignments and casts (some speed impact).

/Qip
    Enable single-file IP optimizations (within files, same as /Ob2).

/Qipo
    Multi-file IP optimizations that include:
    - inline function expansion
    - interprocedural constant propagation
    - dead code elimination
    - propagation of function characteristics
    - passing arguments in registers
    - loop-invariant code motion

/Qwp_ipo
    Enable multi-file IP optimizations (between files) and make the
    "whole program" assumption that all variables and functions seen in
    the compiled sources are referenced only within those sources; the
    user must guarantee that this assumption is safe.

/Qprefetch[-]
    Enable (DEFAULT)/disable prefetch insertion (requires /O3).

/Qprof_gen
    Instrument the program for profiling, for the first phase of
    two-phase profile guided optimization.

/Qprof_use
    Instructs the compiler to produce a profile-optimized executable
    and merges available dynamic information (.dyn) files into a
    pgopti.dpi file. If you perform multiple executions of the
    instrumented program, -Qprof_use merges the dynamic information
    files again and overwrites the previous pgopti.dpi file. Without
    any other options, the current directory is searched for .dyn
    files.

/Qrcd
    The -Qrcd option disables the change to truncation of the rounding
    mode for all floating point calculations, including floating
    point-to-integer conversions. Turning on this option can improve
    performance.

/Qscalar_rep[-]
    Enables (DEFAULT) [disables] scalar replacement performed during
    loop transformations.

/Qunroll[n]
    Specifies the maximum number of times to unroll a loop. Omit n to
    let the compiler decide whether to perform unrolling or not. Use
    n = 0 to disable the unroller.

Other Notes:
------------

"/" and "-" are both allowable starting tokens for flags passed to the
compiler, i.e. -QxK and /QxK are identical switches.
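
For readers unfamiliar with the transformation that /Qunroll[n] controls, the following is a minimal hand-written sketch of a loop unrolled four times. It is written in C purely for illustration, the function names (sum_rolled, sum_unrolled4) are hypothetical, and it is not meant to represent the code either compiler actually generates.

```c
#include <stddef.h>

/* Straightforward loop: one addition and one loop test per element. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The same loop unrolled by hand four times. The main loop handles four
   elements per iteration; the second loop handles the 0 to 3 elements
   left over when n is not a multiple of four. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        total += v[i];
    return total;
}
```

Both functions return the same sum for any input; the unrolled form trades larger code for fewer loop tests and branches, which is the trade-off the unroll limit is meant to bound.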

Portability options for CPU2000:
-------------------------------

176.gcc:
    -Dalloca=_alloca
        Uses the built-in, optimized alloca.
    /Fn
        176.gcc uses alloca, and this option tells the linker to
        pre-allocate n bytes of stack. The default amount of stack
        allocated is not enough, and 176.gcc crashes with a run-time
        error.

178.galgel:
    -FI
        Fixed-format F90 source code.
    -link -stack:32000000
        Same as with 176.gcc; pre-allocates a 32MB stack.
    -Fe$@
        Tells the compiler to name the executable from pass1 $@, which
        evaluates to galgel.exe. Because the "-link -stack:.."
        parameter is passed through LDOPT, the -Fe$@ defined by the
        SPEC tools would otherwise be lost, so it is added again.

186.crafty:
    -DNT_i386
        Specifies that it is a Windows NT Intel processor-based system,
        which makes the compiler use "long long" as the 64-bit variable
        that 186.crafty needs.

253.perlbmk:
    -DSPEC_CPU2000_NTOS
        Causes the code changes for porting to Windows to be included.
    -DPERLDLL
        On Windows, we need a perl.exe instead of a perl.exe and
        perl.dll. This pre-define enables the changes necessary to get
        a single, UNIX-style executable without the indirect calls that
        can cause a 10% performance degradation. This allows the
        Windows-based executable to be as close as possible to the
        Unix-based one.
    /MT
        Use the static multi-threaded library; otherwise it will not
        compile.

254.gap:
    -DSYS_HAS_CALLOC_PROTO
    -DSYS_HAS_MALLOC_PROTO
        These two pre-defines indicate that malloc and calloc
        prototypes exist.

Flag disclosure for the Compaq Visual Fortran 6.6
*********************************************************************

/[no]optimize
    Syntax: /optimize[:level], /nooptimize, /Od, /Ox, or /Oxp

    The /optimize option controls the level of optimization performed
    by the compiler. To provide efficient run-time performance, Visual
    Fortran increases compile time in favor of decreasing run time.
    If an operation can be performed, eliminated, or simplified at
    compile time, the compiler does so rather than have it done at run
    time. Also, the size of the object file usually increases when
    certain optimizations occur (such as with more loop unrolling and
    more inlined procedures).

    In the visual development environment, specify the Optimization
    Level in the General or Optimizations Compiler Option Category.

    The /optimize options are:
        /optimize:0 or /Od
        /optimize:1
        /optimize:2
        /optimize:3
        /optimize:4, /Ox, and /Oxp
        /optimize:5

    /optimize:0 or /Od
        Disables nearly all optimizations. This is the default if you
        specify /debug (with no keyword). Specifying this option causes
        certain /warn options to be ignored. Specifying /Od sets the
        /optimize:0 and /math_library:check options.

    /optimize:1
        Enables local optimizations within the source program unit,
        recognition of common subexpressions, and expansion of integer
        multiplication and division (using shifts).

    /optimize:2
        Enables global optimization. This includes data-flow analysis,
        code motion, strength reduction and test replacement,
        split-lifetime analysis, and instruction scheduling. Specifying
        /optimize:2 includes the optimizations performed by
        /optimize:1.

    /optimize:3
        Enables additional global optimizations that improve speed (at
        the cost of extra code size). These optimizations include:
        - Loop unrolling, including instruction scheduling
        - Code replication to eliminate branches
        - Padding the size of certain power-of-two arrays to allow more
          efficient cache use
        Specifying /optimize:3 includes the optimizations performed by
        /optimize:1 and /optimize:2.

    /optimize:4, /Ox, and /Oxp
        Enables interprocedure analysis and automatic inlining of small
        procedures (with heuristics limiting the amount of extra code).
        Specifying /optimize:4 includes the optimizations performed by
        /optimize:1, /optimize:2, and /optimize:3. For the DF command,
        /optimize:4 is the default unless you specify /debug (with no
        keyword).
        Specifying /Ox sets: /optimize:4, /math_library:fast, and
        /assume:nodummy_aliases.
        Specifying /Oxp sets: /optimize:4, /math_library:check,
        /assume:nodummy_aliases, and /fltconsistency.

    /optimize:5
        On ia32 systems, activates the loop transformation
        optimizations (also set by /transform_loops). The loop
        transformation optimizations are a group of optimizations that
        apply to array references within loops. These optimizations can
        improve the performance of the memory system and can apply to
        multiple nested loops. Loop transformation optimizations
        include loop blocking, loop distribution, loop fusion, loop
        interchange, loop scalar replacement, and outer loop unrolling.
        To determine whether using /optimize:5 benefits your particular
        program, you should compare program execution timings for the
        same program (or subprogram) compiled at levels /optimize:4 and
        /optimize:5. Specifying /optimize:5 includes the optimizations
        performed by /optimize:1, /optimize:2, /optimize:3, and
        /optimize:4.

/fast
    Syntax: /fast

    The /fast option sets several options that generate optimized code
    for fast run-time performance. Specifying this option is equivalent
    to specifying:
        /alignment:(dcommons, records, sequence)
        /architecture:host
        /assume:noaccuracy_sensitive
        /math_library:fast
        /tune:host

/[no]alignment
    Syntax: /alignment[:keyword...], /noalignment, or /Zpn

    The /alignment option specifies the alignment of data items in
    common blocks, record structures, and derived-type structures. The
    /Zpn option specifies the alignment of data items in derived-type
    or record structures. The /alignment options are:

    /align:[no]commons
        The /align:commons option aligns the data items of all COMMON
        data blocks on natural boundaries up to four bytes. The default
        is /align:nocommons (unless /fast is specified), which does not
        align data blocks on natural boundaries.

    /align:dcommons
        The /align:dcommons option aligns the data items of all COMMON
        data blocks on natural boundaries up to eight bytes.
        The default is /align:nocommons (unless /fast is specified),
        which does not align data blocks on natural boundaries.
        Specifying /fast sets /align:dcommons.

    /align:[no]records
        The /align:records option (the default) requests that
        components of derived types and fields of records be aligned on
        natural boundaries up to 8 bytes (for derived types with the
        SEQUENCE statement, see /align:[no]sequence below). The
        /align:norecords option requests that components and fields be
        aligned on arbitrary byte boundaries, instead of on natural
        boundaries up to 8 bytes.

    /align:[no]sequence
        The /align:sequence option requests that components of derived
        types with the SEQUENCE statement obey whatever alignment rules
        are currently in use (default alignment rules will align
        unsequenced components on natural boundaries). The default
        value (unless /fast is specified) is /align:nosequence, which
        means that components of derived types with the SEQUENCE
        property will be packed, regardless of whatever alignment rules
        are currently in use. Specifying /fast sets /align:sequence.

    /align:recNbyte or /Zpn
        The /align:recNbyte or /Zpn options request that fields of
        records and components of derived types be aligned on the
        smaller of:
        - The size byte boundary (N) specified.
        - The boundary that will naturally align them.
        Specifying /align:recNbyte, /Zpn, or /align:[no]records does
        not affect whether common block fields are naturally aligned or
        packed.

    Specifying         Is the Same as Specifying
    ----------         -------------------------
    /Zp                /alignment:records or /align:rec8byte
    /Zp1               /alignment:norecords or /align:rec1byte
    /Zp2               /align:rec2byte
    /Zp4               /align:rec4byte
    /alignment         /Zp8 with /align:dcommons, /alignment:all, or
                       /alignment:(dcommons, records)
    /noalignment       /Zp1, /alignment:none, or
                       /alignment:(nocommons,nodcommons, norecords)
    /align:rec1byte    /align:norecords
    /align:rec8byte    /align:records

    When you omit the /alignment option, records and components of
    derived types are naturally aligned, but fields in common blocks
    are packed.
    This default is equivalent to:
        /alignment=(nocommons,nodcommons,records,nosequence)

/architecture
    Syntax: /architecture:keyword

    The /architecture (/arch) option controls the types of processor
    specific instructions generated for this program unit. The
    /arch:keyword option uses the same keywords as the /tune:keyword
    option.

    All processors of a certain architecture type (ia32) implement a
    core set of instructions. Certain (more recent) processor versions
    include additional instruction extensions.

    Whereas the /tune:keyword option is primarily used by certain
    higher level optimizations for instruction scheduling purposes, the
    /arch:keyword option determines the type of machine-code
    instructions generated for the program unit being compiled.

    For ia32 (Intel and AMD) systems, the supported /arch keywords are:

    /arch:generic
        Generates code (sometimes called blended code) that is
        appropriate for processor generations for the architecture type
        in use. This is the default. Programs compiled on an ia32
        system with the generic keyword will run on all ia32 systems.

    /arch:host
        Generates code for the processor generation in use on the
        system being used for compilation. Depending on the host system
        used, the program may or may not run on other systems.
        Using /arch:host on a:
        - Intel Pentium processor system selects the pn1 keyword
        - Intel Pentium Pro, Intel Pentium II, or AMD K6 processor
          system selects the pn2 keyword
        - Intel Pentium III processor system selects the pn3 keyword
        - AMD K6_2 or AMD K6_III processor system selects the k6_2
          keyword
        - AMD Athlon processor system selects the k7 keyword
        - Intel Pentium 4 processor system selects the pn4 keyword

    /arch:pn1
        Generates code for the Pentium processor systems. Programs
        compiled with the pn1 keyword will run correctly on Pentium,
        Pentium Pro, Pentium II, Pentium III, AMD K6, and higher
        processors, but should not be run on 486 processors. The pn1
        keyword replaces the p5 keyword (specifying /arch:pn1 and
        /arch:p5 are equivalent).

    /arch:pn2
        Generates code for the Pentium Pro, Pentium II, and AMD K6
        processor systems only. Programs compiled with the pn2 or k6
        keyword will run correctly on Pentium Pro, Pentium II, AMD K6,
        Pentium III, and higher processors, but should not be run on
        486 or Pentium processors. The pn2 keyword replaces the p6
        keyword (specifying /arch:pn2 and /arch:p6 are equivalent).

    /arch:k6
        Generates code for the AMD K6 (same as Pentium II systems)
        processor systems only. Programs compiled with the k6 or pn2
        keyword will run correctly on Pentium Pro, Pentium II, AMD K6,
        Pentium III, and higher processors, but should not be run on
        486 or Pentium processors.

    /arch:pn3
        Generates code for the Pentium III, AMD K6_2, and AMD K6_III
        processor systems only. Programs compiled with the pn3 keyword
        will run correctly on Pentium III, AMD K6_2, AMD K6_III,
        Pentium 4, and higher processors, but should not be run on 486,
        Pentium, Pentium Pro, Pentium II, or AMD K6 processors. The pn3
        keyword replaces the p6p keyword (specifying /arch:pn3 and
        /arch:p6p are equivalent).

    /arch:k6_2
        Generates code for the AMD K6_2 and AMD K6_III processor
        systems. Programs compiled with the k6_2 keyword will run
        correctly on AMD K6_2, AMD K6_III, and AMD Athlon(TM)
        processors, but should not be run on 486, Pentium, Pentium Pro,
        Pentium II (same as AMD K6), Pentium III, or Pentium 4
        processors.

    /arch:k7
        Generates code for the AMD Athlon processor systems only.
        Programs compiled with the k7 keyword will run correctly on AMD
        Athlon processors, but should not be run on 486, Pentium,
        Pentium Pro, Pentium II (same as AMD K6), Pentium III,
        Pentium 4, AMD K6_2, or AMD K6_III processors.

    /arch:pn4
        Generates code for the Pentium 4 processor systems only.
        Programs compiled with the pn4 keyword will run correctly on
        Pentium 4 processors, but should not be run on 486, Pentium,
        Pentium Pro, Pentium II (same as AMD K6), Pentium III,
        AMD K6_2, AMD K6_III, or AMD Athlon processors.

/assume:[no]accuracy_sensitive
    Specifying /assume:noaccuracy_sensitive allows the compiler to
    reorder code based on algebraic identities (inverses,
    associativity, and distribution) to improve performance. In the
    visual development environment, specify Allow Reordering of
    Floating-Point Operations in the Optimizations Compiler Option
    Category.

    The numeric results can be slightly different from the default
    (/assume:accuracy_sensitive) because of the way intermediate
    results are rounded. Numeric results with
    /assume:noaccuracy_sensitive are not categorically less accurate.
    They can produce more accurate results for certain floating-point
    calculations, such as dot product summations.

    For example, the following expressions are mathematically
    equivalent but may not compute the same value using finite
    precision arithmetic:
        X = (A + B) - C
        X = A + (B - C)

    If you omit /assume:noaccuracy_sensitive and omit /fast, the
    compiler uses a limited number of rules for calculations, which
    might prevent some optimizations.

    If you specify /assume:noaccuracy_sensitive, or if you specify
    /fast and omit /assume:accuracy_sensitive, the compiler can reorder
    code based on algebraic identities to improve performance.

    For more information on /assume:noaccuracy_sensitive, see
    Arithmetic Reordering Optimizations.

/math_library:fast
    The /math_library option specifies whether argument checking of
    math routines is done on ia32 systems and the type of math library
    routines used on Alpha systems.

    On ia32 systems, /math_library:fast improves performance by not
    checking the arguments to the math routines. Using
    /math_library:fast makes it difficult to trace the cause of
    unexpected exceptional values. On ia32 systems, /math_library:fast
    does not change the accuracy of calculated floating-point numbers.

/tune
    Syntax: /tune:keyword

    The /tune option specifies the type of processor-specific machine
    code instruction tuning for implementations of the processor
    architecture in use.
    Tuning for a specific implementation can improve run-time
    performance; it is also possible that code tuned for a specific
    processor may run slower on another processor.

    Regardless of the /tune:keyword option you use, the generated code
    runs correctly on all implementations of the processor
    architecture. If you omit /tune:keyword, /tune:generic is used.

    For ia32 (Intel and AMD) systems, the /tune keywords are:

    /tune:generic
        Generates and schedules code (sometimes called blended code)
        that will execute well for all ia32 systems. This provides
        generally efficient code for those applications where all ia32
        processor generations are likely to be used. This is the
        default.

    /tune:host
        Generates and schedules code optimized for the processor type
        in use on the processor system being used for compilation.

    /tune:pn1
        Generates and schedules code optimized for the Pentium (586)
        processor systems. The pn1 keyword replaces the p5 keyword
        (specifying /tune:pn1 and /tune:p5 are equivalent).

    /tune:pn2
        Generates and schedules code optimized for Pentium Pro,
        Pentium II, and AMD K6 processor systems. The pn2 keyword
        replaces the p6 keyword (specifying /tune:pn2 and /tune:p6 are
        equivalent).

    /tune:k6
        Generates and schedules code optimized for AMD K6, Pentium Pro,
        and Pentium II processor systems (/tune:pn2 and /tune:k6 are
        the same).

    /tune:pn3
        Generates and schedules code optimized for Pentium III,
        AMD K6_2, and AMD K6_III processor systems. The pn3 keyword
        replaces the p6p keyword (specifying /tune:pn3 and /tune:p6p
        are equivalent).

    /tune:k6_2
        Generates and schedules code optimized for AMD K6_2 and
        AMD K6_III processor systems.

    /tune:k7
        Generates and schedules code optimized for AMD Athlon processor
        systems.

    /tune:pn4
        Generates and schedules code optimized for Pentium 4 processor
        systems.
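
The /assume:[no]accuracy_sensitive description earlier notes that (A + B) - C and A + (B - C) are mathematically equivalent but may differ in finite precision. The sketch below is one way to observe this, written in C rather than Fortran purely for illustration. The function names (grouped_left, grouped_right) and the input values are assumptions chosen for the example and do not come from the benchmarks.

```c
/* Evaluates (a + b) - c in the order written. The volatile intermediate
   forces the partial result to be stored as a double before it is used. */
double grouped_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t - c;
}

/* Evaluates a + (b - c), the algebraically equivalent regrouping. */
double grouped_right(double a, double b, double c)
{
    volatile double t = b - c;
    return a + t;
}
```

With a = 1e30, b = -1e30, and c = 1.0, grouped_left returns -1.0 because a + b cancels exactly before c is subtracted, while grouped_right returns 0.0 because c is too small to change b when the two are combined first. A compiler permitted to reorder by algebraic identity may pick either grouping, which is why results can shift slightly when /assume:noaccuracy_sensitive or /fast is used.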