Description of compiler flags for Intel C++ Compiler 8.1
--------------------------------------------------------

-O1
    Optimize for speed, but disable some optimizations which increase code
    size for a small speed benefit. Includes inline expansion except for
    intrinsic functions, global optimizations, string pooling optimizations.

-O2
    Optimizes for speed. The -O2 option includes the following options:
    -Og, -Ot, -Oy, -Ob1, and -Gs. This option defaults to ON.
    This option also enables:
      * inlining of intrinsics
      * intra-file interprocedural optimizations including:
          * inlining
          * constant propagation
          * forward substitution
          * routine attribute propagation
          * variable address-taken analysis
          * dead static function elimination
          * removal of unreferenced variables
      * the following performance optimizations:
          * copy propagation
          * dead-code elimination
          * global register allocation
          * global instruction scheduling and control speculation
          * loop unrolling
          * optimized code selection
          * partial redundancy elimination
          * strength reduction/induction variable simplification
          * variable renaming
          * exception handling optimizations
          * tail recursions
          * peephole optimizations
          * structure assignment lowering and optimizations
          * dead store elimination

-O3
    Optimizes for speed. Enables high-level optimization. This level does
    not guarantee higher performance. Using this option may increase the
    compilation time. Impact on performance is application dependent; some
    applications may not see a performance improvement.
    The optimizations include:
      * all optimizations done with -O2
      * loop unrolling, including instruction scheduling
      * code replication to eliminate branches
      * padding the size of certain power-of-two arrays to allow more
        efficient cache use
      * when used with -Qax or -Qx, more aggressive data dependency
        analysis than for -O2
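As a rough illustration of the loop unrolling listed under -O2 and -O3 above, the sketch below compares a plain summation loop with a version unrolled by hand. The function names and the unroll factor of 4 are assumptions chosen for the example; the code the compiler actually generates may look quite different.

```c
#include <stddef.h>

/* Straightforward loop: one addition and one loop test per element. */
long sum_simple(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hand-unrolled by 4 (hypothetical factor): one loop test per four
   elements, with four independent partial sums. */
long sum_unrolled4(const int *a, size_t n)
{
    long t0 = 0, t1 = 0, t2 = 0, t3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        t0 += a[i];
        t1 += a[i + 1];
        t2 += a[i + 2];
        t3 += a[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        t0 += a[i];
    return t0 + t1 + t2 + t3;
}
```

Both functions return the same sum for any input; the unrolled form only trades code size for fewer branches, which is the trade-off that -O1 avoids and -O2/-O3 accept.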
-Oa[-]
    Assume [do not assume] no aliasing in program.

-Qax<codes>
    Generate code specialized for processor extensions specified by <codes>
    while also generating generic IA-32 code. <codes> includes one or more
    of the following characters:
      i   Pentium Pro and Pentium II processor instructions
      M   MMX(TM) instructions
      K   Streaming SIMD Extensions (implies i and M above)
      W   Pentium 4 processor with Streaming SIMD Extensions 2
          (implies i, M and K)
      N   Pentium 4 processor with Streaming SIMD Extensions 2
      P   Pentium 4 processor with Streaming SIMD Extensions 3

-Qx<codes>
    Generate specialized code to run exclusively on processors supporting
    the extensions indicated by <codes> as described above.

----------------------------------------------------------------------------------
Additional Notes on /QxN and /QxP:
----------------------------------------------------------------------------------
-Qx{N|P}
    The /QxN and /QxP options target your program to run on Intel Pentium 4
    and compatible Intel processors. The resulting code might contain
    unconditional use of features that are not supported on other
    processors. Programs in which the function main() is compiled with this
    option will detect non-compatible processors and generate an error
    message during execution. These options also enable new optimizations
    in addition to Intel processor specific optimizations, as well as
    advanced data layout and code restructuring optimizations to improve
    memory accesses for Intel processors.
----------------------------------------------------------------------------------

-Ob{0|1|2}
    Controls the compiler's inline expansion.
      0: disables inlining.
      1: disables inlining unless -Qip or -Ob2 is specified.
      2: enables inlining of any function. However, the compiler decides
         which functions are inlined. This option enables interprocedural
         optimizations and has the same effect as specifying the -Qip
         option.
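To show why the no-aliasing assumption made by -Oa (listed above) is only safe for programs whose pointers never overlap, here is a small C sketch. The function names are hypothetical, and the "hoisted" variant is a hand-written stand-in for one transformation a compiler could apply under that assumption, not a statement of what this compiler does.

```c
/* Adds *delta to each of the n elements of a.
   *delta is re-read on every iteration, so the result is well defined
   even when delta points into a[]. */
void add_delta(int *a, int n, const int *delta)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] += *delta;
}

/* Hand-written stand-in for the same loop after assuming that a[] and
   *delta never alias: the load of *delta is done once, before the loop. */
void add_delta_hoisted(int *a, int n, const int *delta)
{
    int d = *delta;
    int i;
    for (i = 0; i < n; i++)
        a[i] += d;
}
```

When delta points to a separate variable, the two functions agree. When delta points at a[0], the first function sees the updated value after the first iteration and the second does not, so the results differ. A program that depends on the first behavior would be miscompiled under a no-aliasing assumption.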

-Qip
    Enable single-file IP optimizations (within files, same as -Ob2).

-Qipo
    Multi-file IP optimizations that include:
      - inline function expansion
      - interprocedural constant propagation
      - dead code elimination
      - propagation of function characteristics
      - passing arguments in registers
      - loop-invariant code motion

-fast
    The /fast option enhances execution speed across the entire program by
    including the following options that can improve run-time performance:
      /O3    (maximum speed and high-level optimizations)
      /Qipo  (enables interprocedural optimizations across files)
      /QxP   (generate code specialized for Intel Pentium 4 processor with
             Streaming SIMD Extensions 3)
    To override one of the options set by /fast, specify that option after
    the /fast option on the command line. The options set by /fast may
    change from release to release.

-Qansi_alias
    Directs the compiler to assume that the program adheres to the
    type-based aliasing rules defined in Section 6.5 of the ISO C Standard.
    If your program adheres to these rules, this option will allow the
    compiler to optimize more aggressively. If it doesn't adhere to these
    rules, it can cause the compiler to generate incorrect code.

-Qprof_gen
    Instrument program for profiling for the first phase of two-phase
    profile guided optimization.

-Qprof_use
    Instructs the compiler to produce a profile-optimized executable and
    merges available dynamic information (.dyn) files into a pgopti.dpi
    file. If you perform multiple executions of the instrumented program,
    -Qprof_use merges the dynamic information files again and overwrites
    the previous pgopti.dpi file. Without any other options, the current
    directory is searched for .dyn files.

-Qrcd
    The Intel compiler uses the -Qrcd option to improve the performance of
    code that requires floating-point-to-integer conversions. The system
    default floating point rounding mode is round-to-nearest. This means
    that values are rounded during floating point calculations.
    However, the C language requires floating point values to be truncated
    when a conversion to an integer is involved. To do this, the compiler
    must change the rounding mode to truncation before each floating
    point-to-integer conversion and change it back afterwards. The -Qrcd
    option disables the change to truncation of the rounding mode for all
    floating point calculations, including floating point-to-integer
    conversions. Turning on this option can improve performance, but
    floating point conversions to integer will not conform to C semantics.

-Qunroll[n]
    Specifies the maximum number of times to unroll a loop. Use n = 0 to
    disable the unroller. Omit n to let the compiler decide whether to
    unroll and choose the maximum unroll count automatically.

-GX
    Enables the full C++ Exception Handling unwind semantics.

-GR
    Enables C++ Runtime Type Information (RTTI).

-Qcxx_features
    Enables both -GX and -GR as described above, so C++ Runtime Type
    Information and Exception Handling are both enabled.

-Zp{1|2|4|8|16}
    Specifies the strictest alignment constraint for structure and union
    types as one of the following: 1, 2, 4, 8, or 16 (default) bytes.

-Qprefetch[-]
    Enables [disables] the insertion of software prefetching by the
    compiler. Default is /Qprefetch.

shlW32M.lib:
    MicroQuill SmartHeap Library available from http://www.microquill.com/


Description of compiler flags for Intel FORTRAN Compiler 8.1
-------------------------------------------------------------

-O1
    Optimize for speed, but disable some optimizations which increase code
    size for a small speed benefit. Includes inline expansion except for
    intrinsic functions, global optimizations, string pooling optimizations.

-O2
    This is the default level of optimization. Optimizes for speed. The -O2
    option includes -O1 optimizations and in addition enables inlining of
    intrinsics and more speed optimizations.

-O3
    Builds on -O1 and -O2 optimizations by enabling high-level
    optimization. This level does not guarantee higher performance unless
    loop and memory access transformations take place. In conjunction with
    -QaxK/-QxK and -QaxW/-QxW, this switch causes the compiler to perform
    more aggressive data dependency analysis than for -O2. This may result
    in longer compilation times.

-Qax<codes>
    Generate code specialized for processor extensions specified by <codes>
    while also generating generic IA-32 code. <codes> includes one or more
    of the following characters:
      i   Pentium Pro and Pentium II processor instructions
      M   MMX(TM) instructions
      K   Streaming SIMD Extensions (implies i and M above)
      W   Pentium 4 processor with Streaming SIMD Extensions 2
          (implies i, M and K)
      N   Pentium 4 processor with Streaming SIMD Extensions 2
      P   Pentium 4 processor with Streaming SIMD Extensions 3

-Qx<codes>
    Generate specialized code to run exclusively on processors supporting
    the extensions indicated by <codes> as described above.

----------------------------------------------------------------------------------
Additional Notes on /QxN and /QxP:
----------------------------------------------------------------------------------
-Qx{N|P}
    The /QxN and /QxP options target your program to run on Intel Pentium 4
    and compatible Intel processors. The resulting code might contain
    unconditional use of features that are not supported on other
    processors. Programs in which the function main() is compiled with this
    option will detect non-compatible processors and generate an error
    message during execution. These options also enable new optimizations
    in addition to Intel processor specific optimizations, as well as
    advanced data layout and code restructuring optimizations to improve
    memory accesses for Intel processors.
----------------------------------------------------------------------------------

-Qip
    Enable single-file IP optimizations (within files, same as -Ob2).

-Qipo
    Multi-file IP optimizations that include:
      - inline function expansion
      - interprocedural constant propagation
      - dead code elimination
      - propagation of function characteristics
      - passing arguments in registers
      - loop-invariant code motion

-fast
    The /fast option enhances execution speed across the entire program by
    including the following options that can improve run-time performance:
      -O3    (maximum speed and high-level optimizations)
      -Qipo  (enables interprocedural optimizations across files)
      -QxP   (generate code specialized for Intel Pentium 4 processor with
             Streaming SIMD Extensions 3)
    To override one of the options set by /fast, specify that option after
    the /fast option on the command line. The options set by /fast may
    change from release to release.

-Qansi_alias
    Enables (default) or disables the compiler's assumption that the
    program adheres to the ANSI Fortran type aliasability rules. For
    example, an object of type real cannot be accessed as an integer. See
    the ANSI standard for the complete set of rules.

-Qprof_gen
    Instrument program for profiling for the first phase of two-phase
    profile guided optimization.

-Qprof_use
    Instructs the compiler to produce a profile-optimized executable and
    merges available dynamic information (.dyn) files into a pgopti.dpi
    file. If you perform multiple executions of the instrumented program,
    -Qprof_use merges the dynamic information files again and overwrites
    the previous pgopti.dpi file. Without any other options, the current
    directory is searched for .dyn files.

-Qrcd
    Enables fast float-to-int conversion.

-Qscalar_rep[-]
    Enables [disables] scalar replacement performed during loop
    transformations (requires /O3).

-Qauto
    Causes all variables to be allocated on the stack, rather than in local
    static storage.
    Does not affect variables that appear in an EQUIVALENCE or SAVE
    statement, or those that are in COMMON. Makes all local variables
    AUTOMATIC, same as /4Ya.

-Qprefetch[-]
    Enables [disables] the insertion of software prefetching by the
    compiler. Default is /Qprefetch.

Other Notes:
------------
"/" and "-" are both allowable starting tokens for flags passed to the
compiler, i.e. -QxK and /QxK are identical switches.

Portability options for CPU2000:
-------------------------------
176.gcc:
    -Dalloca=_alloca : so as to use the built-in optimized alloca.
    -Fn : 176.gcc uses alloca and this option tells the linker to
          pre-allocate n bytes of stack. The default amount of stack
          allocated is not enough and 176.gcc crashes with a run-time
          error.

178.galgel:
    -FI : Fixed-format F90 source code.
    -F32000000 : Same as with 176.gcc, pre-allocates a 32MB stack.

186.crafty:
    -DNT_i386 : Specifies that it is a Windows NT Intel processor-based
          system, which makes the compiler use "long long" as the 64-bit
          variable that 186.crafty needs.

253.perlbmk:
    -DSPEC_CPU2000_NTOS : Causes the code changes for porting to Windows to
          be included.
    -DPERLDLL : On Windows, we need a single perl.exe instead of a perl.exe
          and perl.dll. This pre-define enables the changes necessary to
          get a single, UNIX-style executable, avoiding the indirect calls
          that can cause a 10% performance degradation. This allows the
          Windows-based executable to be as close as possible to the
          Unix-based one.
    -MT : Use the static multi-threaded library; otherwise it will not
          compile.

254.gap:
    -DSYS_HAS_CALLOC_PROTO :
    -DSYS_HAS_MALLOC_PROTO : These two pre-defines indicate that malloc and
          calloc prototypes exist.


Description of compiler flags for Intel FORTRAN Compiler 8.1
-------------------------------------------------------------

-fast
    The -fast option enhances execution speed across the entire program by
    including the following options that can improve run-time performance:
      -O3    (maximum speed and high-level optimizations).
      -Qipo  (enables interprocedural optimizations across files).
      -QxP   (specific optimization for Intel Pentium 4 processor with
             Streaming SIMD Extensions 3).
    The -fast option does not include -QxP when compiling on
    Itanium(R)-based systems. To override one of the options set by -fast,
    specify that option after the -fast option on the command line. To
    target -fast optimizations for a specific processor, use one of the
    -Qx options. For example:
      prompt>icl -fast -QxW source_file.cpp
    The options set by -fast may change from release to release.

-Gs[n]
    Disables stack-checking for routines with n or more bytes of local
    variables and compiler temporaries. Default: n=4096.

-inline:speed
    Enable speed optimizations (same as -Ob2 -Ot).

-[no]f77rtl
    Specifies that the FORTRAN-77-specific run-time support should be used.

-[no]fpp
    Determines whether the Fortran preprocessor is run on source files
    prior to compilation.

-O1
    Optimizes to favor code size and code locality. Disables loop
    unrolling. /O1 may improve performance for applications with very large
    code size, many branches, and execution time not dominated by code
    within loops. In most cases /O2 is recommended over /O1.
    IA-32 systems: Enables options /Og, /Oi-, /Os, /Oy, /Ob1, and /Gs.
    Disables intrinsics inlining to reduce code size.

-O2
    This is the default level of optimization. Optimizes for code speed.
    This is the generally recommended optimization level.
    IA-32 systems: Enables options /Og, /Os, /Oy, /Ob1, and /Gs.

-O3
    Enables /O2 optimizations and more aggressive optimizations such as
    loop and memory access transformation. The /O3 optimizations may slow
    down code in some cases compared to /O2 optimizations. Recommended for
    applications that have loops with heavy use of floating-point
    calculations and process large data sets.
    IA-32 systems: In conjunction with /Qax{K|W|N|B|P} and /Qx{K|W|N|B|P}
    options, this option causes the compiler to perform more aggressive
    data dependency analysis than for /O2.
    This may result in longer compilation times.

-Oa[-]
    Assume [do not assume] no aliasing in program.

-Obn
    Controls the compiler's inline expansion. The amount of inline
    expansion performed varies with the value of n as follows:
      0: Disables inlining.
      1: Enables (default) inlining of functions declared with the __inline
         keyword. Also enables inlining according to the C++ language.
      2: Enables inlining of any function. However, the compiler decides
         which functions to inline. Enables interprocedural optimizations
         and has the same effect as /Qip.

-Og
    Enables global optimizations.

-Oi[-]
    Enables [disables] inline expansion of intrinsic functions.

-Os
    Enables most speed optimizations, but disables optimizations that
    increase code size for a small speed benefit.

-Ot
    Enables all speed optimizations.

-Oy[-]
    Enables [disables] the use of the EBP register in optimizations. When
    you disable with /Oy-, the EBP register is used as frame pointer.

-Qansi_alias
    Enables (default) or disables the compiler's assumption that the
    program adheres to the ANSI Fortran type aliasability rules. For
    example, an object of type real cannot be accessed as an integer. See
    the ANSI standard for the complete set of rules.

-Qauto
    Causes all variables to be allocated on the stack, rather than in local
    static storage. Does not affect variables that have the SAVE attribute
    or appear in an EQUIVALENCE statement or common block; same as
    -automatic, -auto or -4Ya. Opposite of -Qsave. If -recursive or
    -Qopenmp is specified, the default is -Qauto.

-Qax<codes>
    Generate code specialized for processor extensions specified by <codes>
    while also generating generic IA-32 code.
    <codes> includes one or more of the following characters:
      i   Pentium Pro and Pentium II processor instructions
      M   MMX(TM) instructions
      K   Streaming SIMD Extensions (implies i and M above)
      W   Pentium 4 processor with Streaming SIMD Extensions 2
          (implies i, M and K)
      N   Pentium 4 processor with Streaming SIMD Extensions 2
      P   Pentium 4 processor with Streaming SIMD Extensions 3

-Qx<codes>
    Generate specialized code to run exclusively on processors supporting
    the extensions indicated by <codes> as described above.

----------------------------------------------------------------------------------
Additional Notes on /QxN and /QxP:
----------------------------------------------------------------------------------
-Qx{N|P}
    The /QxN and /QxP options target your program to run on Intel Pentium 4
    and compatible Intel processors. The resulting code might contain
    unconditional use of features that are not supported on other
    processors. Programs in which the function main() is compiled with this
    option will detect non-compatible processors and generate an error
    message during execution. These options also enable new optimizations
    in addition to Intel processor specific optimizations, as well as
    advanced data layout and code restructuring optimizations to improve
    memory accesses for Intel processors.
----------------------------------------------------------------------------------

-Qip
    Enable single-file IP optimizations (within files, same as -Ob2).

-Qipo[n]
    Enables multifile interprocedural (IP) optimizations (between files).
    When you specify this option, the compiler performs inline function
    expansion for calls to functions defined in separate files.
    n is an optional integer that specifies the number of object files the
    compiler should create. Any integer greater than or equal to 0 is
    valid. If n is 0, the compiler decides whether to create one or more
    object files based on an estimate of the size of the object file.
    It generates one object file for small applications, and two or more
    object files for large applications. If n is greater than 0, the
    compiler generates n object files, unless n exceeds the number of
    source files (m), in which case the compiler generates only m object
    files. If you do not specify n, the default is 1.
    Multi-file IP optimizations that include:
      - inline function expansion
      - interprocedural constant propagation
      - monitoring module-level static variables
      - dead code elimination
      - propagation of function characteristics
      - multifile optimization
      - passing arguments in registers
      - loop-invariant code motion

-Qoption,tool,list
    Passes an argument list to another program in the compilation sequence,
    such as the assembler or linker. The parameter 'tool' can be:
      fpp   Specifies the Intel Fortran preprocessor
      f     Specifies the Fortran compiler
      asm   Specifies the assembler
      link  Specifies the linker
    -Qoption can be used with the -Qipo flag to refine IPO. The valid
    options that can be used for this purpose are:
      -ip_args_in_regs=0
          Disables the passing of arguments in registers.
      -ip_ninl_max_stats=n
          Sets the maximum number of intermediate language statements for a
          function that is expanded inline. The number n is a positive
          integer. The number of intermediate language statements usually
          exceeds the actual number of source language statements. The
          default value for n is 230. The compiler uses a larger limit for
          user inline functions.
      -ip_ninl_min_stats=n
          Sets the minimum number of intermediate language statements for a
          function that is expanded inline. The number n is a positive
          integer. The default value for ip_ninl_min_stats is:
            IA-32 compiler: ip_ninl_min_stats = 7
      -ip_ninl_max_total_stats=n
          Sets the maximum increase in size of a function, measured in
          intermediate language statements, due to inlining. n is a
          positive integer whose default value is 2000.

-Qparallel
    Automatically detects loops capable of being executed safely in
    parallel and generates multithreaded code for these loops.

-Qprec
    Improves floating-point precision. Some speed impact.

-Qprefetch[-]
    Enables [disables] the insertion of software prefetching by the
    compiler (requires -O3). Default is /Qprefetch.

-Qprof_gen
    Instrument program for profiling for the first phase of two-phase
    profile guided optimization.

-Qprof_use
    Instructs the compiler to produce a profile-optimized executable and
    merges available dynamic information (.dyn) files into a pgopti.dpi
    file. If you perform multiple executions of the instrumented program,
    -Qprof_use merges the dynamic information files again and overwrites
    the previous pgopti.dpi file. Without any other options, the current
    directory is searched for .dyn files.

-Qrcd
    Enables fast float-to-int conversion.

-Qscalar_rep[-]
    Enables [disables] scalar replacement performed during loop
    transformations (requires -O3).

Additional Libraries Used
-------------------------
Supplied by MicroQuill:
shlW32M.lib:
    MicroQuill SmartHeap Library 7.0 available from
    http://www.microquill.com/

Other Notes:
------------
"/" and "-" are both allowable starting tokens for flags passed to the
compiler, i.e. -QxK and /QxK are identical switches.

Portability options for CPU2000:
-------------------------------
176.gcc:
    -Dalloca=_alloca : so as to use the built-in optimized alloca.
    -F10000000 : 176.gcc uses alloca and this option tells the linker to
          pre-allocate n bytes of stack. The default amount of stack
          allocated is not enough and 176.gcc crashes with a run-time
          error.

178.galgel:
    -FI : Fixed-format F90 source code.
    -F32000000 : Same as with 176.gcc, pre-allocates a 32MB stack.

186.crafty:
    -DNT_i386 : Specifies that it is a Windows NT Intel processor-based
          system, which makes the compiler use "long long" as the 64-bit
          variable that 186.crafty needs.

253.perlbmk:
    -DSPEC_CPU2000_NTOS : Causes the code changes for porting to Windows to
          be included.
    -DPERLDLL : On Windows, we need a single perl.exe instead of a perl.exe
          and perl.dll. This pre-define enables the changes necessary to
          get a single, UNIX-style executable, avoiding the indirect calls
          that can cause a 10% performance degradation. This allows the
          Windows-based executable to be as close as possible to the
          Unix-based one.
    -MT : Use the static multi-threaded library; otherwise it will not
          compile.

254.gap:
    -DSYS_HAS_CALLOC_PROTO :
    -DSYS_HAS_MALLOC_PROTO : These two pre-defines indicate that malloc and
          calloc prototypes exist.
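
The -D portability options above work by defining preprocessor macros that the benchmark sources test at compile time. The sketch below shows the general pattern for a define such as -DNT_i386. It is not taken from the benchmark sources: the type name word64_t, the fallback branch, and the helper function are assumptions made for illustration, and the macro is defined in the source only as a stand-in for the command-line option.

```c
#include <limits.h>

/* Stand-in for passing -DNT_i386 on the command line; in a real build
   the macro would come from the compiler invocation, not the source. */
#ifndef NT_i386
#define NT_i386
#endif

#if defined(NT_i386)
typedef unsigned long long word64_t;   /* hypothetical type name */
#else
typedef unsigned long word64_t;        /* hypothetical fallback */
#endif

/* Number of bits in the type selected by the define. */
int word64_bits(void)
{
    return (int)(sizeof(word64_t) * CHAR_BIT);
}
```

A define of the form -Dname=value, such as -Dalloca=_alloca, works the same way but also gives the macro a replacement text, so every use of name in the source is rewritten to value before compilation.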