===============================================
HP-UX Flag Descriptions for OMP2001 - Mar 2002
===============================================

-----------------------------------------------
Flags for HP-UX C Compiler
-----------------------------------------------

+Oall
    Apply maximum optimization to achieve the best runtime performance.
    This option is equivalent to specifying +Oaggressive and +Onolimit
    on the same command line. The +Oall option automatically invokes
    the highest level of optimization (+O4 for C).

+Olevel
    Invoke optimizations selected by level. The level can be preceded
    by either +O or -O. Defined values for level are:

    0   Perform no optimizations.
    1   Perform optimizations within basic blocks only. This is the
        default.
    2   Perform level 1 and global optimizations. Same as -O and +O.
    3   Perform level 2 as well as interprocedural global
        optimizations.
    4   Perform level 3 as well as doing link time optimizations.

    NOTE: +Oprocelim is the general default at all levels, unless the
    user specifies +ild, +ildrelink, or -b.

    NOTE: +O4 is only supported with +P. Otherwise, attempts to
    activate +O4 will cause the compiler to automatically drop to +O3.

+Oaggressive
    Apply aggressive optimizations. These include new optimizations as
    well as optimizations invoked by the following option settings:

        +Oentrysched
        +Olibcalls
        +Onofltacc
        +Onoinitcheck
        +FPD

+Oentrysched
    Perform instruction scheduling on a subprogram's entry and exit
    code sequences. This option can be used at optimization level 1
    and higher. The default is +Onoentrysched.

+Olibcalls
    Use low-call-overhead versions of select library routines. This
    option can be used at any level. At optimization level 0 or 1, the
    default is +Onolibcalls; at optimization level 2 or higher, the
    default is +Olibcalls.

+Onofltacc
    Enable floating-point optimizations that can result in numerical
    differences.
    This includes contractions, such as fused multiply-add (FMA), and
    also allows floating-point optimizations that may affect the
    generation and propagation of infinities, NaNs, and the sign of
    zero. It permits optimizations, such as reordering of expressions,
    even if parenthesized, that may affect rounding error. FMA
    instructions can improve performance of floating-point applications
    and are available only on PA-RISC 2.0 systems or later.

+Onoinitcheck
    Disable initialization of any local, scalar, automatic variable
    that is found to be uninitialized. This option can be used at
    optimization level 2 and higher. The default is to enable
    initialization if the variable is uninitialized with respect to
    every path leading to its use.

+FPD
    Specify how the run time environment for floating-point operations
    should be initialized at program start up. The default is that all
    trapping behaviors are disabled. See ld(1) for specific values of
    flags. To dynamically change these settings at run time, refer to
    fesetenv(3M).

    D (d)   Enable sudden underflow (flush to zero) of denormalized
            values.

+Onolimit
    Do not suppress optimizations that significantly increase
    compile-time or consume enormous amounts of memory.

+DA2.0W (+DAmodel)
    Generate code for a specific version of the PA-RISC architecture.
    For a PA-RISC 2.0 64-bit executable specify +DA2.0W.

+Oopenmp
    Enable OpenMP directives.

+Oinfo
    Provide feedback information about the optimization process. This
    option is most useful at optimization levels 3 and 4. The default
    is +Onoinfo.

-Wl,-aarchive (ld option -a search)
    Specify the library search order, that is, whether shared or
    archive libraries are searched with the -l option. The archive
    setting causes only archive libraries to be searched, not shared
    libraries. The value of search should be one of archive, shared,
    archive_shared, shared_archive, or default. This option can appear
    more than once, interspersed among -l options, to control the
    searching for each library.
    The default is to use the shared version of a library if one is
    available, or the archive version if not. If either archive or
    shared is active, only the specified library type is accepted. If
    archive_shared is active, the archive form is preferred, but the
    shared form is allowed. If shared_archive is active, the shared
    form is preferred but the archive form is allowed.

[Profile Feedback Related Options]

+I
    Instrument the application for profile-based optimization. +I is
    equivalent to +Oprofile=collect. See ld(1), +P, and +pgm for more
    details. The +I option is incompatible with the -G, +P, and -S
    options. It is incompatible with the -g option only during compile
    time.

+P
    Optimize the application based on profile data found in the
    database file flow.data, produced by compilation with +I. +P is
    equivalent to +Oprofile=use or +Oprofile=use:filename. See ld(1),
    +I, and +df for more details. The +P option is incompatible with
    the +I and -S options. It is incompatible with the -g option only
    during compile time.

+Ostatic_prediction
    Enable [disable] the use of static branch prediction for decisions
    on conditional branches. More applicable to large programs with
    poor locality. Available at optimization level 3 and above.

+O[no]procelim
    Enable [disable] the elimination of functions that are not
    referenced by the application. Only functions with the hidden
    export class may be eliminated. The default is +Oprocelim.

+Oshortdata[=size]
    All objects of size bytes or smaller will be placed in the short
    data area, and references to such data will assume it resides in
    the short data area. Valid values of size are 0 or a decimal
    number between 8 and 4,194,304 (4MB). If no size is specified, all
    data is placed in the short data area. If size is 0, no data will
    be placed in the short data area, and all data references will use
    long offsets. The default is +Oshortdata=8.
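To illustrate why the +Onofltacc description above warns about numerical differences, the following minimal C sketch evaluates the same three-term sum in two association orders. The function names sum_left and sum_right and the input values are illustrative assumptions, not part of the HP-UX documentation. The volatile temporaries only pin the evaluation order so that the sketch behaves the same at any optimization level.

```c
/* Evaluate a + b + c as (a + b) + c.
 * The volatile temporary keeps the compiler from reassociating. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Evaluate a + b + c as a + (b + c), the kind of reordering that
 * an option such as +Onofltacc permits the optimizer to make. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With IEEE 754 double precision, sum_left(1e16, -1e16, 1.0) yields 1.0 while sum_right(1e16, -1e16, 1.0) yields 0.0, because the 1.0 is rounded away when it is first added to -1e16. A compiler that is allowed to reorder expressions may therefore produce either result.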
Kernel Tunables (Listed separately)

-----------------------------------------------
Flags for HP-UX F90 Compiler
-----------------------------------------------

+Ooptlevel
    Specify the level of optimization. Higher levels include
    optimizations performed at lower levels. Many other options
    beginning with +O enable specific optimizations; see the
    OPTIMIZATION section. optlevel can be one of the following:

    0   Minimal optimization, fastest compile time, best debugging
        support. This is the default.
    1   Block-level optimizations, moderately fast compile time,
        moderate improvement in runtime performance.
    2   Full optimization within each subprogram in a file. Marked
        improvement in runtime performance, noticeably longer compile
        time, program transformations more pronounced than at lower
        levels.
    3   Full optimization across all subprograms within the source
        file, including subprogram cloning and inlining. This level of
        optimization can greatly improve the runtime performance of
        programs that make frequent procedure calls.

+Oall
    Apply maximum optimization to achieve the best runtime performance.
    This option is equivalent to specifying +Oaggressive and +Onolimit
    on the same command line. The +Oall option automatically invokes
    the highest level of optimization (+O3 for F90).

+Oaggressive
    Apply aggressive optimizations. These include new optimizations as
    well as optimizations invoked by the following option settings:

        +Oentrysched
        +Olibcalls
        +Onofltacc
        +Onoinitcheck
        +FPD

+Oentrysched
    Perform instruction scheduling on a subprogram's entry and exit
    code sequences. This option can be used at optimization level 1
    and higher. The default is +Onoentrysched.

+Olibcalls
    Use low-call-overhead versions of select library routines. This
    option can be used at any level. At optimization level 0 or 1, the
    default is +Onolibcalls; at optimization level 2 or higher, the
    default is +Olibcalls.

+Onofltacc
    Enable floating-point optimizations that can result in numerical
    differences.
    This includes contractions, such as fused multiply-add (FMA), and
    also allows floating-point optimizations that may affect the
    generation and propagation of infinities, NaNs, and the sign of
    zero. It permits optimizations, such as reordering of expressions,
    even if parenthesized, that may affect rounding error. FMA
    instructions can improve performance of floating-point applications
    and are available only on PA-RISC 2.0 systems or later.

+Onoinitcheck
    Disable initialization of any local, scalar, automatic variable
    that is found to be uninitialized. This option can be used at
    optimization level 2 and higher. The default is to enable
    initialization if the variable is uninitialized with respect to
    every path leading to its use.

-Wl,+FPD
    Specify how the run time environment for floating-point operations
    should be initialized at program start up. The default is that all
    trapping behaviors are disabled. See ld(1) for specific values of
    flags. To dynamically change these settings at run time, refer to
    fesetenv(3M).

    D (d)   Enable sudden underflow (flush to zero) of denormalized
            values.

+Onolimit
    Do not suppress optimizations that significantly increase
    compile-time or consume enormous amounts of memory.

+Oinfo
    Provide feedback information about the optimization process. This
    option is most useful at optimization levels 3 and 4. The default
    is +Onoinfo.

+Oopenmp
    Enable OpenMP directives.

-Wl,-aarchive (ld option -a search)
    Specify the library search order, that is, whether shared or
    archive libraries are searched with the -l option. The archive
    setting causes only archive libraries to be searched, not shared
    libraries. The value of search should be one of archive, shared,
    archive_shared, shared_archive, or default. This option can appear
    more than once, interspersed among -l options, to control the
    searching for each library. The default is to use the shared
    version of a library if one is available, or the archive version
    if not.
    If either archive or shared is active, only the specified library
    type is accepted. If archive_shared is active, the archive form is
    preferred, but the shared form is allowed. If shared_archive is
    active, the shared form is preferred but the archive form is
    allowed.

-N
    Mark output from the linker unshared, so that up to 2 gigabytes of
    memory can be addressed as data in a 32-bit process. This allows
    quadrants I and II to be combined such that the data segment
    starts at the end of the text segment in quadrant I and extends to
    the end of quadrant II. For details and system defaults, see
    ld(1).

+O[no]loop_block
    Enable [disable] loop blocking for data cache optimizations.
    Available at optimization level 3.

+O[no]inline
    Request [disable] inlining and cloning. This option can be used at
    optimization level 3 and higher. The default is +Oinline.

-----------------------------------------------
Descriptions of Portability Flags
-----------------------------------------------

+[no]extend_source
    Allow [do not allow] up to 254 characters on a single source line.
    The default, +noextend_source, is 72 characters for fixed format
    and 132 for free format.

+source={fixed|free|default}
    Accept source files in fixed format (+source=fixed) or free format
    (+source=free). The default, +source=default, is free for .f90
    files and fixed for .f and .F source files.

-----------------------------------------------
Descriptions of Kernel Tunables
-----------------------------------------------

dbc_max_pct     Maximum dynamic buffer cache size as a percent of
                system memory
dbc_min_pct     Minimum dynamic buffer cache size as a percent of
                system memory
maxdsiz         Maximum data size
maxdsiz_64bit   Maximum data size for 64-bit applications
maxssiz         Maximum stack size
maxssiz_64bit   Maximum stack size for 64-bit applications
vps_ceiling     Maximum system-selected page size (in Kbytes)
vps_pagesize    Default user page size (in Kbytes)
swapmem_on      Swap to memory flag
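The +O[no]loop_block entry in the F90 flags above mentions loop blocking for data cache optimization. As a rough, hand-written illustration of the idea (in C rather than Fortran, and not a description of the code the HP compiler actually generates), the sketch below transposes a row-major matrix tile by tile so that each tile is reused while it is still likely to be in cache. The function name transpose_blocked and the block parameter are hypothetical.

```c
#include <stddef.h>

/* Transpose the n_rows x n_cols row-major matrix src into dst
 * (n_cols x n_rows), visiting the data in block x block tiles. */
void transpose_blocked(const double *src, double *dst,
                       size_t n_rows, size_t n_cols, size_t block)
{
    size_t ii, jj, i, j;

    if (block == 0)
        block = 1;                       /* avoid a non-advancing loop */

    for (ii = 0; ii < n_rows; ii += block) {
        /* Clamp the tile to the matrix edge. */
        size_t i_end = (ii + block < n_rows) ? ii + block : n_rows;
        for (jj = 0; jj < n_cols; jj += block) {
            size_t j_end = (jj + block < n_cols) ? jj + block : n_cols;
            /* Work inside one tile before moving to the next. */
            for (i = ii; i < i_end; i++)
                for (j = jj; j < j_end; j++)
                    dst[j * n_rows + i] = src[i * n_cols + j];
        }
    }
}
```

The result is identical to an unblocked transpose; only the order in which elements are visited changes. A suitable tile size depends on the cache of the target machine and is not specified in this document.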
-----------------------------------------------
Descriptions of Environment Variables
-----------------------------------------------

CPS_STACK_SIZE
    A default stack size of 8 megabytes is used for additional threads
    created for an OpenMP program. The stack region is allocated from
    the program heap which is part of the data segment. The default
    stack size for OpenMP threads can be modified prior to program
    invocation by setting the environment variable CPS_STACK_SIZE to a
    desired number of K bytes.

        export CPS_STACK_SIZE=128

    will establish a 128K byte stack region for each thread.

OMP_NUM_THREADS
    Specifies the number of threads to use during execution. By
    default, an OpenMP application will use an implied value equal to
    the number of processors on the system.
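Because CPS_STACK_SIZE is given in K bytes while the default is stated in megabytes, the unit conversion is easy to get wrong. The following minimal C sketch shows the arithmetic only. The helper name cps_stack_bytes is hypothetical, and falling back to the 8 megabyte default for a missing or malformed value is an assumption of this sketch; the source does not say how the HP runtime treats such values.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert a CPS_STACK_SIZE-style string (decimal K bytes) to bytes.
 * Assumption: unset, malformed, zero, or overflowing values fall
 * back to the documented 8 megabyte default. */
unsigned long cps_stack_bytes(const char *value)
{
    const unsigned long default_bytes = 8UL * 1024UL * 1024UL;
    unsigned long kbytes;
    char *end;

    /* Require a leading digit so that signs and blanks are rejected. */
    if (value == NULL || !isdigit((unsigned char)value[0]))
        return default_bytes;

    errno = 0;
    kbytes = strtoul(value, &end, 10);
    if (errno != 0 || *end != '\0' || kbytes == 0 ||
        kbytes > ULONG_MAX / 1024UL)
        return default_bytes;

    return kbytes * 1024UL;              /* K bytes to bytes */
}
```

For instance, cps_stack_bytes(getenv("CPS_STACK_SIZE")) returns 131072 after export CPS_STACK_SIZE=128, and a setting of 8192 reproduces the 8 megabyte default.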