================================================================================
Fujitsu PRIMEPOWER flags/tunables description (November 7, 2003)
================================================================================

History

June 2001       Matthijs van Waveren    Initial version
June 2003       Matthijs van Waveren    Update to Parallelnavi 2.2
November 2003   Matthijs van Waveren    Update to Parallelnavi 2.3
                                        Added -Kprefetch_cache_level and -O5
                                        Extended section on CPU_USE and -Kmfunc

================================================================================

Fujitsu Parallelnavi 2.3 compiler flag description

Parallelnavi consists of the following three main function groups:
(a) Program development and execution environment functions;
(b) High-speed application execution functions; and
(c) Job operation management functions.

The first group consists of a program development environment that supports
Fortran, C, and C++ in accordance with international standards, and allows the
use of parallel technology. Mathematical libraries, including the parallel
version, and program development support tools are also provided.

The second group contains a harmonized scheduling function, which ensures a
uniform level of execution performance for a parallelized program, and a large
page function, which improves the execution performance of applications that
handle large volumes of data.

The third group provides for the execution control and management of jobs,
including parallel jobs, using the Network Queuing System (NQS). NQS allows
batch jobs to be submitted to queues on local or remote machines, with the log
file returned to the originating machine or another machine. It is possible to
disallow the sharing of CPUs by multiple jobs. System resources can be applied
effectively using the NQS scheduling functions, and flexible scheduling for
each user and group is provided by the Network Queuing System Job Manager
(NQS-JM).

The following compiler options have been used for both Fortran 90 and C
compilers, except where specified otherwise.

Compiler options          Remark
--------------------------------------------------------------------------------
-KOMP                     Recognizes OpenMP directives in the source code and
                          generates multiprocessing code.
-Kfast_GP2[={1|2|3}]      Performs optimization for the SPARC64 V series.
                          1: Performs optimization suitable for SPARC64 V,
                             including global instruction scheduling.
                          2: Performs reordering of expression evaluation in
                             addition to -Kfast_GP2=1.
                          3: Moves the evaluation of invariant expressions
                             beyond the branch, in addition to -Kfast_GP2=2.
-KV9                      Indicates that SPARCV9 instructions are generated.
-Klargepage[={1|2}]       Specifies the generation of an executable program
                          which utilizes the Parallelnavi largepage facility.
                          1: The largepage facility applies to the data and
                             heap areas.
                          2: The largepage facility applies to the data, heap,
                             and stack areas.
-Khardbarrier             Specifies the generation of an executable program
                          which utilizes the Parallelnavi thread hardware
                          barrier facility. The synchronization performance is
                          improved.
-x-                       Performs inline expansion instead of function calls.
-Kprefetch_line=N         When a prefetch instruction is generated, the prefetch
                          target is the cache line N lines ahead. Specifying
                          this option means that -Kprefetch={2|3|4|5} is in
                          effect.
-Kprefetch[={1|2|3|4|5}]  Generates prefetch instructions corresponding to the
                          specified prefetch level. The -Kprefetch option is
                          valid when the -KV9 option is in effect.
                          1: Basic prefetch for array elements in the innermost
                             loop only. In nested loops, only array data in
                             the innermost loop is targeted.
                          2: In addition to -Kprefetch=1, generates prefetch
                             instructions in the loop pre-header for the array
                             elements accessed in the first iteration of the
                             loop.
                          3: In addition to -Kprefetch=2, when the access stride
                             for array elements is larger than the cache line
                             size, the compiler generates a prefetch
                             instruction for each cache-line-sized access.
                          4: In addition to -Kprefetch=3, prefetch with address
                             calculation is performed.
                          5: In addition to -Kprefetch=4, prefetching is applied
                             to array data which are accessed indirectly.
-Kprefetch_cache_level=N  Specifies the cache level into which data is
                          prefetched. Specifying this option means that the
                          -KSPARC64_GP2 and -Kprefetch={2|3|4|5} options are in
                          effect. N can be specified as follows (F90 only):
                          1: Data is prefetched into the first cache. The
                             prefetch instruction is used normally.
                          2: Data is prefetched into the second cache only.
                          3: Levels 1 and 2 are both in effect. Two kinds of
                             prefetch instructions are used, so that
                             prefetching operates at a higher level.
-KSPARC64_GP2             Creates an object program that uses SPARC64 V
                          instructions. The resulting program is not supported
                          on systems equipped with a CPU other than SPARC64 V.
-O [opt_lvl]              opt_lvl: {0|1|2|3|4|5}
                          Specifies the optimization level used by the
                          compiler. The system provides six optimization
                          levels: 0, 1, 2, 3, 4, and 5. If the argument is
                          omitted from the -O option, level 3 is used. If the
                          -O option is not specified, level 2 is used. The
                          following description holds for F90 only.
                          0  The -O0 option creates an object program without
                             applying optimization. A program compiled at
                             optimization level 0 requires the least compile
                             time and memory. Specify this argument to debug
                             compilation errors in Fortran source programs.
                          1  The -O1 option creates an object program by
                             applying basic optimization. The run time of the
                             resulting executable program will be shorter. The
                             size of an object program created using level 1
                             is smaller than the size of an object program
                             created using level 2.
                          2  The -O2 option creates an object program by
                             applying the basic optimization of level 1 plus
                             loop unrolling.
                             Compared with level 1, level 2 can reduce the run
                             time of an executable program that includes many
                             DO loops.
                          3  The -O3 option creates an object program by
                             applying the optimizations of level 2 plus
                             restructuring of nested loops, loop tiling, and
                             software pipelining. The optimizations of level 1
                             are also applied repeatedly. Compared with level
                             2, the execution performance of the object program
                             is improved.
                          4  The -O4 option creates an object program by
                             applying further loop restructuring optimizations
                             in addition to those of -O3. These include full
                             unrolling of nested loops, loop splitting to
                             promote loop interchange, and optimization of
                             arrays with constant subscripts. Although compile
                             time is longer than with -O3, this option can be
                             used to obtain better execution performance.
                          5  The -O5 option creates an object program by
                             applying further register allocation
                             optimizations in addition to those of -O4.
                             Although compile time is longer still than with
                             -O4, this option can be used to obtain better
                             execution performance.
-Nautoobjstack            Allocates an automatic data object on the stack.
                          (F90 only)
-Am                       Required if a source file contains modules which will
                          be referenced by USE statements in other source
                          files, or if a source file contains USE statements
                          that reference modules in another source file.
                          (F90 only)
-Fixed                    Specifies that Fortran source programs are written in
                          fixed source form. (F90 only)
-w                        In fixed source form, the length of all source lines
                          is 255 characters. (F90 only)
-Kfast_GP[={0|1|2}]       Performs optimization for the SPARC64 GP series.
                          0: Performs optimization suitable for SPARC64 GP,
                             including global instruction scheduling.
                          1: Generates multiply-add instructions in addition
                             to -Kfast_GP=0. (default)
                          2: Performs reordering of expression evaluation in
                             addition to -Kfast_GP=1.
-KV9FMADD                 Indicates that SPARCV9 instructions and multiply
                          add/subtract instructions are generated.
-D name[=tokens]          Associates name with the specified tokens in the same
                          way as a #define preprocessing directive. If =tokens
                          is not specified, the token 1 is used.
-KOMP_fast_reduction      Uses a reduction facility which is valid only if the
                          parallel regions are not nested.
-Kgs                      Performs global instruction scheduling. This is
                          activated if -Kfast_GP2 is specified.
-Kparallel                Specifies automatic parallelization.
-Knogs                    Does not perform global instruction scheduling.
-Kpreex                   Moves the evaluation of invariant expressions beyond
                          the branch.
-Kloop                    Restructures DO loops by performing interchange and
                          blocking for better cache use.
-Kilfunc                  Replaces the single and double precision mathematical
                          functions sin, cos, log10, log, and exp with compiler
                          built-in functions.
-Kcfunc                   Uses high-speed mathematical functions and library
                          functions (malloc, calloc, realloc, free) prepared by
                          this compilation system. (C only)
-Kmfunc                   Uses high-speed mathematical functions prepared by
                          this compilation system. The mathematical functions
                          are the trigonometric functions, logarithmic
                          functions, and gamma functions. (C only)
-lmtmalloc                Uses the Solaris mtmalloc library, whose malloc() and
                          free() provide a simple general-purpose memory
                          allocation package that is suitable for use in high
                          performance multithreaded applications.
-SSL2                     Links with the SSL II library. This library contains
                          the optimized BLAS functions.
================================================================================

================================================================================
Solaris system configuration information file description

All flags are specified in '/etc/system'.

System Tunables           Remark
--------------------------------------------------------------------------------
shmsys:shminfo_shmmax     Maximum size of a System V shared memory segment that
                          can be created.
shmsys:shminfo_shmmni     System-wide limit on the number of shared memory
                          segments that can be created.
shmsys:shminfo_shmseg     Limit on the number of shared memory segments that
                          any one process can create.
autoup                    Period of execution of the fsflush daemon, in
                          seconds. This daemon periodically scans memory to
                          find unwritten data and metadata and writes them to
                          disk.
memscrub_period_sec       Period of execution of the memory patrol daemon, in
                          seconds. This daemon periodically scans memory to
                          check for ECC errors.
================================================================================

================================================================================
Fujitsu Parallelnavi 2.3 Large page management information file description

All flags are specified in '/etc/opt/FJSVpnrm/lpg.conf'.

Tunables                  Remark
--------------------------------------------------------------------------------
JOB=size[unit]            Total size of memory to be used for large page
                          segments. At system start, this amount of memory is
                          reserved and initialized for NQS jobs. "unit" can be
                          M for megabytes or G for gigabytes.
SHMSEGSIZE=size[unit]     Size of a large page segment. "unit" can be M for
                          megabytes or G for gigabytes.
================================================================================

================================================================================
Fujitsu Parallelnavi 2.3 CPU resource information file description

All flags are specified in '/etc/opt/FJSVpnrm/cpursc.conf'.

Tunables                  Remark
--------------------------------------------------------------------------------
CPU_USE=tss-cpuid:io-cpuid
                          Parallelnavi allows CPUs to be grouped into those
                          used for job execution and those used for other
                          types of processing (interactive, daemon). It is also
                          possible to specify which CPUs should, and which
                          should not, receive I/O interrupts.
                          The CPUs not used for job execution are specified by
                          tss-cpuid; of these CPUs, those interruptible by I/O
                          devices are specified by io-cpuid. tss-cpuid and
                          io-cpuid can be ranges of two numbers separated by a
                          dash or comma. For instance, CPU_USE=0,1:0,1 means
                          that CPUs 0 and 1 are reserved for system tasks, and
                          both of these CPUs are enabled to receive I/O
                          interrupts. All other CPUs are reserved for job
                          execution.
================================================================================

================================================================================
Fujitsu Parallelnavi 2.3 NQS parameters description

The Network Queuing System (NQS) parameters are managed by the qmgr command.
Selecting the execution mode 'Simplex' disallows the sharing of CPUs by
multiple jobs.

Tunables                                Remark
--------------------------------------------------------------------------------
Per-process data size limit             The limit of data segment size.
Per-process permanent file size limit   The limit of permanent file size.
Per-process memory size limit           The limit of memory size.
Per-request memory size limit           The limit of largepage memory size.
Per-process number of cpus limit        The limit of the number of CPUs.
Per-process stack size limit            The limit of stack segment size.
Per-process CPU time limit              The CPU time limit for each process.
Execution mode                          Selection of execution mode. Simplex
                                        means sharing CPUs by multiple jobs is
                                        not permitted.
Jobclass                                Assigning sub group. On a non-clustered
                                        system, this value is zero.
================================================================================
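
Illustration of the CPU_USE format

The CPU_USE format described in the CPU resource information section can be
illustrated with a short Python sketch. It is not part of Parallelnavi: the
function names parse_cpu_list and parse_cpu_use and the total_cpus parameter
are hypothetical. The sketch assumes that a dash denotes an inclusive range,
that a comma separates individual entries, and that CPUs are numbered from 0
to total_cpus - 1; the actual syntax accepted by cpursc.conf may differ.

```python
def parse_cpu_list(spec):
    """Expand a CPU id list such as '0,1' or '0-3' into a sorted list.

    Assumption: a dash is an inclusive range and a comma separates entries.
    """
    cpus = set()
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            lo, hi = (int(part) for part in item.split("-", 1))
            if lo > hi:
                raise ValueError("descending range: %r" % item)
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(item))
    return sorted(cpus)


def parse_cpu_use(line, total_cpus):
    """Split a 'CPU_USE=tss-cpuid:io-cpuid' line into its CPU groups.

    total_cpus is a hypothetical parameter giving the number of CPUs in the
    system; CPU ids are assumed to run from 0 to total_cpus - 1.
    """
    key, _, value = line.strip().partition("=")
    if key.strip() != "CPU_USE":
        raise ValueError("not a CPU_USE line: %r" % line)
    tss_spec, sep, io_spec = value.partition(":")
    if not sep:
        raise ValueError("missing ':' between tss-cpuid and io-cpuid")
    tss = parse_cpu_list(tss_spec)
    io = parse_cpu_list(io_spec)
    # The I/O-interruptible CPUs are chosen from among the tss CPUs.
    if not set(io) <= set(tss):
        raise ValueError("io-cpuid must be a subset of tss-cpuid")
    # All CPUs not listed in tss-cpuid are reserved for job execution.
    job = [cpu for cpu in range(total_cpus) if cpu not in tss]
    return {"tss": tss, "io": io, "job": job}
```

For the example given above, parse_cpu_use("CPU_USE=0,1:0,1", 4) on a
hypothetical four-CPU system puts CPUs 0 and 1 in both the "tss" and "io"
groups and leaves CPUs 2 and 3 for job execution.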