Siemens Nixdorf RM series compiler flags (as of May 1996)
=========================================================

The following is a list of short explanations of compiler/linker flags
used for SPEC CPU result submissions for Siemens Nixdorf RM systems.

It covers SPEC benchmark result publications for RM systems (series
RM600, RM400, RM300, RM200), published in 1994, 1995, and 1996, with
the following compiler versions:

    C:        C-DS V1.0D, V1.1A
    Fortran:  F77 V1.4A

It is likely that future result submissions, if they use new compiler
versions, will have different flags; then this flag description will be
superseded by a new one.

Note that not all flags have been used in all measurements; some are
used for specific benchmark suites only (e.g. CINT95, CINT92), some are
used for specific languages only (C, Fortran).

The order of flags is significant; a later flag may override an earlier
flag.

1. Specific optimization flags:
-------------------------------

-FI
    Inlining for certain C or Fortran library functions whose names are
    listed in the compiler documentation (for Fortran: sqrt; for C: abs,
    fabs, alloca, sqrt, memset, memcpy).

-FU
    Activate an additional intermediate language optimizer (ULS =
    Universal Language System). It performs elimination of unused code,
    branch optimization, common subexpression elimination etc.

-Olimit 2000
    Limit for the number of basic blocks in a subroutine that will be
    optimized by the global optimizer (default: 1000). Compilation time
    is traded for higher execution speed.

-Knopic
    More efficient addressing for global data via the "gp" register (no
    position-independent code). If this flag is used, "nopic" versions
    of the library routines are used automatically. Cannot be used when
    the program must be linked with shared objects (which requires
    dynamic linking).

-Kold
    Same as -Knopic, but no global data are addressed with a 16-bit
    offset via the gp register.
    Object modules can therefore be linked (statically) with PIC
    (position-independent code) objects.

-Wm,-Gnum
    Make global data with a length up to "num" addressable with a
    16-bit offset via the "gp" register (default: num = 0). If the
    parameter "num" is chosen in a way that more than 64 KByte of
    global data would need to be addressable in this way, the linker
    generates a message and does not produce a linked binary. In this
    case, a smaller number has to be chosen.

-Wc,-afep,n
    Entry points of functions are aligned to 2**n bytes. "afep" means
    "Alignment function entry points". Code size is traded for higher
    execution speed (better usage of caches). Default: 2**2 = 4 bytes.

-Wo,-loopunroll,n
    Unroll loops n times (default: 4).

-Wo,-unrolllimit,n
    Limit for the number of statements in a loop that is to be unrolled
    (default: 320).

-Wo,-docodehoist
    Move frequently executed code.

-Krostr
    Place strings (arrays of "char", initialized with some string) in
    read-only memory. It is common programming practice to use such
    strings as read-only constants. However, there are legal C programs
    where the array contents can be changed. Therefore, this flag is an
    "assertion flag" in the sense of SPEC's baseline rules.

-Kroconst
    Place constants that are declared with the (ANSI C) keyword "const"
    in read-only memory. Since ANSI C requires that such constants are
    not overwritten, this is not an assertion flag.

-Wm,-i,[file_name]
    Perform inlining (overriding the compiler's default algorithm) on
    the basis of profile information contained in file [file_name].
    This file is generated by an automatic tool supplied with the
    compilation system; it contains subroutine names together with a
    "+" or "-" sign.
        "+" indicates that inlining is to be performed for this
            subroutine.
        "-" indicates that inlining is not to be performed for this
            subroutine.
    The contents of the "+" and the "-" list are computed on the basis
    of the number of calls to the respective subroutines ("train" input
    used for SPEC CPU95, "short" input for SPEC CPU92). For subroutines
    not listed, inlining is left to the compiler's default algorithm
    (activated for optimization levels 3 or 4).
    For the generation of the profiling information in combination with
    the SPEC95 tools, see section 3 below.

-ddopt
    Perform data dependency optimization for loops. This flag assumes
    that the loop index is only changed by the normal loop mechanism,
    not by other explicit or implicit assignments to the loop variable.
    In the case of Fortran77, this can be assumed (otherwise, the
    program is not a legal Fortran77 program). In the case of C, even
    though this is good programming practice, it is not guaranteed.
    Therefore, for C, the flag is an "assertion flag" in the sense of
    SPEC's baseline rules. For Fortran, it is not an assertion flag.

-Wb,-mips3
    Generate code for the R4000 CPU.

-Wb,-mips2
    Generate code for the R3000 CPU.

-Wb,-r4000
    Optimize for the R4000 pipeline.

-dn
    Static linking (no dynamic linking).

The following flags are specific to Fortran:

-fret1
    Use the more efficient C-style calling sequence. The default is
    fret0, the older Fortran calling sequence.

-fa
    Treat local variables as "automatic".

-fu
    Portability flag: Lower case letters in program names are converted
    to upper case. The ANSI/ISO standard defines legal Fortran77
    programs with upper-case names only. However, many programs,
    including some SPEC CFP programs, use lower case letters and assume
    that the names are equivalent to their upper-case counterparts.

2. General optimization flags:
------------------------------

-O3
    General optimization level.
    It includes:
        Level 1: Optimizations local to subroutines (register, branch,
                 integer constant folding)
        Level 2: Optimizations over all subroutines in a compilation
                 unit (common subexpression elimination, elimination of
                 partial redundancies, copy propagation, strength
                 reduction, loop optimization, dead code elimination)
        Level 3: Inline expansion of user subroutines (subject to
                 compiler-internal heuristics), global register
                 allocation. Programs optimized with level 3 must be
                 linked statically (linker option -dn).
    This flag implies moderately high compilation time; it is
    recommended for high performance while still maintaining
    executability on an R3000 and compatibility with code compiled in
    other environments (older interlanguage calling sequence).

-O4
    General optimization flag (convenience option); includes "-O3" plus
    the following:
        -Olimit 2000
        -FI
        -FU
        -Wb,-mips3
        -Wb,-r4000
        -dn
    In the case of Fortran, it also includes
        -fa
        -fret1
    This flag is recommended for high performance if compilation time,
    executability on an R3000 and compatibility with code compiled in
    other environments (older interlanguage calling sequence) are not
    an issue. It does not imply any "assertions" in the sense of SPEC's
    baseline rules.

3. Generation of the feedback information file
----------------------------------------------

The directory that contains the compiler (/opt/C/bin) also contains two
short utilities

    make_freq_inline
    make_freq_inline_pos

that can be used to generate files containing profiling information
which can later be used by the "-Wm,-i,[file_name]" compiler flag (see
above).

"make_freq_inline_pos" is called with two parameters n1 and n2:

    n1: Write a "+" sign (which directs: Do inline) for the top n1
        functions, in the order of call frequency.
    n2: Write a "-" sign (which directs: Do not inline) for all
        functions below n2, in the order of call frequency.

"make_freq_inline_pos", like other UNIX utilities, writes to stdout.
Therefore, in order to use the tool in the way required for SPEC CPU95
(one invocation of "runspec" only, with two different flag lists given
in PASS1_CFLAGS and PASS2_CFLAGS), the following flag settings are
necessary in the configuration file:

    PASS1_CFLAGS = ... -qp
    fdo_post1    = /opt/C/bin/make_freq_inline_pos -p %binary% -m mon.out n1 n2
                   [continuation of above line] | tee /tmp/%benchname%.n1.n2.fb
    PASS2_CFLAGS = ... -Wm,-i,"/tmp/%benchname%.n1.n2.fb"

See the SPEC CPU95 documentation for details concerning %binary%,
%benchname%, and fdo_post1.

Remarks:

1. The name of the feedback information collection file
   (/tmp/%benchname%.n1.n2.fb) could be arbitrary. The name chosen for
   SPEC measurement runs indicates (and documents) the parameters with
   which make_freq_inline_pos was called.

2. The "tee" construct is somewhat awkward; it is a consequence of the
   requirement that everything has to be activated from the
   SPEC-supplied Perl script, and that this script allows no output
   redirection other than its default (to the log file). A user outside
   the SPEC benchmarks would not need it and would use output
   redirection as with many other UNIX tools (make_freq_inline_pos
   writes to stdout).

4. Library flags
----------------

-lm
    Use the mathematical library libm.a.

-lm_r4000
    Use a higher performance version of the mathematical library,
    optimized for the R4000 CPU. In accuracy and ANSI compliance, this
    library is equivalent to the standard library. In cases of runtime
    errors, the variable "errno" is not set in the way standardized by
    XPG.

-lgen
    Use a general library libgen.a.

-lcurses
    Use the "curses" library libcurses.a.

-lf77
    Use the Fortran77 library. This is implicit for Fortran programs,
    but if "-lm_r4000" is used, it needs to be stated explicitly
    because of the order of linker flags in the SPEC-supplied Makefile.

5. Questions?
-------------

More details about these and other flags can be found in the
appropriate documentation (manuals, release notes).
Additional SPEC-specific questions can be directed to the Siemens
Nixdorf SPEC representative

    Reinhold Weicker
    Siemens Nixdorf OEC HES PM4
    33094 Paderborn
    Germany
    E-Mail: weicker.pad@sni.de
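
As an illustration of the "assertion flag" notion used in section 1 for
-Krostr and -ddopt, the following is a minimal C sketch of two legal C
constructs that violate the assumptions those flags make. The names
greeting, patch_greeting and sum_with_skip are hypothetical and chosen
for this sketch only; how a given compiler version actually treats such
code under these flags is not shown here and would have to be checked
against the compiler documentation.

```c
/* A char array initialized with a string. In standard C this array is
 * writable; -Krostr (as described in section 1) assumes that the
 * program never writes to it. */
static char greeting[] = "hello";

/* Legal C that violates the -Krostr assumption: the array contents
 * are modified at run time. */
char *patch_greeting(void)
{
    greeting[0] = 'J';
    return greeting;
}

/* Legal C that violates the -ddopt assumption: the loop variable i is
 * changed by an explicit assignment in the loop body, not only by the
 * normal loop mechanism (i++). */
int sum_with_skip(int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++) {
        if (i == 3)
            i += 2;     /* jump from 3 to 5, so 3 and 4 are never added */
        sum += i;
    }
    return sum;
}
```

Under standard C semantics, patch_greeting() returns "Jello" and
sum_with_skip(10) returns 38 (0+1+2+5+6+7+8+9). A program that relies
on either behavior does not satisfy the assertion made by the
corresponding flag.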