Siemens Nixdorf RM series compiler flags (as of May 1996)
=========================================================

The following is a list of short explanations of compiler/linker flags
used for SPEC CPU result submissions for Siemens Nixdorf RM systems.

It covers SPEC benchmark result publications for RM systems (series
RM600, RM400, RM300, RM200), published in 1994, 1995, and 1996, with the 
following compiler versions:
	C:		C-DS V1.0D, V1.1A
	Fortran:	F77 V1.4A
It is likely that future result submissions, if they use new compiler
versions, will have different flags; then this flag description will
be superseded by a new one.

Note that not all flags have been used in all measurements; some are used
for specific benchmark suites only (e.g. CINT95, CINT92), some are used
for specific languages only (C, Fortran).

The order of flags is significant; a later flag may override an
earlier flag.


1. Specific optimization flags:
-------------------------------

-FI
    Inlining for certain C or Fortran library functions whose names
    are listed in the compiler documentation (for Fortran: sqrt;
    for C: abs, fabs, alloca, sqrt, memset, memcpy).
-FU
    Activate an additional intermediate language optimizer (ULS =
    Universal Language System). It performs elimination of unused
    code, branch optimization, common subexpression elimination etc.
-Olimit 2000
    Limit for the number of basic blocks in a subroutine that will be
    optimized by the global optimizer (Default: 1000). Compilation
    time is traded for higher execution speed.
-Knopic
    More efficient addressing for global data via the "gp" register
    (no position-independent code). If this flag is used, "nopic" 
    versions of the library routines are used automatically. Cannot be 
    used if linking with shared objects (which requires dynamic linking) 
    is required.
-Kold
    Same as -Knopic, but no global data with 16-bit address offset are
    addressed via the gp register. Object modules can therefore be linked
    (statically) with PIC (position-independent code) objects.
-Wm,-Gnum
    Make global data with a length up to "num" addressable with a 16
    bit offset via the "gp" register (default: num = 0). If the
    parameter "num" is chosen in a way that more than 64 KByte of
    global data would need to be addressable in this way, the linker
    generates a message and does not produce a linked binary. In this 
    case, a smaller number has to be chosen.
-Wc,-afep,n
    Entry points of functions are aligned to 2**n bytes. "afep" means
    "Alignment function entry points". Code size is traded for higher
    execution speed (better usage of caches). Default: 2**2 = 4 bytes.
-Wo,-loopunroll,n
    Unroll loops n times (default: 4).
-Wo,-unrolllimit,n
    Limit for the number of statements in a loop that is to be
    unrolled (default: 320).
-Wo,-docodehoist
    Perform code hoisting (move frequently executed code).
-Krostr
    Place strings (arrays of "char", initialized with some string) in 
    read-only memory. It is common programming practice to use such
    strings as read-only constants. However, there are legal C programs
    where the array contents can be changed. Therefore, this flag
    is an "assertion flag" in the sense of SPEC's baseline rules.
-Kroconst
    Place constants that are declared with the (ANSI C) keyword "const" in 
    read-only memory. Since ANSI C requires that such constants are
    not overwritten, this is not an assertion flag.
-Wm,-i,[file_name]
    Perform inlining (overriding the compiler's default algorithm) on
    the basis of profile information contained in file [file_name].
    This file is generated by an automatic tool supplied with the
    compilation system; it contains subroutine names together with a
    "+" or "-" sign. "+" indicates that inlining is to be performed
    for this subroutine. "-" indicates that inlining is not to be
    performed for this subroutine. The contents of the "+" and the "-" 
    list are computed on the basis of the number of calls to the 
    respective subroutines ("train" input used for SPEC CPU95, "short" 
    input for SPEC CPU92). For subroutines not listed, inlining is left to 
    the compiler's default algorithm (activated for optimization levels 
    3 or 4). For the generation of the profiling information in combination
    with the SPEC95 tools, see section 3 below.
-ddopt
    Perform data dependency optimization for loops. This flag assumes that
    the loop index is only changed by the normal loop mechanism, not by
    other explicit or implicit assignments to the loop variable.
    In the case of Fortran77, this can be assumed (otherwise, the program
    is not a legal Fortran77 program). In the case of C, even though this
    is good programming practice, it is not guaranteed. Therefore, for C,
    the flag is an "assertion flag" in the sense of SPEC's baseline rules.
    For Fortran, it is not an assertion flag.
-Wb,-mips3
    Generate code for the R4000 CPU
-Wb,-mips2
    Generate code for the R3000 CPU
-Wb,-r4000
    Optimize for the R4000 pipeline
-dn
    Static linking (no dynamic linking)
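
Illustration of the two assertion flags above (-Krostr, and -ddopt for
C): the following sketch shows legal C code that violates each
assertion. The function names are hypothetical. How code compiled with
these flags behaves in such cases is not described here; the sketch only
shows the language-level behavior that the assertions rule out.

```c
#include <assert.h>
#include <string.h>

/* An array of char, initialized with a string. ANSI C allows its
 * contents to be changed; -Krostr asserts that this does not happen. */
static char greeting[] = "hello";

/* Modifies the array (legal C) and copies it to out.
 * Returns the string length, or 0 if out is too small. */
size_t capitalize_greeting(char *out, size_t out_size)
{
    assert(out != NULL);
    greeting[0] = 'H';                  /* write to the string array */
    if (out_size < sizeof greeting)
        return 0;
    memcpy(out, greeting, sizeof greeting);
    return sizeof greeting - 1;         /* length without the NUL */
}

/* The loop variable is changed inside the loop body, in addition to
 * the normal increment. This is legal C; for C, -ddopt asserts that
 * it does not happen. Returns the number of iterations executed. */
int count_iterations_with_index_jump(int n)
{
    int i;
    int visited = 0;
    for (i = 0; i < n; i++) {
        visited++;
        if (i % 4 == 1)
            i += 2;                     /* explicit change of the index */
    }
    return visited;
}
```

Programs that contain neither pattern satisfy both assertions.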

The following flags are specific to Fortran:

-fret1
    Use the more efficient C-style calling sequence. The default is
    fret0, the older Fortran calling sequence.
-fa
    Treat local variables as "automatic".
-fu
    Portability flag: Lower case letters in program names are converted to
    upper case. The ANSI/ISO standard defines legal Fortran77 programs
    with upper-case names only. However, many programs, including
    some SPEC CFP programs, use lower case letters and assume that
    the names are equivalent to their upper-case counterparts.
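
Illustration of the loop unrolling requested by -Wo,-loopunroll,n
(section 1): the sketch below shows a summation loop in C and a
hand-unrolled version with factor 4, the documented default. The
compiler performs the transformation internally; the function names and
the use of a separate remainder loop are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: one element and one loop test per iteration. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    size_t i;
    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4: four elements per loop test. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    assert(a != NULL || n == 0);
    /* main loop: runs while at least 4 elements remain */
    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    /* remainder loop: handles the last 0..3 elements */
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

Both functions return the same result; the unrolled form trades code
size for fewer loop tests and branches.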


2. General optimization flags:
------------------------------

-O3
    General optimization level. It includes:
    Level 1:
    Optimizations local to subroutines (register, branch, integer
    constant folding)
    Level 2:
    Optimizations over all subroutines in a compilation unit
    (common subexpression elimination, elimination of partial
    redundancies, copy propagation, strength reduction, loop
    optimization, dead code elimination)
    Level 3:
    Inline expansion of user subroutines (subject to
    compiler-internal heuristics), global register allocation.
    Programs optimized with level 3 must be linked statically
    (linker option -dn).
    This flag implies moderately high compilation time. It is
    recommended for high performance while still maintaining
    executability on an R3000 and compatibility with code compiled in
    other environments (older interlanguage calling sequence).
-O4
    General optimization flags / convenience option, includes "-O3"
    plus the following:
	-Olimit 2000 -FI -FU -Wb,-mips3 -Wb,-r4000 -dn
    In the case of Fortran, it also includes
	-fa -fret1
    This flag is recommended for high performance if compilation
    time, executability on an R3000 and compatibility with code
    compiled in other environments (older interlanguage calling
    sequence) are not an issue. It does not imply any "assertions" in
    the sense of SPEC's baseline rules. 
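
Illustration of some Level 2 terms: the following sketch contrasts a
naive C function with a hand-transformed equivalent that shows common
subexpression elimination, loop-invariant code motion, and strength
reduction. It is a source-level analogy only; the compiler applies
such transformations to its internal representation, and the function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Naive form: (x + y) is evaluated twice in every iteration, and the
 * index i * stride needs a multiplication in every iteration. */
long weighted_sum_naive(const long *v, size_t n, size_t stride,
                        long x, long y)
{
    long s = 0;
    size_t i;
    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        s += v[i * stride] * (x + y) + (x + y);
    return s;
}

/* Transformed form, as an optimizer might rewrite it. */
long weighted_sum_transformed(const long *v, size_t n, size_t stride,
                              long x, long y)
{
    const long t = x + y;   /* common subexpression, computed once
                               outside the loop (loop-invariant) */
    long s = 0;
    size_t i;
    size_t idx = 0;
    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++) {
        s += v[idx] * t + t;
        idx += stride;      /* strength reduction: an addition
                               replaces the multiplication i * stride */
    }
    return s;
}
```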


3. Generation of the feedback information file
----------------------------------------------

The directory that contains the compiler (/opt/C/bin) also contains
two short utilities
	make_freq_inline
and
	make_freq_inline_pos
that can be used to generate files containing profiling information
which can later be used by the "-Wm,-i,[file_name]" compiler flag (see above).
"make_freq_inline_pos" is called with two parameters n1 and n2:
n1:	Write a "+" sign (which directs: Do inline) for the top n1 functions,
	ranked by call frequency.
n2:	Write a "-" sign (which directs: Do not inline) for all functions
	ranked below position n2 by call frequency.
"make_freq_inline_pos", like other UNIX utilities, writes to stdout.
Therefore, in order to use the tool in the way required for SPEC CPU95
(one invocation of "runspec" only, with two different flag lists
given in PASS1_CFLAGS and PASS2_CFLAGS), the following flag settings
are necessary in the configuration file:

  PASS1_CFLAGS = ... -qp
  fdo_post1 = /opt/C/bin/make_freq_inline_pos -p %binary% -m mon.out n1 n2
	[continuation of above line]
	| tee /tmp/%benchname%.n1.n2.fb
  PASS2_CFLAGS = ... -Wm,-i,"/tmp/%benchname%.n1.n2.fb"

See the SPEC CPU95 documentation for details concerning %binary%, %benchname%,
and fdo_post1.
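
The selection rule described above for n1 and n2 can be sketched in C as
follows. This is not the source of make_freq_inline_pos. The struct, the
function name, and the output layout (sign immediately followed by the
function name, one entry per line) are assumptions made for
illustration; the actual file format is defined by the compilation
system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical profile record: function name and number of calls. */
struct call_count {
    const char *name;
    unsigned long calls;
};

/* qsort comparator: descending call count, ties broken by name. */
static int by_calls_desc(const void *pa, const void *pb)
{
    const struct call_count *a = pa;
    const struct call_count *b = pb;
    if (a->calls < b->calls) return 1;
    if (a->calls > b->calls) return -1;
    return strcmp(a->name, b->name);
}

/* Sorts funcs by call frequency, then writes "+name" for the top n1
 * functions and "-name" for all functions ranked below position n2.
 * Functions in between are not listed.
 * Returns the number of characters written, or -1 if out is too small. */
int write_inline_list(struct call_count *funcs, size_t count,
                      size_t n1, size_t n2, char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;
    assert(out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    qsort(funcs, count, sizeof funcs[0], by_calls_desc);
    for (i = 0; i < count; i++) {
        char sign;
        size_t len;
        if (i < n1)
            sign = '+';             /* do inline */
        else if (i >= n2)
            sign = '-';             /* do not inline */
        else
            continue;               /* left to the compiler's default */
        len = strlen(funcs[i].name);
        if (used + len + 3 > out_size)  /* sign + name + newline + NUL */
            return -1;
        out[used++] = sign;
        memcpy(out + used, funcs[i].name, len);
        used += len;
        out[used++] = '\n';
        out[used] = '\0';
    }
    return (int)used;
}
```

With n1 = 1 and n2 = 2, for example, the most frequently called function
gets a "+", the second is left to the default algorithm, and all others
get a "-".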

Remarks:
1. The name of the feedback information collection file 
	(/tmp/%benchname%.n1.n2.fb)
could be arbitrary. The name chosen for SPEC measurement runs indicates 
(and documents) the parameters with which make_freq_inline_pos was called.
2. The "tee" construct is somewhat awkward; it is a consequence of the
requirement that everything has to be activated from the SPEC-supplied
Perl script, and that this script allows no output redirection other 
than what it has as the default (to the log file). A user outside the 
SPEC benchmarks would not need it and would use output redirection as it 
is used for many UNIX tools (make_freq_inline_pos writes to stdout).


4. Library flags
----------------

-lm
   Use the mathematical library libm.a
-lm_r4000
   Use a higher performance version of the mathematical library,
   optimized for the R4000 CPU. In accuracy and ANSI compliance, this 
   library is equivalent to the standard library. In cases of runtime 
   errors, the variable "errno" is not set in the way standardized by XPG.
-lgen
   Use a general library libgen.a
-lcurses
   Use the "curses" library libcurses.a
-lf77
   Use the Fortran77 library. This is implicit for Fortran programs, but 
   if "-lm_r4000" is used, it needs to be stated explicitly because of
   the order of linker flags in the SPEC-supplied Makefile.


5. Questions?
-------------

More details about these and other flags can be found in the appropriate
documentation (manuals, release notes). Additional SPEC-specific questions
can be directed to the Siemens Nixdorf SPEC representative
	Reinhold Weicker
	Siemens Nixdorf
	OEC HES PM4
	33094 Paderborn
	Germany
    E-Mail:
	weicker.pad@sni.de