How SPEC's Open Systems Group Selects New Benchmarks

By Tom Morgan, Data General, Westborough, Mass. and
Jeff Reilly, Intel Corporation, Santa Clara, Calif.

Published June, 1995; see disclaimer.

The SPEC Open Systems Group (OSG) develops benchmark suites for comparing the performance of computer systems, with emphasis on component- and system-level characteristics in the workstation/desktop and server markets. To allow the widest range of platforms to be compared fairly (the "level playing field"), the suites contain common-source code benchmarks with standardized workloads and specific rules for running the benchmarks and reporting their results. But how are these benchmarks selected?

The OSG selects new benchmark programs based on several criteria. Some criteria are essential because of the desire to provide benchmarks for comparing a wide range of platforms. Others are preferences, weighed subjectively by the OSG's members, based on the intent and positioning of the suite.

While specifics may vary depending on the benchmark suite, some requirements apply to all OSG products. OSG's requirements for a benchmark are most easily illustrated by the example of candidates for its compute-intensive CPU-component suites, CFP95 and CINT95. For such suites, the "essential" list is:

the program must be freely distributable worldwide.
the author must give permission for OSG to modify the program, package it as a benchmark, and license it as an OSG product.
the program must be portable, in a "performance-neutral" manner, to the systems and environments targeted by the benchmark suite.
the program and/or workload must be a real application, or based on a real application.

The preferred list varies even more with the intent of the benchmark suite. The types of preference are best illustrated by a look at the "preferred criteria" for the upcoming SPEC95 suites, CINT95 and CFP95. These suites are meant to be compute-intensive tests, stressing the CPU, memory and compiler components of the system:

the program structure and coding style represent those commonly found in that type of application.
the program is drawn from "production" application code.
the benchmark workload is sufficient for the program to run for several minutes on the fastest OSSC systems at the time of initial release of the suite.
the program, when run as a benchmark, should not have glaring weaknesses that would permit performance optimizations not possible for typical production programs for the same application; for example, a high locality of reference in the program code.
the program should stress the intended target of the suite in a known configuration. For SPEC95, this means stressing compiler, processor and memory and not containing a noticeable I/O, graphics or networking component.
to maintain a "level playing field" in SPEC95, 95 percent or more of the execution time should be spent in SPEC-provided code.
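
As a rough illustration of the last criterion, the sketch below checks whether at least 95 percent of profiled execution time falls in SPEC-provided code. It is a minimal sketch under stated assumptions, not a SPEC tool: the profile format (a mapping from function name to seconds), the function names, and the sample numbers are all hypothetical and invented for illustration.

```python
# Hypothetical sketch: none of these names come from SPEC tooling.

def spec_code_fraction(profile, spec_functions):
    """Return the fraction of total profiled time spent in SPEC-provided code.

    profile        -- dict mapping function name to seconds (assumed format)
    spec_functions -- set of function names that ship with the benchmark source
    """
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile contains no execution time")
    spec_time = sum(t for name, t in profile.items() if name in spec_functions)
    return spec_time / total


def meets_threshold(profile, spec_functions, threshold=0.95):
    """True if the SPEC-provided share of run time is at or above the threshold."""
    return spec_code_fraction(profile, spec_functions) >= threshold


# Invented sample data: two benchmark routines and two library routines.
sample_profile = {"solve": 90.0, "init": 6.0, "memcpy": 3.0, "printf": 1.0}
sample_spec_functions = {"solve", "init"}

fraction = spec_code_fraction(sample_profile, sample_spec_functions)
ok = meets_threshold(sample_profile, sample_spec_functions)
```

Here the library routines (memcpy, printf) count against the benchmark because their performance depends on vendor-supplied code rather than on the common source that every platform compiles.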

These general criteria and other benchmark-specific criteria are weighed within the OSG development groups. Often tradeoffs have to be made (for example, portability of a program versus application type or coding style) on the way from development to final product. Further information on the process and opportunities for participation can be obtained directly from SPEC.

Tom Morgan works for Data General in Westborough, Mass., and currently serves on the Board of Directors. Jeff Reilly is a project lead for Intel Corp. in Santa Clara, Calif., and is the SPEC CPU Subcommittee Chair.

Standard Performance Evaluation Corporation
