This document specifies the guideline on how SPECsfs97_R1 is to be run for measuring and publicly reporting performance results. These rules have been established by the SPEC SFS Subcommittee and approved by the SPEC Open Systems Steering Committee. They ensure that results generated with this suite are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results). Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.
SPEC believes the user community will benefit from an objective series of tests, which can serve as a common reference and be considered as part of an evaluation process. SPEC is aware of the importance of optimizations in producing the best system performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that specifically target the SPEC benchmarks. However, with the list below, SPEC wants to increase the awareness of implementers and end users regarding unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking. SPEC expects that any public use of results from this benchmark suite shall be for Systems Under Test (SUTs) and configurations that are appropriate for public consumption and comparison. Thus, it is also required that:
To ensure that results are relevant to end-users, SPEC expects that the hardware and software implementations used for running the SPEC benchmarks adhere to the following conventions:
SPEC reserves the right to investigate any case where it appears that these guidelines and the associated benchmark run and reporting rules have not been followed for a published SPEC benchmark result. SPEC may request that the result be withdrawn from the public forum in which it appears and that the benchmarker correct any deficiency in product or process before submitting or publishing future results. SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECsfs97_R1 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees if changes are made to the benchmark and will rename the metrics (e.g. from SPECsfs97_R1 to SPECsfs97_R1a). Relevant standards are cited in these run rules as URL references, and are current as of the date of publication. Changes or updates to these referenced documents or URL's may necessitate repairs to the links and/or amendment of the run rules. The most current run rules will be available at the SPEC web site at http://www.spec.org. SPEC will notify members and licensees whenever it makes changes to the suite.
SPEC encourages the submission of results for review by the relevant subcommittee and subsequent publication on SPEC's web site. Vendors may publish compliant results independently; however, any SPEC member may request a full disclosure report for that result and the benchmarker must comply within 10 business days. Issues raised concerning a result's compliance with the run and reporting rules will be taken up by the relevant subcommittee regardless of whether or not the result was formally submitted to SPEC.
A SPECsfs97_R1 result produced in compliance with these run and reporting rules may be publicly disclosed and represented as a valid SPECsfs97_R1 result. All SPECsfs97_R1 results that are submitted to SPEC will be reviewed by the SFS subcommittee. The review process ensures that the result is compliant with the run and disclosure rules set forth in this document. If the result is compliant, it will be approved for publication on the SPEC web site. If the result is found to be non-compliant, the submitter will be contacted and informed of the specific problem with the submission. Any test result not in full compliance with the run and reporting rules must not be represented using the SPECsfs97_R1 metric name. The metric SPECsfs97_R1 must not be associated with any estimated results. This includes adding, multiplying or dividing measured results to create a derived metric.
Consistency and fairness are guiding principles for SPEC. To assure these principles are sustained, the following guidelines have been created with the intent that they serve as specific guidance for any organization (or individual) who chooses to make public comparisons using SPEC benchmark results. When any organization or individual makes public claims using SPEC benchmark results, SPEC requires that the following guidelines be observed:
The following paragraph(s) is an example of acceptable language when publicly using SPEC benchmarks for competitive comparisons:
Example:
SPEC® and the benchmark name SPECsfs®97_R1 are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of Jan 12, 2001. The comparison presented above is based on the best performing 4-cpu servers currently shipping by Vendor 1, Vendor 2 and Vendor 3. For the latest SPECsfs97_R1 benchmark results visit www.spec.org <<or more specifically: www.spec.org/sfs97r1 >>.
SPEC encourages use of the SPECsfs97_R1 benchmark in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that required of licensees submitting to the SPEC web site or otherwise disclosing valid SPECsfs97_R1 results. For example, a research environment may use early prototype hardware that simply cannot be expected to stay up for the length of time required to run the required number of points, or may use research software that is unsupported and not generally available. Nevertheless, SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. SPEC suggests that following the rules will improve the clarity, reproducibility, and comparability of research results. Where the rules cannot be followed, SPEC requires that the results be clearly distinguished from fully compliant results such as those officially submitted to SPEC, by disclosing the deviations from the rules and avoiding the use of the SPECsfs97_R1 metric name.
This document provides the rules to follow for all submitted, reported, published and publicly disclosed runs of the SPEC System File Server (SPECsfs97_R1) benchmark, according to the norms specified and approved by the SPEC SFS Subcommittee (SFSSC). These run rules also form the basis for determining which server hardware and software features are allowed for benchmark execution and result publication. This document should be considered the complete guide when addressing the issues of benchmark and NFS server configuration requirements for the correct execution of the benchmark. The only other documents that should be considered are potential clarifications or interpretations of these Run and Disclosure Rules, and such interpretations should only be accepted if they originate from and are approved by the SFSSC.

These run and disclosure rules are meant to provide the standard by which customers can compare and contrast NFS server performance. It is the intent of the SFSSC to set a reasonable standard for benchmark execution and disclosure of results, so that customers are presented with enough information about the disclosed configuration to potentially reproduce configurations and their corresponding results.

As a requirement of the benchmark license, these run and disclosure rules must be followed. If the user of the SFS 3.0 benchmark suite does not adhere to the rules set forth herein, SPEC may choose to terminate the license with the user. Please refer to the SPEC SFS 3.0 Benchmark license for complete details of the user's responsibilities.

For this document, it is assumed the reader is familiar with the SFS 3.0 benchmark, through the use of SFS 2.0 and/or the reading of the user documentation for SFS 3.0.
The general philosophy behind this set of rules for benchmark execution is to ensure that benchmark results can be reproduced if desired.
Products are considered generally available if they can be ordered by ordinary customers and ship within a reasonable time frame. This time frame is a function of the product size and classification, and common practice. The availability of support and documentation for the products must coincide with the release of the products.
Hardware products that are still supported by their original or primary vendor may be used if their original general availability date was within the last five years. The five-year limit does not apply to the hardware used in client systems - i.e., client systems are simply required to have been generally available at some time in the past.
Software products that are still supported by their original or primary vendor may be used if their original general availability date was within the last three years.
In the disclosure, the submitting vendor must identify any SUT component that can no longer be ordered by ordinary customers.
In addition to the base operating system, the server will need either the NFS Version 2 or NFS Version 3 software. The clients used for testing will need an ANSI-conformant C compiler (if benchmark compilation is required), a Bourne shell, a remote shell, a copy of the benchmark and a network interface. All of the server software components are required to be generally available within six months of result publication. Use of benchmark-specific software components on either the clients or server is not allowed.
Vendor Makefile Wrappers
Included in this benchmark release are pre-compiled versions of the benchmark for various operating systems at various levels. If it becomes necessary for the user to compile a version of the benchmark source for testing, generic makefiles are provided in the benchmark source directories. Typically a vendor makefile wrapper (M.vendor) is used in conjunction with the generic makefile for benchmark compilation. The makefiles may be modified or supplemented in a performance neutral fashion to facilitate the compilation and execution of the benchmark on operating systems not included within the benchmark distribution. It should be noted that as of SFS 3.0, the client no longer needs NFS client software present or configured for successful execution of the benchmark. The following is a list of the vendors and their respective operating system levels for which the benchmark has been pre-compiled and included with the benchmark distribution.
Makefile wrappers from other vendors have not been tested and binaries for those other vendors are not provided.
SPEC permits minimal performance-neutral portability changes to the benchmark source. When benchmark source changes are made, an enumeration of the modifications and the specific source changes must be submitted to SPEC prior to result publication. All modifications must be reviewed and deemed performance neutral by the SFSSC. Results requiring such modifications cannot be published until the SFSSC accepts the modifications as performance neutral. Source code changes required for standards compliance should be reported to SPEC. Appropriate standards documents should be cited. SPEC may consider incorporating such changes in future releases. Whenever possible, SPEC will strive to develop and enhance the benchmark to be standards-compliant. Portability changes will generally be allowed if, without the modification, the:
For a benchmark result to be eligible for disclosure, all items identified in the following sections must be true.
NFS Version 3: SETATTR, READLINK, CREATE, MKDIR, SYMLINK, MKNOD, REMOVE, RMDIR, RENAME, LINK
In the section "NFS protocol requirements" on page 51, the term stable storage is used. For clarification, the following references and further definition are provided and must be followed for results to be disclosed.
RFC 1094, NFS: Network File System, of March 1989, page 3 states the following concerning the NFS protocol:
All of the procedures in the NFS protocol are assumed to be synchronous. When a procedure returns to the client, the client can assume that the operation has completed and any data associated with the request is now on stable storage. For example, a client WRITE request may cause the server to update data blocks, filesystem information blocks (such as indirect blocks), and file attribute information (size and modify times). When the WRITE returns to the client, it can assume that the write is safe, even in case of a server crash, and it can discard the data written. This is a very important part of the statelessness of the server. If the server waited to flush data from remote requests, the client would have to save those requests so that it could resend them in case of a server crash.
SPEC has further clarification of this definition to resolve any potential ambiguity. For the purposes of the benchmark, SPEC defines stable storage in terms of the following operational description:
NFS servers must be able to recover without data loss from multiple power failures (including cascading power failures, i.e., several power failures in quick succession), operating system failures, and hardware failure of components (e.g., CPU) other than the storage medium itself (e.g., disk, non-volatile RAM). At any point where the data can be cached, after response to the client, there must be a mechanism to ensure the cached data survives server failure. Specifically, where non-volatile RAM (NVRAM) is utilized, the NVRAM power source should be able to sustain the contents of the memory in the face of multiple power failures (including cascading power failures) for a period of no less than 72 hours.
In "Server configuration requirements" on page 51 the term uniform access is used to define a requirement. This section provides a complete description and examples. The NFS server configuration for the benchmark execution should provide uniform file system access to the clients being used.
SPEC intends that, for every network, all file systems should be accessed by all clients uniformly. Each network must access all of the disk controllers in the SUT to be considered compliant with the uniform access requirement.
Uniform access is meant to eliminate potential exploitation of any partitionable aspect of the benchmark, particularly when reporting cluster results. It is recognized that servers vary as to exposing elements such as processor, disk controller or disk to load generators remotely accessing file systems. The algorithm presented below is the preferred, but not the only mechanism, when determining file system access for benchmark configuration. This method should prevent biased configurations for benchmark execution.
Once the number of load-generating processes has been determined, load-generator mount points should be assigned file systems in the following manner. Using a round-robin assignment, select the next file system to mount by selecting from the following collection, varying first (1), then (2), then (3), and so on:
Note that this list may not be complete for system components which should be considered for uniform access. Some server architectures may have other major components. In general, components should be included so all data paths are included within the system.
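The round-robin assignment described above can be sketched as follows. This is an illustrative sketch only, not part of the benchmark distribution; the topology (two networks, two disk controllers per network, two file systems per controller) and all server, controller, and file-system names are hypothetical.

```shell
#!/bin/sh
# Sketch of round-robin mount-point assignment for uniform access.
# The file-system list is ordered so that consecutive entries vary the
# network first, then the controller, then the file system; round-robin
# assignment then reduces to cycling through the ordered list.
# Naming: nXcYfZ = network X, controller Y, file system Z (hypothetical).
set -- srv:/n1c1f1 srv:/n2c1f1 \
       srv:/n1c2f1 srv:/n2c2f1 \
       srv:/n1c1f2 srv:/n2c1f2 \
       srv:/n1c2f2 srv:/n2c2f2

nfs=$#                        # number of file systems in the collection
nprocs=8                      # load-generating processes to assign
p=0
MNT_POINTS=""
while [ "$p" -lt "$nprocs" ]; do
    i=$(( p % nfs + 1 ))      # next file system, round-robin
    eval fs=\${$i}
    MNT_POINTS="$MNT_POINTS $fs"
    p=$(( p + 1 ))
done
echo "MNT_POINTS:$MNT_POINTS"
```

With eight processes and eight file systems, each process lands on a distinct file system, and consecutive processes touch different networks and controllers, which is the bias the uniform access requirement is intended to prevent.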
The network(s) used for valid benchmark execution must be isolated networks. Results obtained on production networks are invalid as they will most likely not be reproducible. Furthermore, the benchmark may fail to correctly converge to the requested load rate and behave erratically due to varying ambient load on the network.
This section details the requirements governing how the benchmark is to be executed for the purpose of generating results for disclosure.
As described in "SPEC's Description of Stable Storage for SFS 3.0", the NFS server's target file systems, their configuration and underlying physical medium used for benchmark execution must follow the stable storage requirements. At the start of each benchmark run, before the first in a series of requested NFS load levels is generated, the NFS server's target file systems must be initialized to the state of a newly created, empty file system. For UNIX-based systems, the mkfs (make filesystem) or newfs (new filesystem) command would be used for each target file system. For non-UNIX-based systems, a semantic equivalent to the mkfs or newfs command must be used.
The result of benchmark execution is a set of NFS throughput / response time data points for the server under test which defines a performance curve. The measurement of all data points used to define this performance curve must be made within a single benchmark run, starting with the lowest requested NFS load level and proceeding to the highest requested NFS load level. Published benchmark results must include at least 10 uniformly spaced requested load points (excluding zero NFS ops/sec). Two additional non-uniformly spaced requested load points beyond the highest uniformly spaced point may also be included. The achieved throughput of the optional non-uniformly spaced data points should be no more than 5% higher than the highest uniformly spaced achieved throughput data point. The highest achieved throughput must be within 10% of the requested throughput for it to be considered a valid data point. Any invalid data points will invalidate the entire run unless they are at or below 25% of the maximum measured throughput. All data points at or below the maximum reported throughput must be reported. Invalid data points must be submitted but will not appear on the disclosure page graph. (The requested load associated with the invalid points will appear on the disclosure reporting table; however, the throughput and response time will be omitted.) No server or testbed configuration changes, server reboots, or file system initialization (e.g., newfs) are allowed during the execution of the benchmark or between data point collection. If any requested NFS load level or data point must be rerun for any reason, the entire benchmark execution must be restarted, i.e., the server's file systems must be initialized and the series of requested NFS load levels repeated in whole.
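As a rough illustration, the snippet below applies two of the checks above (uniform spacing of requested points, and the peak achieved throughput being within 10% of its request) to a made-up set of requested:achieved load pairs. It is a simplified sketch, not the official validation logic; a real run also needs at least 10 uniformly spaced points and the remaining checks described above.

```shell
#!/bin/sh
# Simplified validity check on requested:achieved load points (ops/sec).
# The data values are made-up illustrations, not measured results.
result=$(awk '
BEGIN {
    n = split("1000:998 2000:1992 3000:2961 4000:3890 5000:4620", pt, " ")
    for (i = 1; i <= n; i++) {
        split(pt[i], v, ":"); req[i] = v[1]; ach[i] = v[2]
    }
    ok = 1
    step = req[2] - req[1]
    for (i = 2; i <= n; i++)            # requested points uniformly spaced
        if (req[i] - req[i-1] != step) ok = 0
    if (ach[n] < 0.9 * req[n]) ok = 0   # peak within 10% of its request
    if (ok) print "points look valid"; else print "non-compliant"
}')
echo "$result"
```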
For each data point measured, there will be the throughput and corresponding response time. For a data point to be eligible for results disclosure the response time reported by the benchmark must not exceed 40 milliseconds.
The overall response time is an indicator of how quickly the system under test responds to NFS operations over the entire range of the tested load. The overall response time is a measure of how the system will respond under an average load. Mathematically, the value is derived by calculating the area under the curve divided by the peak throughput. Below the first valid data point, the curve is assumed to be a straight line through the origin, i.e., zero response time at zero throughput.
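The area-under-the-curve calculation can be sketched with a trapezoidal approximation, as below. The (throughput, response time) pairs are made-up illustrations, and this is an interpretation of the metric's description above, not the benchmark's own reporting code.

```shell
#!/bin/sh
# Sketch: overall response time = area under the throughput/response-time
# curve divided by peak throughput, with the segment from the origin to
# the first point treated as a straight line (zero ms at zero ops/sec).
overall=$(awk '
BEGIN {
    # made-up (throughput ops/sec):(response time ms) data points
    n = split("1000:2.0 2000:2.5 3000:3.2 4000:4.5 5000:7.0", pt, " ")
    prev_t = 0; prev_r = 0; area = 0     # curve starts at the origin
    for (i = 1; i <= n; i++) {
        split(pt[i], v, ":")
        t = v[1]; r = v[2]
        area += (t - prev_t) * (r + prev_r) / 2   # trapezoid segment
        prev_t = t; prev_r = r
    }
    printf "%.2f", area / prev_t         # divide by peak throughput
}')
echo "overall response time = $overall ms"
```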
The benchmark has a number of parameters which are configurable. This parameter modification is specified with the use of the RC file on the prime client. For benchmark executions whose results are to be disclosed, there is a subset of parameters which may be modified. Parameters outside of the set specified below may not be modified for a publishable benchmark result.
Parameters which may be modified for benchmark execution:
LOAD
Used to specify the data points to be collected by the benchmark. The
list must increase in value and the points must be uniformly spaced.
INCR_LOAD
If the LOAD has a single value, this parameter is used to specify the
increment to increase the load for successive data points.
NUM_RUNS
If INCR_LOAD is used, this parameter is used to specify the number of data
points to gather. For a valid benchmark execution, this value must be
greater than or equal to 10.
PROCS
This parameter specifies the number of load-generating processes to be
used on each load-generating client. There is a minimum of eight
processes for each network used in the benchmark configuration. For
example, if the server being measured has two network interfaces and
there are two clients on each network, then each client would require a
minimum of four processes and this parameter would have a value of 4. If
there are fewer than eight processes for each network, the result will
be non-compliant with the SFS run rules.
CLIENTS
CLIENTS is used to specify the host names of the clients used for
generating the NFS load points.
MNT_POINTS
List of file systems to be used for the benchmark execution. This list
should be generated to comply to the uniform access requirements defined in
"SPEC's Description of Uniform Access for SFS 3.0" .
BIOD_MAX_WRITES
Specifies the number of outstanding or async writes that the benchmark
will generate per benchmark process. The minimum number is two and there is
no maximum number.
BIOD_MAX_READS
Specifies the number of outstanding or async reads that the benchmark will
generate per benchmark process. The minimum number is two and there is no
maximum number.
TCP
Specifies whether TCP should be used as the transport mechanism to
contact the NFS server for all generated transactions. The default is to
use UDP; if this option is set to "on", TCP will be used.
NFS_VERSION
Specifies the version of the NFS protocol to use for benchmark execution.
The default is version 2 and if "3" is specified, NFS version 3
will be used for the benchmark execution.
SFS_USER
The user account name which is configured on all clients to be used for
the benchmark execution. Each client should be configured to allow this
user execution of the benchmark.
SFS_DIR
Path name which specifies the location of the benchmark executables. Each
client should be configured to use the same path.
WORK_DIR
Path name where all benchmark results are placed. Each client should be
configured to have this path available.
PRIME_MON_SCRIPT
Name of a shell script or other executable program which will be invoked
to control any external programs. These external programs must be
performance neutral. If this option is used, the executable used must be
disclosed.
PRIME_MON_ARGS
Arguments which are passed to the executable specified in
PRIME_MON_SCRIPT.
RSH
The default for this option is the rsh command. For those operating
environments which do not use rsh for remote execution, this option should
be set to the appropriate remote execution program. This value applies to
the prime client.
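Putting the parameters above together, a minimal RC-file fragment might look like the following. All values, host names and paths are hypothetical examples, not recommendations; only the parameters listed above may be modified for a publishable result.

```shell
LOAD=1000                     # first requested load level, NFS ops/sec
INCR_LOAD=1000                # increment between successive load points
NUM_RUNS=10                   # at least 10 uniformly spaced points
PROCS=8                       # load-generating processes per client
CLIENTS="client1 client2"     # host names of the load generators
MNT_POINTS="srv:/fs1 srv:/fs2 srv:/fs3 srv:/fs4"
BIOD_MAX_WRITES=2             # outstanding async writes (minimum 2)
BIOD_MAX_READS=2              # outstanding async reads (minimum 2)
TCP=""                        # empty: UDP (default); "on": use TCP
NFS_VERSION=3                 # default is 2; "3" selects NFS V3
SFS_USER=spec                 # account configured on all clients
SFS_DIR=/usr/local/sfs/bin    # location of benchmark executables
WORK_DIR=/usr/local/sfs/results
RSH=rsh                       # remote execution command on prime client
```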
There are two mechanisms which can be used for obtaining valid benchmark executions. The first is the use of the sfs_mgr script. For those familiar with the benchmark, this shell script can be used in combination with an RC file for benchmark execution. The second is to use the runsfs script. This script is a menu-based utility that will provide a helping hand to the user who is somewhat unfamiliar with the benchmark and its execution.
Since it is the intent of these run and disclosure rules to provide the standard by which customers can compare and contrast NFS server performance, it is important to provide all the pertinent information about the system tested so this intent can be met. The following describes what is required for disclosure of benchmark results. It is recognized that all of the following information cannot be provided with each reference to benchmark results. Because of this, there is a minimum amount of information that must always be present, and upon request, the party responsible for disclosing the benchmark results must provide a full disclosure of the benchmark configuration. Note that SPEC publication requires a full disclosure.
The following is the minimum allowable disclosure of benchmark results:
The XXX would be replaced with the throughput value obtained from the rightmost data point of the throughput / response time curve generated by the benchmark. The YYY would be replaced with the overall response time value as generated by the benchmark.
The information described in the following sections should be sufficient for reproduction of the disclosed benchmark results. If additional information is needed, the party disclosing the results should provide the information as a note or additional disclosure. All product names and model numbers and configurations should be complete such that the information provided could be used to order the disclosed products.
Server stable storage configuration
Other server hardware configuration
Network hardware configuration
These apply when network components were used to build the test
configuration.
Disclosure Notes
The Notes section is used to document: