Weighted Geometric Mean Selected
for SPECviewperf® Composite Numbers
by Bill Licea-Kane
At its February 1995 meeting in Salt Lake City, a subcommittee
within the SPECopc℠ project
group was given the task of recommending a method for deriving
a single composite metric for each viewset running under the SPECviewperf®
benchmark. Composite numbers had been discussed by the SPECopc
group for more than a year.
In May 1995, the SPECopc project group decided to adopt a weighted
geometric mean as the single composite metric for each viewset.
What is a Weighted Geometric Mean?
The formula for determining a weighted geometric mean is

$$\text{composite} = \prod_{i=1}^{n} x_i^{w_i}$$

where "n" is the number of individual tests in a viewset, "x_i" is
the result of test i in frames/second, and "w_i" is the weight of
test i, expressed as a number between 0.0 and 1.0. (A test with a
weight of "10.0%" has a "w" of 0.10. Note that the sum of the weights
of the individual tests must equal 1.00.)
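As an illustration of the formula, here is a minimal Python sketch. The function name `weighted_geometric_mean`, the three-test viewset, and its results and weights are all hypothetical, chosen only to show the arithmetic; they are not taken from SPECviewperf or from any published result.

```python
import math

def weighted_geometric_mean(results, weights):
    """Return the product of results[i] ** weights[i].

    results: per-test results in frames/second (must be positive)
    weights: per-test weights as fractions that sum to 1.0
    """
    if len(results) != len(weights):
        raise ValueError("results and weights must have the same length")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    if any(r <= 0 for r in results):
        raise ValueError("results must be positive")
    # Summing weighted logarithms is equivalent to multiplying the
    # weighted powers, and stays well behaved with many tests.
    return math.exp(sum(w * math.log(r) for r, w in zip(results, weights)))

# Hypothetical viewset: three tests weighted 50%, 30% and 20%.
example_results = [20.0, 5.0, 80.0]   # frames/second, made-up values
example_weights = [0.50, 0.30, 0.20]  # fractions, sum to 1.0

composite = weighted_geometric_mean(example_results, example_weights)
print(f"composite = {composite:.2f} frames/second")
```

The check on the sum of the weights mirrors the requirement stated above that the weights of the individual tests must equal 1.00.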
Why the Weighted Geometric Mean?
The SPECopc subcommittee that recommended a method for determining
composite numbers started with the description for assigning weights
that is provided to each creator of a viewset: "Assign a weight
to each path based on the percentage of time in each path..."
Given this description, the weighted geometric mean of each viewset
is the correct composite metric. This composite metric is a derived
quantity that is exactly as if you ran the viewset tests for 100
seconds, where test 1 was run for 100 × weight₁
seconds, test 2 for 100 × weight₂ seconds,
and so on.
The end result would be the number of frames rendered divided by
the total time, which is expressed in frames/second. It also has the
desirable property of "bigger is better"; that is, the higher the
number, the better the performance.
Why Not Weighted Harmonic Mean?
Since the results of SPECviewperf are expressed as "frames/second,"
the subcommittee was asked why we did not choose the weighted
harmonic mean. The weighted harmonic mean would have been the
correct composite if the description published for SPECviewperf
read as follows: "Assign a weight to each path based on the percentage
of operations in each path..."
Given this description, the weighted harmonic mean would be as
if you ran the viewset tests for 100 frames, where 100 ×
weight₁ frames were drawn with test 1, the next
100 × weight₂ frames were drawn with test 2,
and so on. The 100 frames divided by the total time would be the
weighted harmonic mean.
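The frame-count reading of the weighted harmonic mean can be checked with a short Python sketch. The two-test viewset, its rates and weights, and the function names are hypothetical and serve only to show that both computations agree.

```python
import math

def weighted_harmonic_mean(results, weights):
    """Weighted harmonic mean of per-test results in frames/second."""
    return 1.0 / sum(w / r for r, w in zip(results, weights))

def rate_for_frame_budget(results, weights, total_frames=100.0):
    """Draw total_frames * weight_i frames with test i, then divide
    the total frame count by the total elapsed time."""
    total_time = sum(total_frames * w / r for r, w in zip(results, weights))
    return total_frames / total_time

# Hypothetical viewset: test 1 at 10 frames/second with 25% of the
# frames, test 2 at 40 frames/second with 75% of the frames.
rates = [10.0, 40.0]
frame_weights = [0.25, 0.75]

harmonic = weighted_harmonic_mean(rates, frame_weights)
simulated = rate_for_frame_budget(rates, frame_weights)
print(f"weighted harmonic mean: {harmonic:.3f} frames/second")
print(f"100 frames / total time: {simulated:.3f} frames/second")
```

The two printed values agree, and the choice of 100 frames does not matter because the frame count cancels out.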
Since the weights for the viewsets were selected on percentage
of time, not percentage of operations, we chose the weighted geometric
mean over the weighted harmonic mean.
What About Weighted Arithmetic Mean?
The weighted arithmetic mean is correct for calculating grades
at the end of a school term. It is not correct for the situation
we face here.
Consider for a moment a trivial example, where there are two
tests, equally weighted in a viewset:
| | Test 1 | Test 2 | Weighted Arithmetic Mean |
|---|---|---|---|
| System A | 1.0 | 100.0 | 50.5 |
| System B | 1.1 | 100.0 | 50.55 |
| System C | 1.0 | 110.0 | 55.5 |
System B is 10-percent faster at Test 1 than System A. System
C is 10-percent faster at Test 2 than System A. But look at the
weighted arithmetic means. System B's weighted arithmetic mean
is only about 0.1-percent higher than System A's, while System C's
weighted arithmetic mean is about 10-percent higher. Even
normalization doesn't help here.
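The arithmetic for this example can be reproduced with a small Python sketch. The test results and equal weights come from the example above; the function and variable names are illustrative, and the weighted geometric mean column is added here only for comparison.

```python
import math

# Results for Test 1 and Test 2, equally weighted, from the example above.
systems = {
    "System A": (1.0, 100.0),
    "System B": (1.1, 100.0),
    "System C": (1.0, 110.0),
}
weights = (0.5, 0.5)

def weighted_arithmetic_mean(results, weights):
    return sum(w * r for r, w in zip(results, weights))

def weighted_geometric_mean(results, weights):
    return math.prod(r ** w for r, w in zip(results, weights))

base_am = weighted_arithmetic_mean(systems["System A"], weights)
base_gm = weighted_geometric_mean(systems["System A"], weights)

# Percentage gain of each system over System A under both composites.
gains = {}
for name, results in systems.items():
    am = weighted_arithmetic_mean(results, weights)
    gm = weighted_geometric_mean(results, weights)
    gains[name] = (100.0 * (am / base_am - 1.0), 100.0 * (gm / base_gm - 1.0))
    print(f"{name}: arithmetic {am:.2f} ({gains[name][0]:+.2f}%), "
          f"geometric {gm:.3f} ({gains[name][1]:+.2f}%)")
```

Under the arithmetic mean, the test with the larger raw numbers dominates the composite. Under the weighted geometric mean, a 10-percent gain on either equally weighted test raises the composite by the same factor, the square root of 1.1, or a little under 5 percent.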
Why Not Normalized Weighted Geometric Mean?
Since our weights were percentage of time and since the results
from SPECviewperf are expressed in frames/sec, we were not obligated
to normalize. Normalization introduces many issues of its own,
starting with something as simple as how to select a reference
system.
We invite readers to select two different systems whose results
are published in this newsletter and to use each one as the reference
system. You will discover quickly that the normalized weighted
geometric means change only in absolute magnitude. If the weighted
geometric mean of System B is 10-percent higher than System A,
for example, the normalized weighted geometric mean of System
B will be 10-percent higher than System A, no matter what reference
system you choose.
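The same experiment can be sketched in Python without looking up published results. All of the numbers below (two systems, two candidate reference systems, and a three-test viewset with its weights) are made up for illustration, and the function names are hypothetical.

```python
import math

def weighted_geometric_mean(results, weights):
    return math.prod(r ** w for r, w in zip(results, weights))

def normalized_wgm(results, reference, weights):
    """Divide each test result by the reference system's result for
    the same test, then take the weighted geometric mean of the ratios."""
    ratios = [r / ref for r, ref in zip(results, reference)]
    return weighted_geometric_mean(ratios, weights)

# Hypothetical viewset weights and results in frames/second.
weights = [0.5, 0.3, 0.2]
system_a = [12.0, 3.5, 60.0]
system_b = [15.0, 3.0, 75.0]
references = {
    "reference 1": [10.0, 2.0, 50.0],
    "reference 2": [4.0, 8.0, 20.0],
}

raw_ratio = (weighted_geometric_mean(system_b, weights)
             / weighted_geometric_mean(system_a, weights))
print(f"unnormalized B/A ratio: {raw_ratio:.6f}")

normalized_ratios = {}
for name, reference in references.items():
    norm_a = normalized_wgm(system_a, reference, weights)
    norm_b = normalized_wgm(system_b, reference, weights)
    normalized_ratios[name] = norm_b / norm_a
    print(f"{name}: A = {norm_a:.4f}, B = {norm_b:.4f}, "
          f"B/A = {normalized_ratios[name]:.6f}")
```

The normalized values differ between the two references, but the B/A ratio matches the unnormalized ratio in both cases, because dividing every test result by a reference divides each system's composite by the same constant.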
Is There a Disadvantage to Weighted Geometric Mean?
As with any composite, the weighted geometric mean can act as
a "filter" for results; this introduces the danger that important
information might be lost and inappropriate conclusions could
be drawn. So, proper use of these composites is important. Use
the composite as an additional piece of information. But also
take a look at each individual test result in a viewset.
Please don't rely exclusively on any synthetic benchmark such
as SPECviewperf. In the end, isn't actual application performance
on an actual computer system what you are really attempting to
find?
Bill Licea-Kane is a founding member of SPECopc and a former chair of the project group.