[Future Technology Research Index] [SGI Tech/Advice Index] [Nintendo64 Tech Info Index]


[WhatsNew] [P.I.] [Indigo] [Indy] [O2] [Indigo2] [Crimson] [Challenge] [Onyx] [Octane] [Origin] [Onyx2]

Ian's SGI Depot: FOR SALE! SGI Systems, Parts, Spares and Upgrades

(check my current auctions!)

O2 SPEC95 Performance Comparison
Using Different CPUs

Last Change: 12/Aug/1998

SPEC's Introduction to SPEC95

SPECfp95 Analysis

SPECint95 Analysis


(Note: the 2D bar graphs shown here for the various SPEC95 tests have been drawn to the same scale)
(the graphs are also to the same scale as those given on other single-CPU comparison pages)

O2 SPECfp95 Performance Comparison


Objectives

This analysis examines how different CPUs perform in the same system, in this case O2, ie. the focus is on how the various R5000s and R10000s compare when fitted to an O2 (I have separate pages dealing with how the same CPU performs in different systems).

Note that I do not have any SPEC95 data for the following CPUs when used in O2:

Since some older O2 systems may be using these CPUs, I would like to add them to the study. So, please contact me if you have any published detailed SPEC95 data for the above CPUs (note that final base and peak averages are of little use; it's the detailed individual test results I'm looking for). I'm also looking for data for R10K/225 in O2.

As with all these studies, a 3D Inventor model of the data is available (screenshots of this are included below). Load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (use the Roty and Rotx thumbwheels); that'll give the standard isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and show the object directly, although such browsers may not offer Orthographic viewing.

All source data for this analysis came from www.specbench.org.

Given below is a comparison table of available SPECfp95 test results for O2. Faster CPUs are leftmost in the table (in the Inventor graph, they're placed at the back). After the table and 3D graphs is a short-cut index to the original results pages for the various systems.

          R10000   R10000    R5000SC    R5000PC
          250MHz   195MHz    180MHz     180MHz

tomcatv    10.2     9.78      7.35       6.77
swim       14.4     13.9      10.6       10.6
su2cor     5.40     4.72      2.42       1.94
hydro2d    3.26     3.17      2.48       2.48
mgrid      7.26     6.95      5.02       4.39
applu      6.49     5.92      4.23       4.05
turb3d     11.1     9.57      5.60       4.25
apsi       11.6     9.77      5.47       4.39
fpppp      37.2     29.3      10.3       7.96
wave5      12.8     11.8      6.92       4.20

          O2 SPECfp95 Comparison

[Left Isometric View] [Right Isometric View]

(click on the images above to download larger versions of the views shown)

[Test Suite Description | 250MHz R10000 | 195MHz R10000 | 180MHz R5000SC | 180MHz R5000PC]


Next, a separate 2D comparison graph for each of the ten SPECfp95 tests:

tomcatv:

tomcatv comparison graph

swim:

swim comparison graph

su2cor:

su2cor comparison graph

hydro2d:

hydro2d comparison graph

mgrid:

mgrid comparison graph

applu:

applu comparison graph

turb3d:

turb3d comparison graph

apsi:

apsi comparison graph

fpppp:

fpppp comparison graph

wave5:

wave5 comparison graph

Observations

Obviously, the R10000 performs better than the R5000 in O2, though not by much in some cases (eg. hydro2d, where the 250MHz R10000 scores 3.26 against 2.48 for both 180MHz R5000s). Remember though that the above data does not include the 200MHz R5000, which has a larger L2 cache than the 180MHz R5000 (1MB compared to 512K), so don't form any concrete judgements just yet. I need to obtain detailed R5K/200 SPEC results in order to offer a complete picture, especially since R5K/200 is probably the most popular CPU in the O2s being sold at the moment.
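To see which tests gain least from the R10000, one can divide the columns of the SPECfp95 table by each other. The following is a minimal sketch only: the numbers are copied from the table above, while the names CPUS, SPECFP95 and speedups() are my own labels for illustration and not part of any SPEC tool.

```python
# SPECfp95 results for O2, copied from the table above.
# Column order: R10000/250, R10000/195, R5000SC/180, R5000PC/180.
CPUS = ("R10000/250", "R10000/195", "R5000SC/180", "R5000PC/180")

SPECFP95 = {
    "tomcatv": (10.2, 9.78, 7.35, 6.77),
    "swim":    (14.4, 13.9, 10.6, 10.6),
    "su2cor":  (5.40, 4.72, 2.42, 1.94),
    "hydro2d": (3.26, 3.17, 2.48, 2.48),
    "mgrid":   (7.26, 6.95, 5.02, 4.39),
    "applu":   (6.49, 5.92, 4.23, 4.05),
    "turb3d":  (11.1, 9.57, 5.60, 4.25),
    "apsi":    (11.6, 9.77, 5.47, 4.39),
    "fpppp":   (37.2, 29.3, 10.3, 7.96),
    "wave5":   (12.8, 11.8, 6.92, 4.20),
}


def speedups(results, fast, slow):
    """Return {test: fast score / slow score} for two CPU column names."""
    i, j = CPUS.index(fast), CPUS.index(slow)
    return {test: row[i] / row[j] for test, row in results.items()}


# Smallest gain first, so the "not by much" cases appear at the top.
fp_gain = speedups(SPECFP95, "R10000/250", "R5000SC/180")
fp_ranked = sorted(fp_gain.items(), key=lambda item: item[1])

for test, ratio in fp_ranked:
    print(f"{test:8s} {ratio:5.2f}x")
```

Changing the two column names passed to speedups() gives the same comparison for any other pair of CPUs in the table.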

There are a variety of reasons why R10000 doesn't perform as well in O2 as it does in Octane or Origin. These are discussed in depth on the O2 architecture page, and further on the R10000/195 comparison page, so I won't repeat the details here.

One important point though: ignore fpppp. That particular test involves a tiny data set, somewhere between 8K and 32K (small enough to fit into L1 cache on R10000, never mind L2 cache!); it's highly unlikely that a typical task in today's computing environment will involve data sets anything like as small as that used by fpppp. Most modern applications process far larger and more complex data sets, sometimes into the gigabyte range (eg. seismic modeling). This isn't to say your task isn't like fpppp, just that it's very unlikely. In the context of the above results, fpppp is annoying because it skews the final averages; this is why none of my SPEC95 analysis pages ever use SPEC averages as a basis for making conclusions.
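The skew can be checked directly. SPEC's overall SPEC95 figures are geometric means of the individual test ratios, so the sketch below computes the geometric mean of each column of the SPECfp95 table with and without fpppp. Treat it as a rough illustration: the helper names are my own, and the means it prints come from the table as given here, so they need not match any officially published average.

```python
import math

# SPECfp95 results for O2, copied from the table above.
# Column order: R10000/250, R10000/195, R5000SC/180, R5000PC/180.
CPUS = ("R10000/250", "R10000/195", "R5000SC/180", "R5000PC/180")

SPECFP95 = {
    "tomcatv": (10.2, 9.78, 7.35, 6.77),
    "swim":    (14.4, 13.9, 10.6, 10.6),
    "su2cor":  (5.40, 4.72, 2.42, 1.94),
    "hydro2d": (3.26, 3.17, 2.48, 2.48),
    "mgrid":   (7.26, 6.95, 5.02, 4.39),
    "applu":   (6.49, 5.92, 4.23, 4.05),
    "turb3d":  (11.1, 9.57, 5.60, 4.25),
    "apsi":    (11.6, 9.77, 5.47, 4.39),
    "fpppp":   (37.2, 29.3, 10.3, 7.96),
    "wave5":   (12.8, 11.8, 6.92, 4.20),
}


def geometric_mean(values):
    """Geometric mean via logarithms (all values must be positive)."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def column_mean(results, cpu, skip=()):
    """Geometric mean of one CPU column, optionally skipping some tests."""
    i = CPUS.index(cpu)
    return geometric_mean(
        row[i] for test, row in results.items() if test not in skip
    )


# inflation[cpu] = (mean of all ten tests) / (mean without fpppp)
inflation = {}
for cpu in CPUS:
    with_fpppp = column_mean(SPECFP95, cpu)
    without_fpppp = column_mean(SPECFP95, cpu, skip=("fpppp",))
    inflation[cpu] = with_fpppp / without_fpppp
    print(f"{cpu:12s} all: {with_fpppp:5.2f}  "
          f"no fpppp: {without_fpppp:5.2f}  "
          f"inflation: {inflation[cpu]:4.2f}x")
```

An inflation figure above 1.0 means fpppp pulls that CPU's average upwards; the further above 1.0, the more a single tiny-data-set test is flattering the overall number.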

Note that fpppp will not be included in SPEC98.


O2 SPECint95 Performance Comparison

Just as for the SPECfp95 analysis given above, you can download a 3D performance graph (gzipped) if you wish: load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective), etc.

The rationale and method for this examination were the same as for SPECfp95. Thus, given below is a comparison table of the various SPECint95 test results. After the table and 3D graphs is a short-cut index to the original results pages.

          R10000   R10000    R5000SC    R5000PC
          250MHz   195MHz    180MHz     180MHz

go         13.9     11.0      5.19       3.51
m88ksim    14.5     11.1      5.25       5.04
gcc        10.7     9.02      4.57       3.16
compress   12.0     10.6      3.64       2.45
li         11.9     9.42      5.25       4.36
ijpeg      11.5     9.35      4.40       4.04
perl       15.7     13.0      6.52       5.04
vortex     9.74     8.20      4.27       2.89

          O2 SPECint95 Comparison

[Left Isometric View] [Right Isometric View]

(click on the images above to download larger versions of the views shown)

[Test Suite Description | 250MHz R10000 | 195MHz R10000 | 180MHz R5000SC | 180MHz R5000PC]


Next, a separate 2D comparison graph for each of the eight SPECint95 tests:

go:

go comparison graph

m88ksim:

m88ksim comparison graph

gcc:

gcc comparison graph

compress:

compress comparison graph

li:

li comparison graph

ijpeg:

ijpeg comparison graph

perl:

perl comparison graph

vortex:

vortex comparison graph

Observations

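Obviously, R10000 is much better than R5000 in O2 for integer tasks (the 250MHz R10000 is more than twice as fast as the 180MHz R5000SC on every one of the eight tests), but remember that the 200MHz R5000 (with 1MB L2) is not included in this study (no data available yet).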

Other SPEC95 analysis pages I've written have described the way in which most SPECint95 tests involve small data sets, resulting in few cache misses. Only vortex (and to a lesser extent gcc) involves a varied memory access pattern, benefiting from a large L2 cache as shown by the 250MHz R10000 4MB L2 Origin results. Thus, bearing in mind the factors affecting cache miss behaviour in R10K O2, it isn't surprising to see vortex showing the lowest value out of the eight R10K O2 SPECint95 results. If one looks at the tests which do not seem to involve heavy memory traffic, eg. perl and m88ksim, O2 shows good results, matching Octane and Origin fairly well (R10K/250 O2 actually beats R10K/250 Origin2000 for m88ksim, though the difference is within the margin of error that results from compiler optimisation).
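The ordering described above can be read straight off the SPECint95 table. As a small sketch (the numbers are from the table; the names SPECINT95 and ranked() are just labels I've chosen for illustration), sorting each R10K column from best to worst result puts perl and m88ksim at the top and vortex at the bottom:

```python
# SPECint95 results for O2, copied from the table above.
# Column order: R10000/250, R10000/195, R5000SC/180, R5000PC/180.
CPUS = ("R10000/250", "R10000/195", "R5000SC/180", "R5000PC/180")

SPECINT95 = {
    "go":       (13.9, 11.0, 5.19, 3.51),
    "m88ksim":  (14.5, 11.1, 5.25, 5.04),
    "gcc":      (10.7, 9.02, 4.57, 3.16),
    "compress": (12.0, 10.6, 3.64, 2.45),
    "li":       (11.9, 9.42, 5.25, 4.36),
    "ijpeg":    (11.5, 9.35, 4.40, 4.04),
    "perl":     (15.7, 13.0, 6.52, 5.04),
    "vortex":   (9.74, 8.20, 4.27, 2.89),
}


def ranked(results, cpu):
    """Return the test names for one CPU column, best result first."""
    i = CPUS.index(cpu)
    return sorted(results, key=lambda test: results[test][i], reverse=True)


for cpu in ("R10000/250", "R10000/195"):
    order = ranked(SPECINT95, cpu)
    print(f"{cpu}: best = {order[0]}, worst = {order[-1]}")
    print("  " + " > ".join(order))
```

Bear in mind that this only ranks raw results within one column; it says nothing by itself about why a test sits where it does, which is what the cache discussion above is for.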

What this means is that if you have an int task which doesn't cause heavy memory traffic and doesn't benefit much from a large (>1MB) L2 cache, then you'll get good performance with an R10K O2, ie. there's no need to spend a fortune on an Origin2000. This also means that, if you have a variety of int tasks and systems, it's well worth experimenting to see which system offers the best performance for each task; it's entirely possible that swapping two tasks over between two different machines may result in better performance for one task but no loss of performance for the other (an extreme example would be vortex vs. m88ksim for O2 vs. Origin2000).

How can one tell if one's task is like m88ksim, perl, etc.? Well, one can use tools like gr_osview to study what's happening whilst the code is running (degree of memory traffic, etc.), one can run comparison tests using other systems with identical CPUs to test the effects of different L2 sizes (eg. 195MHz R10000 4MB L2 Origin vs. 195MHz R10000 1MB L2 Octane, or 195MHz 1MB L2 Power Challenge vs. 195MHz 2MB L2 Power Challenge), and one can run tests using old vs. new systems with the same CPUs and L2 sizes to test whether one's task can benefit from better memory latency, higher memory bandwidth and better outstanding cache miss support (eg. 195MHz 1MB L2 Octane vs. 195MHz 1MB L2 Indigo2).

By way of typical evidence, note that m88ksim shows no improvement when moving from 195MHz 1MB L2 Power Challenge to 195MHz 2MB L2 Power Challenge (the same applies to 1MB L2 Octane vs. 4MB Origin). Performance differences only really show up when looking at R10K/250, but even then the margins are not great (vortex is the exception).

Of course, you'll need access to different systems to run comparison tests (though you can use gr_osview on any system to gain some insight), but if SGI values your custom then they should be willing to help out with tests in the event that you do not have access to the necessary systems.

Remember that many typical daily tasks involve small data sets, eg. processing typical Internet movie frames (half-size PAL, half-size NTSC). But sometimes careful thought is required; eg. a full-size NTSC frame will fit into a 1MB L2 cache (0.9MB), but a full-size PAL frame will not (1.27MB). Thus, a system with more than 1MB L2 will be better for processing PAL data (eg. 4MB L2 versions of R10000 in Origin). It may, however, be possible to hardware-accelerate a movie processing task, depending on the system (eg. hardware JPEG support on O2 with ICE, or video accelerator boards on other systems such as Octane Compression).
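The frame sizes quoted above are easy to reproduce. The sketch below assumes square-pixel frames of 640x480 (NTSC) and 768x576 (PAL) at 3 bytes per pixel (24-bit RGB), with 1MB taken as 1024 x 1024 bytes; those figures are my assumptions, chosen because they give the 0.9MB and 1.27MB values in the text, and other pixel formats or frame geometries will give different answers.

```python
MIB = 1024 * 1024  # 1MB taken as 2^20 bytes

# Assumed frame geometries (width, height); not stated in the text.
FRAMES = {
    "NTSC full": (640, 480),
    "PAL full":  (768, 576),
    "NTSC half": (320, 240),
    "PAL half":  (384, 288),
}

BYTES_PER_PIXEL = 3  # assumed 24-bit RGB


def frame_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def fits_in_cache(width, height, cache_bytes):
    """True if one whole frame is no larger than the given cache size."""
    return frame_bytes(width, height) <= cache_bytes


for name, (width, height) in FRAMES.items():
    size = frame_bytes(width, height)
    verdict = "fits" if fits_in_cache(width, height, 1 * MIB) else "does not fit"
    print(f"{name:10s} {size / MIB:5.2f}MB  {verdict} in 1MB L2")
```

This is a best-case check: a real task also needs cache space for its code and any other working data, so a frame that only just fits may still cause L2 misses.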

Judging exactly which system is best for a particular task, or which processor is best for a system already decided upon (perhaps because of budget constraints), may not always be easy. Thus, always have proper tests done, and investigate thoroughly, before making any final purchasing decision.

