Next: Use of Scattergrams for Defining Up: EMME/2 News 8 October 1989 Previous: Laser Printer Support for GPR

EMME/2 BENCHMARKS

In past EMME/2 News issues, benchmark results have been published each time EMME/2 was ported to a new platform. These benchmarks reported the CPU time for auto and transit assignment with the standard Winnipeg demonstration data base. To give a general picture of relative machine performance and its evolution over time, we have gathered here all the benchmark results accumulated over the years. We also compare some of these results with those obtained by running well-known CPU and FPU benchmarks.

The following table gives all the EMME/2 benchmark results, sorted by auto assignment time per iteration (fastest first). Lines marked with "(*)" give results for the 80386 running in native 32-bit protected mode.

| Computer model | Processor (CPU/FPU) | Speed (MHz) | Date | Auto assignment, 1 it. (s) | Auto assignment, total (s) | Transit assignment (s) |
|---|---|---|---|---|---|---|
| IBM 3090 | NEC NAS AS/EX90 | n/a | 09/89 | 8.4 | 92.7 | 19.1 |
| HP 9000-835 | HP RISC | 30 | 09/89 | 10.3 | 113.6 | 23.4 |
| SUN SPARCserver 330 | SPARC | 25 | 08/89 | 11.1 | n/a | n/a |
| SUN SPARCstation1 | SPARC | 20 | 09/89 | 14.9 | 163.2 | 28.9 |
| HP 9000-825 | HP RISC | 25 | 09/89 | 18.6 | 202.5 | 56.5 |
| VAX 8600 | DEC | n/a | 03/87 | 23.1 | n/a | n/a |
| Interpro 340 | Clipper | 25 | 03/89 | 29.8 | 325.9 | 92.2 |
| Definicon PM-030 | 68030/882 | 33 | 09/89 | 34.1 | 382.1 | 80.5 |
| Interpro 120 | Clipper | 20 | 03/89 | 36.3 | 412.8 | 116.8 |
| COMPAQ 20e (*) | 80386/387 | 20 | 09/89 | 43.7 | 491.3 | 101.7 |
| Definicon DSI-780 | 68020/881 | 20 | 09/89 | 51.6 | 580.6 | 123.5 |
| COMPAQ 20 (*) | 80386/387 | 20 | 09/89 | 58.3 | 653.4 | 135.0 |
| SUN 3/60 | 68020/881 | 20 | 09/89 | 63.4 | 702.9 | 132.3 |
| IBM PS2/70 (*) | 80386/387 | 20 | 09/89 | 64.8 | 721.8 | 147.6 |
| COMPAQ 20e | 80386/387 | 20 | 09/89 | 66.2 | 737.1 | 214.4 |
| Definicon DSI-780 | 68020/881 | 17 | 03/87 | 67.5 | n/a | 152.5 |
| Masscomp 5520 | 68020/lightning | n/a | 09/89 | 73.2 | 806.0 | 172.3 |
| COMPAQ 20 (*) | 80386/387 | 16 | 07/88 | 75.2 | n/a | 183.6 |
| Masscomp 5400 | 68020/881 | n/a | 11/86 | 86.0 | 937.9 | 190.2 |
| COMPAQ 20 | 80386/387 | 20 | 09/89 | 86.1 | 957.0 | 267.7 |
| IBM PS2/70 | 80386/387 | 20 | 09/89 | 91.8 | 1018.2 | 297.5 |
| VAX 11/780 | DEC | n/a | 03/87 | 119.2 | n/a | 300.5 |
| microVAX II | DEC | n/a | 03/87 | 151.2 | n/a | 450.3 |
| VAX station 2000 | DEC | n/a | 09/89 | 167.9 | 1842.4 | 392.2 |
| CLUB 286 | 80286/287 | 10 | 09/89 | 177.2 | 1947.9 | 477.5 |
| HP 9000-500 | HP | n/a | 08/85 | 186.5 | 2028.8 | 292.0 |
| AT clone (Taiwan) | 80286/287 | 12 | 09/89 | 199.8 | 2200.7 | 548.6 |
| SUN 3/50 | 68020/881 | 15 | 03/87 | 202.0 | n/a | 247.7 |
| Definicon DSI-32 | 32032/081 | 10 | 11/85 | 223.4 | 2455.9 | 534.8 |
| IBM AT | 80286/287 | 8 | 02/88 | 262.5 | n/a | 1193.4 |
| Masscomp 500 | 68010 | n/a | 06/85 | 291.3 | 3166.0 | 499.6 |
| Symmetric 375 | 32016/081 | 10 | 12/86 | 417.3 | 4526.9 | 862.0 |
| AT&T Unix-PC 7300 | 68010 | 10 | 01/84 | 463.5 | 5147.6 | n/a |
| Burroughs XT550 | 68010 | 10 | 10/85 | 522.4 | 5695.7 | 1503.5 |
| Pixel 100/AP | 68000 | 10 | 06/85 | 549.5 | 5959.0 | 715.8 |
| SUN 2/50 | 68010 | 10 | 06/85 | 584.6 | 6391.1 | 730.5 |

Not surprisingly, the above table is also nearly in reverse chronological order. In the five-year span covered by the benchmarks, tremendous progress has been made at the hardware level. Consider the 68000 microprocessor family, for example: a factor of about 16 has been gained from the 10 MHz 68000, with an iteration time of 549.5 sec, to the 33 MHz 68030, with an iteration time of 34.1 sec! Similarly, the 20 MHz 80386 running in protected mode is 6 times faster than its 8 MHz 80286 predecessor. In this last case, the difference would be even greater had we been able to include results from 25 and 33 MHz 80386 machines or from the new generation of 80486 machines now available. Finally, at the top of the table, RISC microprocessors are approaching the results of the IBM 3090 mainframe. At these speeds, we can almost speak of interactive assignments!
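The speed-up factors discussed in this paragraph are ratios of auto assignment times per iteration taken from the table above. The short Python sketch below recomputes them; the function name `speedup` is ours and is only for illustration.

```python
def speedup(old_time: float, new_time: float) -> float:
    """Speed-up of a newer machine over an older one: the ratio of their
    auto assignment times per iteration (seconds)."""
    return old_time / new_time


# Iteration times taken from the benchmark table.
pairs = {
    "68000 (Pixel 100/AP) -> 68030 (Definicon PM-030)": (549.5, 34.1),
    "80286 (IBM AT) -> 80386 (COMPAQ 20e, protected mode)": (262.5, 43.7),
}
for label, (old, new) in pairs.items():
    print(f"{label}: {speedup(old, new):.1f}x")
```

It prints factors of 16.1 and 6.0 for the two pairs.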

For each 80386 machine tested in both modes, running in native protected mode (lines marked with "(*)" in the above table) reduces the run time by about 30% for the auto assignment and by about 50% for the transit assignment.
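The percentages above are reductions in run time relative to the real-mode result. A minimal Python sketch of that calculation, using the per-iteration auto times and the transit times of the three 80386 machines measured in both modes (the function name is ours):

```python
def time_reduction_pct(real_mode: float, protected_mode: float) -> float:
    """Percentage by which the run time drops when switching from real
    mode to native 32-bit protected mode."""
    return 100.0 * (real_mode - protected_mode) / real_mode


# (auto 1 it. real, auto 1 it. protected, transit real, transit protected), seconds
machines = {
    "COMPAQ 20e": (66.2, 43.7, 214.4, 101.7),
    "COMPAQ 20": (86.1, 58.3, 267.7, 135.0),
    "IBM PS2/70": (91.8, 64.8, 297.5, 147.6),
}
for name, (a_real, a_prot, t_real, t_prot) in machines.items():
    print(f"{name:<11} auto {time_reduction_pct(a_real, a_prot):4.1f}%  "
          f"transit {time_reduction_pct(t_real, t_prot):4.1f}%")
```

The reductions range from about 29 to 34% for the auto assignment and from about 50 to 53% for the transit assignment.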

Different computer models are usually compared using small standard benchmarks that try to isolate one system component (e.g. CPU, FPU, disk I/O). An interesting question is: if machine X is 10 times faster than machine Y according to benchmark Z, will EMME/2 also run 10 times faster? In other words, can we predict EMME/2 performance on a machine based solely on a few standard benchmark results? To answer this question, we conducted a small study with three benchmarks testing the performance of the CPU, the FPU and a mix of both. A more complete study would also have included disk I/O and memory access benchmarks, but we concentrated on CPU and FPU benchmarks, since EMME/2 is known to be processor bound. The following benchmarks were used:

WHETSTONE: This well-known program was written in 1976. It is often described as a pure FPU benchmark, but it actually reflects a program carrying out mixed calculations using 60instructions. Although widely used, it is not a very good benchmark: some optimizing compilers can simplify it, making the results appear better than they really are and thus invalidating comparisons between machines. A single precision version, swhet, and a double precision version, dwhet, were used for the study. The results of this benchmark are expressed in thousands of whetstones per second: the higher the result, the better the performance.
SIEVE: Another well-known benchmark, in use since 1981, which measures CPU performance through mostly array indexing and integer arithmetic. Some compilers can still simplify this program, but to a lesser degree than the Whetstone program. The program is written in C; a "register short" integer version, ssieve, and a "register long" integer version, lsieve, were used for the study.
FLOAT: This is a small "homebrew" benchmark that we have used for many years to quickly test the FPU throughput of a system. It is a simple loop computing the value of LN(2) as the sum of the alternating harmonic series 1 - 1/2 + 1/3 - 1/4 + ...:
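```
C     FLOAT benchmark: sum the alternating harmonic series
C     1 - 1/2 + 1/3 - 1/4 + ..., which converges to LN(2).
C     Each pass adds the pair 1/b - 1/(b+1) for odd b = 1, 3, 5, ...
      real*4 a,b
      a=0.
      b=1.
   10 a=a+1./b-1./(b+1.)
      b=b+2.
      if(b.lt.1000000.)goto 10
C     Print the partial sum after 500000 pairs of terms
      print *,a
      end
```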
Two versions were used: a single precision version, sfloat, and a double precision version, dfloat.
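The source of the SIEVE program used for the study is not reproduced here. The sketch below assumes the classic form of the sieve published in BYTE magazine (8191 flags standing for the odd numbers 3 to 16383, so that one pass finds 1899 primes) and pairs it with a transcription of the FLOAT loop. It is written in Python for readability, so the "register short"/"register long" distinction between ssieve and lsieve has no counterpart, Python floats are double precision (as in dfloat), and the timings it prints are not comparable with the tables. The names `sieve_pass` and `float_ln2` are ours.

```python
import math
import time

SIZE = 8190  # flags[i] stands for the odd number 2*i + 3, i.e. 3..16383


def sieve_pass(size: int = SIZE) -> int:
    """One sieve pass: count the odd primes among 3, 5, ..., 2*size + 3."""
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3
            # Strike out the odd multiples 3*prime, 5*prime, ...
            for k in range(i + prime, size + 1, prime):
                flags[k] = False
            count += 1
    return count


def float_ln2(limit: float = 1000000.0) -> float:
    """FLOAT loop: add 1/b - 1/(b+1) for b = 1, 3, 5, ... while b < limit."""
    a, b = 0.0, 1.0
    while b < limit:
        a += 1.0 / b - 1.0 / (b + 1.0)
        b += 2.0
    return a


if __name__ == "__main__":
    t0 = time.perf_counter()
    primes = sieve_pass()
    t1 = time.perf_counter()
    a = float_ln2()
    t2 = time.perf_counter()
    print(f"sieve: {primes} primes in {t1 - t0:.3f} s")
    print(f"float: a = {a:.7f} (LN(2) = {math.log(2):.7f}) in {t2 - t1:.3f} s")
```

Because the series is truncated after 10^6 terms, the FLOAT result falls short of LN(2) by about 5 × 10^-7, so its printed value is only an approximation of LN(2).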

The following table gives the benchmark results along with some EMME/2 results taken from the previous table. All the tests were done with the basic hardware configuration, without software accelerators such as high-level memory caching (memcache) or virtual disks (vdisk). On multi-user operating systems, the programs were run at night to avoid interference from other users as much as possible. (Observing the degradation under normal and heavy workloads would, however, be interesting in itself.) Except for swhet and dwhet, all results are expressed in seconds.

| Computer | swhet (1000 whet./s) | dwhet (1000 whet./s) | sfloat (s) | dfloat (s) | ssieve (s) | lsieve (s) | Auto (s) | Transit (s) | Buildwpg (s) |
|---|---|---|---|---|---|---|---|---|---|
| SPARCstation1 | 6303 | 4249 | 3.0 | 4.8 | 1.75 | 1.27 | 163.2 | 28.8 | 382 |
| HP 9000-825 | 3491 | 2631 | 5.3 | 6.0 | 1.83 | 1.49 | 202.5 | 56.5 | 450 |
| PM-030 | 2345 | 2129 | 8.1 | 9.3 | 3.32 | 2.89 | 382.1 | 80.5 | 870 |
| SUN 3/60 | 1186 | 1171 | 17.7 | 18.3 | 4.68 | 4.93 | 702.9 | 132.3 | 1147 |
| DSI-780 | 1220 | 1120 | 15.3 | 17.9 | 5.24 | 4.60 | 580.6 | 123.5 | 1146 |
| Masscomp 5520 | 2500 | 1697 | 6.1 | 10.4 | 5.45 | 5.88 | 806.0 | 172.3 | 1124 |
| COMPAQ 20e (*) | 1388 | 1303 | 15.0 | 16.8 | 5.33 | 3.73 | 491.3 | 101.6 | 1044 |
| COMPAQ 20e | 1036 | 924 | 14.2 | 14.9 | 5.89 | 10.22 | 737.1 | 214.4 | 1359 |
| COMPAQ 20 (*) | 1282 | 1175 | 15.0 | 17.4 | 6.53 | 4.15 | 653.4 | 135.0 | 1294 |
| COMPAQ 20 | 950 | 842 | 14.4 | 15.4 | 8.03 | 13.84 | 957.0 | 267.7 | 1804 |
| IBM PS2/70 (*) | 1158 | 1064 | 18.1 | 20.8 | 6.70 | 4.16 | 721.8 | 147.6 | 1453 |
| IBM PS2/70 | 883 | 782 | 16.3 | 17.7 | 8.08 | 13.74 | 1018.2 | 297.5 | 1818 |
| CLUB 286 | 281 | 254 | 68.9 | 74.0 | 15.16 | 25.35 | 1947.9 | 477.5 | 2981 |
| AT clone | 285 | 257 | 64.8 | 70.0 | 17.78 | 28.74 | 2200.7 | 548.6 | 3399 |
| DSI-32 | 312 | 277 | 34.4 | 42.3 | 14.60 | 12.60 | 2415.0 | 513.0 | 3826 |
| Symmetric 375 | 196 | 196 | 61.7 | 54.6 | 36.39 | 26.38 | 4175.2 | 817.0 | 6253 |
| VAX st. 2000 | n/a | n/a | 13.3 | 19.3 | 19.30 | 13.30 | 1842.4 | 392.2 | 2462 |

The following table gives the sample correlation matrix between the benchmark times (for this purpose, the swhet and dwhet results, given in whetstones per second, were first inverted to obtain the corresponding time values):

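| | swhet | dwhet | sfloat | dfloat | ssieve | lsieve | Auto | Transit |
|---|---|---|---|---|---|---|---|---|
| dwhet | .99 | | | | | | | |
| sfloat | .94 | .95 | | | | | | |
| dfloat | .91 | .93 | .99 | | | | | |
| ssieve | .95 | .92 | .76 | .71 | | | | |
| lsieve | .90 | .91 | .91 | .89 | .82 | | | |
| Auto | .97 | .96 | .80 | .77 | .98 | .84 | | |
| Transit | .98 | .97 | .85 | .82 | .95 | .91 | .98 | |
| Buildwpg | .97 | .95 | .82 | .78 | .96 | .84 | .99 | .98 |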
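The coefficients above are sample (Pearson) correlation coefficients. The Python sketch below shows how such a coefficient can be computed for a pair of columns of the benchmark table, including the inversion of the whetstone rates into time values; it is our illustration, not the program used for the study, and the function names are hypothetical. Since Pearson's r is unchanged when a variable is multiplied by a positive constant, the unit chosen for the inverted whetstone rate does not matter. The VAX station 2000 has no whetstone result and must be left out of the pairs involving swhet or dwhet.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample (Pearson) correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def whet_to_time(rate: float) -> float:
    """Invert a rate in thousand whetstones/sec into a time-like value."""
    return 1.0 / rate


# Columns of the benchmark table, in table order (17 machines).
ssieve = [1.75, 1.83, 3.32, 4.68, 5.24, 5.45, 5.33, 5.89, 6.53, 8.03,
          6.70, 8.08, 15.16, 17.78, 14.60, 36.39, 19.30]
auto = [163.2, 202.5, 382.1, 702.9, 580.6, 806.0, 491.3, 737.1, 653.4, 957.0,
        721.8, 1018.2, 1947.9, 2200.7, 2415.0, 4175.2, 1842.4]
# swhet is n/a for the VAX station 2000 (last row), so only 16 machines.
swhet = [6303, 3491, 2345, 1186, 1220, 2500, 1388, 1036, 1282, 950,
         1158, 883, 281, 285, 312, 196]

print(f"r(ssieve, Auto) = {pearson(ssieve, auto):.2f}")
print(f"r(swhet,  Auto) = {pearson([whet_to_time(r) for r in swhet], auto[:16]):.2f}")
```

The printed values can be compared with the ssieve and swhet entries of the Auto row in the table above.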

Four points are worth noting in this last table:

The auto and transit assignment times and the total elapsed time for the buildwpg macro are highly correlated. Any one of these variables can thus be used to characterize the computational performance of EMME/2 on a given machine. This can also be seen in the first table, which is sorted by auto assignment time and almost sorted by transit assignment time.
The .99 correlation coefficients observed between swhet and dwhet on the one hand, and between sfloat and dfloat on the other, indicate that measuring single or double precision throughput does not seem to make an overall difference.
The very high correlation coefficients in the ssieve and swhet columns indicate that either of these benchmarks is a good predictor of EMME/2 performance.
Although not as marked as for ssieve and swhet, there is also a high correlation between sfloat and the EMME/2 variables.
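A high correlation suggests, but does not guarantee, that EMME/2 times scale roughly in proportion to benchmark times. The Python sketch below tries that naive proportional model on one pair of machines from the benchmark table, scaling a reference machine's auto assignment time by the ratio of ssieve times. The model and the function name `predict_time` are our assumptions for illustration, not a method used in the study.

```python
def predict_time(ref_time: float, ref_bench: float, new_bench: float) -> float:
    """Naive proportional model: assume the EMME/2 time scales exactly like
    the benchmark time, i.e. ref_time * (new_bench / ref_bench)."""
    return ref_time * new_bench / ref_bench


# Reference machine: COMPAQ 20 (*), Auto total 653.4 s, ssieve 6.53 s.
# Target machine: SPARCstation1, ssieve 1.75 s; measured Auto total 163.2 s.
predicted = predict_time(653.4, 6.53, 1.75)
print(f"predicted {predicted:.1f} s, measured 163.2 s")
```

It prints a prediction of 175.1 s, about 7% above the measured 163.2 s. With so few machines, such predictions are best treated as rough estimates.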

Keeping in mind the small sample size (17 observations), this small study suggests that, although EMME/2 performance does depend on FPU throughput (as shown by the high correlation with sfloat), EMME/2 seems to be more CPU bound than FPU bound, even for the auto assignment.

In a future article, we will compare the speed of the various graphic displays and also run benchmarks on the GPL and GPR utilities, using various plotters and printers.


Heinz Spiess, EMME/2 Support Center, Thu Jun 6 14:19:19 MET DST 1996
