Performance comparison of MCA video cards

My 55SX runs Linux and XFree86, and I recently upgraded it from XGA-I to XGA-II. The graphics didn't really seem any faster, and possibly slower when 16-bit colour is used, but both are still miles faster than the planar VGA. That made me wonder: which is the fastest video card under X, and by how much?

On machines with both 32-bit and 16-bit MCA slots, how much faster is it to use the wider slot?

Hardware tested

The first test machine was my 55SX with 16 megs of RAM (8 on the planar, 8 on an Orchid card). It may seem crazy to test graphics performance on such a slow machine, but this should emphasize any speedup caused by hardware acceleration rather than by using the slow CPU to draw.

There were three pieces of video hardware I tested on the 55SX: the planar VGA, an XGA-I card with 1Mbyte VRAM, and an XGA-II card (the early type with the hedgehog heatsink). I tried the planar VGA in 4bpp (16 colours) using the XF86_VGA16 X server, and in 1bpp (black and white) using XF86_Mono. Running in monochrome is a common trick to get faster video and lower memory usage. The XGA-I and XGA-II use the XF86_AGX server; I tested the XGA-II in 8bpp and 16bpp, but under XFree86 the XGA-I works only in 8bpp. So altogether there are five video modes to test. The resolution in each case was standard VGA, 640x480, at a 60Hz refresh rate.

Secondly, I tested an XGA-II card in my 8580-111, upgraded with a Blue Lightning CPU. I tested the XGA-II in a 16-bit slot and in a 32-bit slot. The video mode was again 640x480 @ 60Hz, and the colour depth was 16bpp. Of course, XF86_AGX is the X server to use in both cases.

I also tested my IBM PS/ValuePoint 486DX-33. This is a clone ISA machine with a much faster CPU than the 55SX (and faster than the 8580), but its video is an on-board Tseng ET4000. I included it out of interest, to see how a 55SX (with a slow CPU but accelerated graphics) compares against a machine that must do all graphics operations in software, but has a much faster CPU to do them with.

Testing method

XFree86 comes with a tool called x11perf, which benchmarks basic operations like line drawing and scrolling, and a few not-so-basic ones. Normally it performs each operation repeatedly for five seconds to work out how many can be performed per second, and repeats this test five times. That was far too slow (there are many tests to run), so I chose one-second tests, each run only once. The machines (especially the 55SX) are relatively slow, so one second is probably adequate, and x11perf seems to do a short warm-up for each operation before it starts timing. Another problem was that some operations were so slow they couldn't be performed even once per second on the 55SX's planar VGA. In this case x11perf seems to do the right thing: it lets the operation finish and then divides by the time taken, giving a rate such as 0.7 operations per second.
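The rate arithmetic described above can be modelled in a few lines. This is a hypothetical sketch of the behaviour as observed from the outside, not x11perf's actual code; the names `ops_per_second` and `reps_for_budget` are made up for illustration, and the 1.43-second timing is invented to show how a figure like 0.7 arises.

```python
def ops_per_second(reps: int, elapsed: float) -> float:
    """Rate for a timed run: completed operations divided by seconds taken."""
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return reps / elapsed


def reps_for_budget(estimated_rate: float, budget: float = 1.0) -> int:
    """How many repetitions fit in `budget` seconds, but never fewer than one.

    An operation slower than the budget still runs once to completion,
    so its measured rate comes out below 1 per second.
    """
    return max(1, int(estimated_rate * budget))


# A fast operation: 5790 repetitions completed in one second.
print(ops_per_second(5790, 1.0))          # 5790.0
# A slow operation: one repetition that took 1.43 seconds (hypothetical timing).
print(round(ops_per_second(1, 1.43), 1))  # 0.7
print(reps_for_budget(0.7))               # 1
```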

To minimize the effect of disk caches and other 'warm-up' effects which might make a set of tests run more quickly or more slowly the second time, I rebooted the machine between tests and made sure the command to start testing was the first thing typed after logging in. This is especially important on the 55SX where the X server might be loaded into fast planar memory, or slower RAM on the Orchid card, according to luck. By rebooting you can be reasonably sure the same memory area is being used each time.

(Aside: it would be handy to make Linux intelligently use the faster memory first, and actively move stuff to the lower part of the address space if possible. People sometimes ask about this on the kernel mailing list; apparently you need to define different 'zones', whatever that means.)

Finally, some of the tests performed by x11perf crash the X server. According to the manual page this is quite normal 'for non-DEC servers' :-). It's a little worrying, but OTOH XF86_AGX has been stable enough not to break during ordinary use. So I just chose not to run those tests - quite a few as it turned out.

I ran x11perf by replacing my ~/.xinitrc file with something like:
x11perf -repeat 1 -time 1 -dot -rect1 -rect10 ... >x11perf.out 2>&1
where -dot, -rect1, etc. are the tests I wanted to run. Then you can run the tests by rebooting, logging in and saying startx. The .xinitrc files I used were:

Results

The results list the speed (in operations per second) for each of the tests. For example, the 55SX's planar VGA at 4bpp managed about 5790 -dot operations per second.

Raw results from x11perf

How to interpret the results

XFree86 comes with a tool, x11perfcomp, for comparing different sets of results. It tabulates the results from one or more runs of x11perf, so you can compare individual operations (like -dot or -rect10) side by side. But even this is too difficult to draw conclusions from, so I wrote a Perl script, x11perfcompcomp, to munge the results further. On the 55SX, I wanted to take 16-colour VGA as the base and find out how fast the other options were relative to it, so you could say things like 'XGA-I is 2.32 times as fast at drawing rectangles'. On the 8580, I wanted to use the results from the 16-bit slot as the base, and calculate how much speedup comes from moving to a 32-bit slot. So x11perfcompcomp rescales the results for each test, giving the first run a score of 1.0.
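The rescaling step can be sketched in Python as below. This is not the author's Perl script; `rebase` is a hypothetical name, and apart from the 5790/s -dot baseline quoted earlier, the rates are made-up numbers for illustration. Tests missing from any run are dropped, mirroring the decision to exclude tests that did not run on every configuration.

```python
def rebase(runs: list[dict[str, float]]) -> list[dict[str, float]]:
    """Rescale every run so the first (baseline) run scores 1.0 on each test.

    `runs` holds one {test_name: ops_per_second} dict per configuration.
    Tests missing from any run are dropped, so every score has a baseline.
    """
    base = runs[0]
    common = set(base).intersection(*runs[1:])
    return [{t: run[t] / base[t] for t in sorted(common)} for run in runs]


# Illustrative numbers: only the -dot baseline (5790/s) comes from the text.
vga_4bpp = {"-dot": 5790.0, "-rect10": 1000.0, "-seg10": 800.0}
other = {"-dot": 11580.0, "-rect10": 2320.0}  # hypothetical; -seg10 did not run

for run in rebase([vga_4bpp, other]):
    print(run)
# {'-dot': 1.0, '-rect10': 1.0}
# {'-dot': 2.0, '-rect10': 2.32}
```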

Because there are a lot of tests, x11perfcompcomp also tries to categorize and group them, producing an average speed rating for particular classes of operations such as 'Line' or 'Scroll'. This is formed by adding the rebased speed ratings for each operation in the class and dividing by the number of operations in the class. That works out better than averaging the un-rebased speeds in operations per second, which would give scores heavily weighted towards the results from the quicker operations. (Consider one op executing at 1000 per second and another at 1 per second: should you sum these numbers and divide by two?)
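A minimal Python sketch of this grouping, assuming the rebased scores are already in a dict; the test-to-class mapping is a hypothetical stand-in, since the script's actual categories aren't listed here. It also works through the 1000-per-second versus 1-per-second case, with an invented change where only the slow operation gets ten times faster.

```python
from statistics import mean


def class_scores(rebased: dict[str, float], classes: dict[str, str]) -> dict[str, float]:
    """Average the rebased scores of all tests that share a class.

    `classes` maps test name -> class name; unlisted tests go into "Other".
    """
    groups: dict[str, list[float]] = {}
    for test, score in rebased.items():
        groups.setdefault(classes.get(test, "Other"), []).append(score)
    return {cls: mean(scores) for cls, scores in groups.items()}


def raw_ratio(base: dict[str, float], new: dict[str, float]) -> float:
    """The rejected alternative: ratio of the averages of raw ops/second."""
    return mean(new[t] for t in base) / mean(base.values())


# The 1000/s vs 1/s case: the slow op gets 10x faster, the fast op is unchanged.
base = {"fast": 1000.0, "slow": 1.0}
new = {"fast": 1000.0, "slow": 10.0}
rebased = {t: new[t] / base[t] for t in base}

print(class_scores(rebased, {"fast": "Demo", "slow": "Demo"}))  # {'Demo': 5.5}
print(round(raw_ratio(base, new), 3))                           # 1.009
```

Averaging raw rates reports a barely visible 0.9% gain, because the 1000/s operation swamps the sum; averaging the rebased ratios reports the tenfold improvement in one of the two operations as 5.5.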

I don't make any claim that this is scientific or even sensible. All I can say is that a higher number is better.

Comparing results for the 55SX

For the comparison of different cards on the 55SX, I chose VGA at 4bpp (16 colours) as the base. How much speedup could you get switching from this to black-and-white, or to an accelerated card? I also included the PS/VP's results to see how the mighty 55SX compares against this machine.

Now some comments on the performance figures for classes of operations. The name of each class is followed by the average relative speed (16-colour VGA = 1) on the five different 55SX video setups, and on the PS/VP.

'Copy' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       4.62
XGA-I 8bpp     27.1
XGA-II 8bpp    36.2
XGA-II 16bpp   22.2
PS/VP 8bpp     20.9

Going from 16-colour VGA to black-and-white VGA, you'd expect copying and scrolling performance roughly to quadruple, because one byte of memory now holds eight pixels (one bit each) rather than two 4-bit pixels. In fact it's even better: 4.62 times as fast for copying and 5.89 times for scrolling.
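The 'roughly quadruple' estimate is just a ratio of data volumes, which is quick to check. This sketch only counts bytes; it ignores how the VGA actually spreads 4bpp pixels across four bit planes, though the total per screen comes out the same either way.

```python
def frame_bytes(width: int, height: int, bpp: int) -> int:
    """Bytes of video memory holding one full screen at `bpp` bits per pixel."""
    return width * height * bpp // 8


def pixels_per_byte(bpp: int) -> float:
    """How many pixels one byte of video memory represents."""
    return 8 / bpp


for bpp in (4, 1):
    print(bpp, "bpp:", pixels_per_byte(bpp), "pixels/byte,",
          frame_bytes(640, 480, bpp), "bytes per screen")
# 4 bpp: 2.0 pixels/byte, 153600 bytes per screen
# 1 bpp: 8.0 pixels/byte, 38400 bytes per screen
```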

The XGA-II at 8bpp is quite a bit faster than the XGA-I for this operation, and they are both hugely faster than the on-board VGA. Going to 16bpp slows down the XGA-II, but not by the full half you might expect. It's still not much slower than the XGA-I and still miles faster than VGA.

The PS/VP cannot keep up with even the XGA-I, although it doesn't do too badly. If you are going to shovel bits of screen around by hand, it certainly helps to have a fast CPU to do it with.

'Dashed line' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       0.839
XGA-I 8bpp     1.36
XGA-II 8bpp    1.38
XGA-II 16bpp   1.35
PS/VP 8bpp     69.2

This seems to be a CPU-intensive operation, judging by how much faster the 486 clone machine is. Surprisingly, the 1bpp VGA is a little slower than the 4bpp, perhaps because the X server is less well optimized (see below). Going to XGA-I or XGA-II still gives a worthwhile improvement, if your life depends on drawing dashed lines quickly. I don't know what they are used for apart from benchmarks.

(Actually, I have noticed that the dashed line support in XF86_AGX is buggy. My window manager uses them to highlight 'OK' and 'Cancel' buttons in a dialogue box (like MS Windows) and on an 8580 with XGA-II the line is dashed rather randomly, while on the 55SX with XGA-II it doesn't appear at all.)

'Dot' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       2.33
XGA-I 8bpp     2.14
XGA-II 8bpp    2.23
XGA-II 16bpp   2.16
PS/VP 8bpp     21.8

Here, going to black-and-white will get just as good a speedup as changing your video card.

'Horizontal line' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       1.34
XGA-I 8bpp     2.33
XGA-II 8bpp    2.41
XGA-II 16bpp   2.33
PS/VP 8bpp     16.4

'Line' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       1.03
XGA-I 8bpp     11.3
XGA-II 8bpp    11.8
XGA-II 16bpp   11.1
PS/VP 8bpp     39.6

The XGA-* must have line drawing in hardware. That's pretty much the minimum for an 'accelerated' card I guess.

'Opaque stippled rectangle' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       2.8
XGA-I 8bpp     58
XGA-II 8bpp    88.9
XGA-II 16bpp   61.4
PS/VP 8bpp     11

Wow, pretty good. If only I knew what an opaque stippled rectangle is used for.

'Polygon' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       1.55
XGA-I 8bpp     2.26
XGA-II 8bpp    2.27
XGA-II 16bpp   2.27
PS/VP 8bpp     8.28

'Rectangle' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       1.11
XGA-I 8bpp     11.2
XGA-II 8bpp    14
XGA-II 16bpp   10.3
PS/VP 8bpp     10.7

'Scroll' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       5.89
XGA-I 8bpp     41.1
XGA-II 8bpp    60.1
XGA-II 16bpp   35.2
PS/VP 8bpp     18.3

This, IMHO, is the most important benchmark, at least for someone like me who spends most of the time typing into xterms or text editors. The slow scrolling of the planar VGA is a real pain; even the clone's scrolling is annoyingly slow. It looks like 256-colour XGA-II is the winner if you value a quick response to the Enter key above all else.

'Stippled rectangle' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       1.13
XGA-I 8bpp     12.5
XGA-II 8bpp    18.3
XGA-II 16bpp   12
PS/VP 8bpp     5.31

'Text' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       4.08
XGA-I 8bpp     49.1
XGA-II 8bpp    51.2
XGA-II 16bpp   49
PS/VP 8bpp     66.8

The second most important benchmark: character drawing. The good performance of the XGA-* cards might be reduced if there isn't enough spare VRAM for use as a font cache. That's unlikely to happen on a 1Mbyte XGA-I, especially since Linux can use it only in 8bpp, but it's a reason not to push your XGA-II to its maximum possible resolution.
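To put rough numbers on this, the sketch below computes how much VRAM is left beyond the visible screen in a few modes. It assumes a card with 1 Mbyte of VRAM (the size of the XGA-I tested here; using the same figure for the XGA-II is an assumption), says nothing about which modes XF86_AGX actually supports, and doesn't model how much of the leftover the X server really devotes to fonts.

```python
VRAM_BYTES = 1024 * 1024  # assumption: a card with 1 Mbyte of VRAM


def spare_vram(width: int, height: int, bpp: int, vram: int = VRAM_BYTES) -> int:
    """VRAM left over after the visible screen; negative if the mode won't fit."""
    return vram - width * height * bpp // 8


for mode in [(640, 480, 8), (640, 480, 16), (800, 600, 16), (1024, 768, 8)]:
    print(mode, spare_vram(*mode))
# (640, 480, 8) 741376
# (640, 480, 16) 434176
# (800, 600, 16) 88576
# (1024, 768, 8) 262144
```

At 640x480 there is plenty of off-screen memory in either depth, while larger modes leave far less room for anything the server might cache there.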

'Tiled rectangle' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       3.42
XGA-I 8bpp     43.7
XGA-II 8bpp    58.6
XGA-II 16bpp   36.3
PS/VP 8bpp     14

'Triangle' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       1.71
XGA-I 8bpp     2.3
XGA-II 8bpp    2.35
XGA-II 16bpp   1.72
PS/VP 8bpp     8.88

'Vertical line' operations

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       0.979
XGA-I 8bpp     19.2
XGA-II 8bpp    18.6
XGA-II 16bpp   16.1
PS/VP 8bpp     21.9

Apart from dashed lines, this is the only class of operation where the monochrome display is slower than 16 colours, perhaps because the lines are no longer conveniently aligned by bit planes.

All operations combined

And finally, the totally bogus 'overall performance' number from adding together all the rebased individual test scores and dividing:

Setup          Relative speed
VGA 4bpp       1
VGA 1bpp       2.45
XGA-I 8bpp     25.2
XGA-II 8bpp    32.8
XGA-II 16bpp   23.4
PS/VP 8bpp     25.2

Comparing results for the 8580

For this machine, there were just two sets of benchmarks: XGA-II at 16bpp in a 16-bit slot, and the same setup with a 32-bit slot.

These results are much less varied than those from the 55SX, so I won't list them all here. One thing to notice is that the operations that make the video card work hard - scrolling and drawing big rectangles - show no speedup at all. This is to be expected: they don't involve sending stuff across the bus, so they wouldn't be affected by moving to a 32-bit slot. OTOH, 'Copy 500x500 from window to pixmap' is 70% faster in a 32-bit slot; this must be because it involves doing a screen grab and sending the data over the MCA bus.
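A back-of-envelope count shows why a screen grab is the kind of operation that benefits. For the 500x500 copy at 16bpp, a 32-bit slot halves the number of bus transfers. This is an idealized model with a made-up helper name (it ignores bus cycle timing, wait states, and the CPU and X server work around the transfer), which is presumably part of why the measured gain is 70% rather than a full doubling.

```python
def bus_transfers(width: int, height: int, bpp: int, bus_bits: int) -> int:
    """Transfers needed to move a width x height block across a bus of
    `bus_bits` data lines, assuming every transfer is fully packed."""
    total_bits = width * height * bpp
    return -(-total_bits // bus_bits)  # ceiling division


for bus in (16, 32):
    print(f"{bus}-bit slot: {bus_transfers(500, 500, 16, bus)} transfers")
# 16-bit slot: 250000 transfers
# 32-bit slot: 125000 transfers
```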

The totally unscientific total performance measure indicates that the 32-bit slot gives an 11% performance improvement. Most of the individual benchmarks show an improvement between zero and 15%.

Problems

The tests run by x11perf may not reflect real-world usage. As I mentioned above, I'm sceptical that the average user spends all day drawing 64-gons and stippled rectangles. Or at least, a user who does that would be using an SGI workstation not a PS/2 Model 55SX! It would be interesting to instrument an X server in actual use to see what primitives are used most often. Or to use a real application for testing (like the Solitaire benchmark). I think that the 'Text' and 'Scroll' classes of operations are the ones that really matter.

Second, the X server may not be taking full advantage of the hardware. Apparent improvements from XGA-I to XGA-II may be just because the X server doesn't know how to drive the XGA-I to the max. OTOH, the same AGX chipset family is used in many cards (the XF86_AGX server was designed for these clones, and its support for the XGA-* cards is an unofficial side effect), and unless IBM's cards have extra commands not present in the clone cards, I'd expect the command set to be pretty well covered.

Also, what I set out to measure is how fast the different video options are under XFree86. I'd advise some caution in using these results to make judgements about the hardware itself, unless you know what the performance is like with other video drivers (eg OS/2).

Conclusions

If you have a slow machine, get an XGA or preferably XGA-II card. They work in the 55SX and 65SX, and arguably these are the machines where they make the most difference. The difference between XGA-I and XGA-II is relatively unimportant; what really matters is to use some sort of accelerated card rather than the planar VGA.

Failing that you can get some improvement by running in monochrome. (I don't think the test results of '2.45 times as fast' will be matched in practice, because the machine does other things than just draw graphics.)

16bpp XGA-II is about 2/3 the speed of 8bpp on the 55SX. I'd expect the difference to be greater on a faster machine, as time taken by the CPU declines relative to time taken by the graphics hardware.
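The 2/3 figure can be checked against the class tables above. This sketch divides the XGA-II's 16bpp scores by its 8bpp scores for a handful of classes; the values are copied from those tables, and `depth_ratios` is just an illustrative helper.

```python
# Rebased class scores for the XGA-II on the 55SX, copied from the tables above.
XGA2_8BPP = {"Copy": 36.2, "Scroll": 60.1, "Tiled rectangle": 58.6,
             "Text": 51.2, "Line": 11.8, "All operations": 32.8}
XGA2_16BPP = {"Copy": 22.2, "Scroll": 35.2, "Tiled rectangle": 36.3,
              "Text": 49.0, "Line": 11.1, "All operations": 23.4}


def depth_ratios(fast: dict[str, float], slow: dict[str, float]) -> dict[str, float]:
    """Speed of the slow configuration as a fraction of the fast one, per class."""
    return {cls: round(slow[cls] / fast[cls], 2) for cls in fast}


for cls, ratio in depth_ratios(XGA2_8BPP, XGA2_16BPP).items():
    print(f"{cls:16} {ratio}")
```

The overall figure works out to about 0.71. The classes that move a lot of pixel data (Copy, Scroll, Tiled rectangle) fall to around 0.6, while Text and Line barely change (0.96 and 0.94), which is consistent with only part of each operation's time depending on colour depth.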

The XGA and XGA-II on a really slow machine compare favourably against unaccelerated SVGA on a much faster box.

If your machine has both 16-bit and 32-bit slots, put your XGA-II card in a 32-bit slot if one is spare, but don't worry too much if not. The difference in performance is pretty small.

Future work

Many of the tests would run on some configurations but not on others. I excluded them from consideration. This is a shame because there are some pretty interesting results in there, eg 'copy 100x100 n-bit deep plane' manages only 0.6/s on 4bpp VGA, but 31.7/s with 1bpp. (XGA gets 224/s and XGA-II won't run this test.)

Free testing service offered for any new video card donated...


Edward Avis
Last modified: Sun Mar 16 09:01:50 GMT 2003