My 55SX runs Linux and XFree86, and I recently upgraded it from XGA-I to XGA-II. The graphics didn't really seem any faster, and were possibly slower when 16-bit colour was in use, but they were still miles faster than the planar VGA. That made me wonder: which is the fastest video card under X, and by how much?
On machines with both 32-bit and 16-bit MCA slots, how much faster is it to use the wider slot?
The first test machine was my 55SX with 16 megs of RAM (8 on planar, 8 on Orchid card). It may seem crazy to test graphics performance on such a slow machine but this should emphasize any speedup caused by hardware acceleration rather than using the slow CPU to draw.
There were three pieces of video hardware I tested on the 55SX -
the planar VGA, an XGA-I card with 1Mbyte VRAM, and an XGA-II
card (the early type with hedgehog heatsink). The planar VGA I
tried in 4bpp (16 colours) using the XF86_VGA16 X
server, and in 1bpp (black and white) using
XF86_Mono. Running in monochrome is a common trick
to get faster video and less memory usage. The XGA-I and XGA-II
use the XF86_AGX server; I tested XGA-II in 8bpp
and 16bpp but the XGA-I works only in 8bpp under XFree86. So
altogether there are five video modes to test. The resolution
in each case was standard VGA, 640x480, at 60Hz refresh rate.
Secondly, I tested an XGA-II card in my 8580-111, upgraded with a Blue Lightning CPU. I tested the XGA-II in a 16-bit slot and in a 32-bit slot. The video mode was again 640x480 @ 60Hz, and the colour depth was 16bpp. Of course, XF86_AGX is the X server to use in both cases.
I also tested my IBM PS/ValuePoint 486DX-33. This is a klone ISA machine with a much faster CPU than the 55SX (and faster than the 8580), but its video is an on-board Tseng ET4000. I included this out of interest to see how a 55SX (with slow CPU but accelerated graphics) compares against a machine which must do all graphics operations by hand, but has a much faster CPU to do them with.
XFree86 comes with a tool called x11perf which
benchmarks basic operations like line drawing and scrolling, and
a few not so basic operations. It normally performs these
operations repeatedly for five seconds to work out how many can
be performed per second, and repeats this test five times. This
was far too slow (there are many tests to run) so I chose one
second testing, performed only once. The machines (especially
the 55SX) are relatively slow so one second is probably
adequate, and x11perf seems to do a short warm-up for each
operation before it starts timing. Another problem was that
some operations were so slow they couldn't be performed even
once per second on the 55SX's planar VGA. In this case x11perf
seems to do the right thing by letting the operation finish and
then dividing by the time taken to get a number of operations
per second like 0.7.
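The arithmetic for those very slow operations can be sketched in a couple of lines of Python. This only illustrates the division described above and is not x11perf's actual code; the function name and the 1.43-second figure are made up to give a rate near 0.7.

```python
def ops_per_second(completed_ops: int, elapsed_seconds: float) -> float:
    """Rate = operations completed / wall-clock time they took."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return completed_ops / elapsed_seconds


# Hypothetical case: a single operation that overran the one-second budget.
slow_rate = ops_per_second(1, 1.43)
print(f"{slow_rate:.1f} ops/sec")
```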
To minimize the effect of disk caches and other 'warm-up' effects which might make a set of tests run more quickly or more slowly the second time, I rebooted the machine between tests and made sure the command to start testing was the first thing typed after logging in. This is especially important on the 55SX where the X server might be loaded into fast planar memory, or slower RAM on the Orchid card, according to luck. By rebooting you can be reasonably sure the same memory area is being used each time.
(Aside: it would be handy to make Linux intelligently use the faster memory first, and actively move stuff to the lower part of the address space if possible. People sometimes ask about this on the kernel mailing list; apparently you need to define different 'zones', whatever that means.)
Finally, some of the tests performed by x11perf crash the X server. According to the manual page this is quite normal 'for non-DEC servers' :-). It's a little worrying, but OTOH XF86_AGX has been stable enough not to break during ordinary use. So I just chose not to run those tests - quite a few as it turned out.
I ran x11perf by replacing my ~/.xinitrc file with
something like:
x11perf -repeat 1 -time 1 -dot -rect1 -rect10 ... >x11perf.out 2>&1
where -dot, -rect1, etc. are the tests
I wanted to run. Then you can run the tests by rebooting,
logging in and saying startx. The .xinitrc files I
used were:
The results generated list the speed (in number of operations
per second) for each of the tests. For example, the 55SX's
planar VGA at 4bpp managed about 5790 operations per second of
-dot.
XFree86 comes with a tool x11perfcomp to get
comparisons from different sets of results. It tabulates the
results from one or more runs of x11perf, so you can compare
individual operations (like -dot,
-rect10) side-by-side. But even this is too
difficult to draw conclusions from, so I wrote a Perl script x11perfcompcomp to munge the results
further. On the 55SX, I wanted to start with 16-colour VGA as a
base, and find out how fast the other options were relative to
this - so you could say things like 'XGA-I is 2.32 times as fast
at drawing rectangles'. On the 8580, I wanted to use the
results from a 16-bit slot as the base, and calculate how much
speedup comes from moving to a 32-bit slot. So x11perfcompcomp
rescales the results for each test, making the first run have a
score of 1.0.
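The rebasing described above can be sketched in a few lines of Python. This is not the Perl script itself, just a minimal illustration; the function name, the run labels and the sample rates are all made up.

```python
def rebase(rates_by_run: dict, base: str) -> dict:
    """Divide every run's rate by the base run's rate, so the base scores 1.0."""
    base_rate = rates_by_run[base]
    if base_rate <= 0:
        raise ValueError("base rate must be positive")
    return {run: rate / base_rate for run, rate in rates_by_run.items()}


# Hypothetical ops/sec for a single test on two video setups.
rates = {"base run": 1000.0, "other run": 2320.0}
for run, score in rebase(rates, "base run").items():
    print(f"{run}: {score:.2f}")
```

With these made-up numbers the second run comes out as 2.32 times as fast as the base, which is the kind of statement the rescaling is meant to produce.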
Because there are a lot of tests, x11perfcompcomp also tries to categorize and group them, producing an average speed rating for particular classes of operations such as 'Line' or 'Scroll'. This is formed by adding the rebased speed ratings for each operation in the class and dividing by the number of operations in the class. This works out better than adding the un-rebased speeds in operations per second, which might give scores heavily weighted towards the results from the quicker operations. (Consider one op executing at 1000 per second, and another at 1 per second. Should you sum these numbers and divide by two?).
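To show why the averaging is done on rebased scores, here is a small Python sketch comparing the two approaches on the 1000-per-second and 1-per-second example. The function names and the figures for the second setup are hypothetical, and this is not the actual script.

```python
from statistics import mean


def class_score(base_rates: dict, other_rates: dict) -> float:
    """Mean of the per-test ratios (other / base) across one class of tests."""
    return mean(other_rates[test] / base_rates[test] for test in base_rates)


def naive_score(base_rates: dict, other_rates: dict) -> float:
    """Ratio of raw mean ops/sec; dominated by whichever test is quickest."""
    return mean(other_rates.values()) / mean(base_rates.values())


# A class with one quick and one slow test, as in the example above.
base = {"quick_op": 1000.0, "slow_op": 1.0}
# Suppose a second setup doubles the slow op but leaves the quick op alone.
other = {"quick_op": 1000.0, "slow_op": 2.0}

print(class_score(base, other))  # mean of the ratios 1.0 and 2.0
print(naive_score(base, other))  # 501.0 / 500.5, barely above 1
```

Averaging the rebased scores reports the doubling of the slow operation as a class rating of 1.5, while averaging the raw rates hides it almost completely.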
I don't make any claim that this is scientific or even sensible. All I can say is that a higher number is better.
For the comparison of different cards on the 55SX, I chose VGA at 4bpp (16 colours) as the base. How much speedup could you get switching from this to black-and-white, or to an accelerated card? I also included the PS/VP's results to see how the mighty 55SX compares against this machine.
Now some comments on the performance figures for classes of operations. The name of the class is followed by its average speed rating, relative to 4bpp VGA (which scores 1), on the five different 55SX video setups and on the PS/VP.
| VGA 4bpp | 1 |
| VGA 1bpp | 4.62 |
| XGA-I 8bpp | 27.1 |
| XGA-II 8bpp | 36.2 |
| XGA-II 16bpp | 22.2 |
| PS/VP 8bpp | 20.9 |
Going from 16-colour VGA to black and white VGA, you'd expect copying and scrolling performance to roughly quadruple, because one byte of memory now holds eight pixels (eight bits) rather than two 4-bit pixels. In fact it's even better.
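The 'roughly quadruple' estimate is just a ratio of bits per pixel, which can be written out as a quick Python check. The sketch assumes copy time scales with the amount of memory moved and ignores the VGA's planar layout; the function name is made up, and 4.62 is the VGA 1bpp rating from the table above.

```python
def expected_copy_speedup(old_bpp: int, new_bpp: int) -> float:
    """If copy time scales with bytes moved, speedup = old bpp / new bpp."""
    return old_bpp / new_bpp


expected = expected_copy_speedup(4, 1)
measured = 4.62  # VGA 1bpp rating from the table above (VGA 4bpp = 1)
print(f"expected {expected:.2f}x, measured {measured:.2f}x, "
      f"measured/expected = {measured / expected:.3f}")
```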
The XGA-II at 8bpp is quite a bit faster than the XGA-I for this operation, and they are both hugely faster than the on-board VGA. Going to 16bpp slows down the XGA-II, but doesn't fully halve the speed as you might expect. It's still not much slower than the XGA-I and still miles faster than VGA.
The PS/VP cannot keep up with even the XGA-I although it doesn't do too badly. If you are going to shovel bits of screen around by hand it certainly helps to have a fast CPU to do it with.
| VGA 4bpp | 1 |
| VGA 1bpp | 0.839 |
| XGA-I 8bpp | 1.36 |
| XGA-II 8bpp | 1.38 |
| XGA-II 16bpp | 1.35 |
| PS/VP 8bpp | 69.2 |
This seems to be a CPU-intensive operation, judging by how much faster the 486 clone machine is. Surprisingly, the 1bpp VGA is a little slower than the 4bpp, perhaps because the X server is less well optimized (see below). Going to XGA or XGA-II still gives a worthwhile improvement - if your life depends on drawing dashed lines quickly. I don't know what they are used for apart from benchmarks.
(Actually, I have noticed that the dashed line support in XF86_AGX is buggy. My window manager uses them to highlight 'OK' and 'Cancel' buttons in a dialogue box (like MS Windows) and on an 8580 with XGA-II the line is dashed rather randomly, while on the 55SX with XGA-II it doesn't appear at all.)
| VGA 4bpp | 1 |
| VGA 1bpp | 2.33 |
| XGA-I 8bpp | 2.14 |
| XGA-II 8bpp | 2.23 |
| XGA-II 16bpp | 2.16 |
| PS/VP 8bpp | 21.8 |
Here, going to black-and-white will get just as good a speedup as changing your video card.
| VGA 4bpp | 1 |
| VGA 1bpp | 1.34 |
| XGA-I 8bpp | 2.33 |
| XGA-II 8bpp | 2.41 |
| XGA-II 16bpp | 2.33 |
| PS/VP 8bpp | 16.4 |
| VGA 4bpp | 1 |
| VGA 1bpp | 1.03 |
| XGA-I 8bpp | 11.3 |
| XGA-II 8bpp | 11.8 |
| XGA-II 16bpp | 11.1 |
| PS/VP 8bpp | 39.6 |
The XGA-* must have line drawing in hardware. That's pretty much the minimum for an 'accelerated' card I guess.
| VGA 4bpp | 1 |
| VGA 1bpp | 2.8 |
| XGA-I 8bpp | 58 |
| XGA-II 8bpp | 88.9 |
| XGA-II 16bpp | 61.4 |
| PS/VP 8bpp | 11 |
Wow, pretty good. If only I knew what an opaque stippled rectangle is used for.
| VGA 4bpp | 1 |
| VGA 1bpp | 1.55 |
| XGA-I 8bpp | 2.26 |
| XGA-II 8bpp | 2.27 |
| XGA-II 16bpp | 2.27 |
| PS/VP 8bpp | 8.28 |
| VGA 4bpp | 1 |
| VGA 1bpp | 1.11 |
| XGA-I 8bpp | 11.2 |
| XGA-II 8bpp | 14 |
| XGA-II 16bpp | 10.3 |
| PS/VP 8bpp | 10.7 |
| VGA 4bpp | 1 |
| VGA 1bpp | 5.89 |
| XGA-I 8bpp | 41.1 |
| XGA-II 8bpp | 60.1 |
| XGA-II 16bpp | 35.2 |
| PS/VP 8bpp | 18.3 |
This, IMHO, is the most important benchmark, at least for someone like me who spends most of the time typing into xterms or text editors. The slow scrolling of the planar VGA is a real pain; even the clone's scrolling is annoyingly slow. It looks like 256-colour XGA-II is the winner if you value a quick response to the Enter key above all else.
| VGA 4bpp | 1 |
| VGA 1bpp | 1.13 |
| XGA-I 8bpp | 12.5 |
| XGA-II 8bpp | 18.3 |
| XGA-II 16bpp | 12 |
| PS/VP 8bpp | 5.31 |
| VGA 4bpp | 1 |
| VGA 1bpp | 4.08 |
| XGA-I 8bpp | 49.1 |
| XGA-II 8bpp | 51.2 |
| XGA-II 16bpp | 49 |
| PS/VP 8bpp | 66.8 |
The second most important benchmark - character drawing. The good performance of the XGA-* cards might be reduced if there isn't enough VRAM spare for use as a font cache. That's unlikely to happen on a 1Mbyte XGA-I, especially since Linux can use it only in 8bpp, but it is a reason you might not want to run your XGA-II at its maximum possible resolution.
| VGA 4bpp | 1 |
| VGA 1bpp | 3.42 |
| XGA-I 8bpp | 43.7 |
| XGA-II 8bpp | 58.6 |
| XGA-II 16bpp | 36.3 |
| PS/VP 8bpp | 14 |
| VGA 4bpp | 1 |
| VGA 1bpp | 1.71 |
| XGA-I 8bpp | 2.3 |
| XGA-II 8bpp | 2.35 |
| XGA-II 16bpp | 1.72 |
| PS/VP 8bpp | 8.88 |
| VGA 4bpp | 1 |
| VGA 1bpp | 0.979 |
| XGA-I 8bpp | 19.2 |
| XGA-II 8bpp | 18.6 |
| XGA-II 16bpp | 16.1 |
| PS/VP 8bpp | 21.9 |
Apart from the dashed lines mentioned earlier, this is the only class of operation where the monochrome display is slower than 16 colours, perhaps because the lines are no longer conveniently aligned by bit planes.
And finally, the totally bogus 'overall performance' number from adding together all the rebased individual test scores and dividing:
| VGA 4bpp | 1 |
| VGA 1bpp | 2.45 |
| XGA-I 8bpp | 25.2 |
| XGA-II 8bpp | 32.8 |
| XGA-II 16bpp | 23.4 |
| PS/VP 8bpp | 25.2 |
For this machine, there were just two sets of benchmarks: XGA-II at 16bpp in a 16-bit slot, and the same setup with a 32-bit slot.
These results are much less varied than those from the 55SX, so I won't list them all here. One thing to notice is that the operations that make the video card work hard - scrolling and drawing big rectangles - show no speedup at all. This is to be expected; they don't involve sending stuff across the bus, so they wouldn't be affected by moving to a 32-bit slot. OTOH, 'Copy 500x500 from window to pixmap' is 70% faster in a 32-bit slot; this must be because it involves doing a screen grab and sending the data over the MCA bus.
The totally unscientific total performance measure indicates that the 32-bit slot gives an 11% performance improvement. Most of the individual benchmarks show an improvement between zero and 15%.
The tests run by x11perf may not reflect real-world usage. As I mentioned above, I'm sceptical that the average user spends all day drawing 64-gons and stippled rectangles. Or at least, a user who does that would be using an SGI workstation not a PS/2 Model 55SX! It would be interesting to instrument an X server in actual use to see what primitives are used most often. Or to use a real application for testing (like the Solitaire benchmark). I think that the 'Text' and 'Scroll' classes of operations are the ones that really matter.
Then the X server may not be taking full advantage of the hardware. Apparent improvements from XGA-I to XGA-II may be just because the X server doesn't understand how to drive XGA-I to the max. OTOH, the same AGX chipset family is used in many cards (the XF86_AGX server was designed for these clones, with unsupported XGA-* driving appearing as a side effect) and unless IBM's cards have extra commands not present in the clone cards, I'd expect that the command set is pretty well covered.
Also, what I set out to measure is how fast the different video options are under XFree86, and the results do show that. But I'd advise some caution in using them to make judgements about the hardware itself, unless you know what the performance is like with other video drivers (e.g. OS/2).
If you have a slow machine, get an XGA or preferably XGA-II card. They work in the 55SX and 65SX, and arguably these are the machines where they make the most difference. The difference between XGA-I and XGA-II is relatively unimportant; what really matters is to use some sort of accelerated card rather than the planar VGA.
Failing that you can get some improvement by running in monochrome. (I don't think the test results of '2.45 times as fast' will be matched in practice, because the machine does other things than just draw graphics.)
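A rough way to see why the full 2.45 won't show up in practice is Amdahl's law: only the part of the time spent drawing gets faster. The Python sketch below assumes that model; the function name and the graphics fractions are made-up workloads, not measurements, and only the 2.45 comes from the benchmark.

```python
def overall_speedup(graphics_fraction: float, graphics_speedup: float) -> float:
    """Amdahl's law: only the graphics share of the run time gets faster."""
    if not 0.0 <= graphics_fraction <= 1.0:
        raise ValueError("graphics_fraction must be between 0 and 1")
    remaining = 1.0 - graphics_fraction
    return 1.0 / (remaining + graphics_fraction / graphics_speedup)


# 2.45 is the benchmark figure for 1bpp VGA; the fractions are hypothetical.
for fraction in (1.0, 0.5, 0.2):
    print(fraction, round(overall_speedup(fraction, 2.45), 2))
```

Under this model a machine that spends half its time drawing would end up only about 1.4 times as fast overall.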
16bpp XGA-II is about 2/3 the speed of 8bpp on the 55SX. I'd expect the difference to be greater on a faster machine, as time taken by the CPU declines relative to time taken by the graphics hardware.
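The 'about 2/3' figure can be checked against the overall ratings quoted earlier with a line or two of Python. This is only a sanity check on the (admittedly bogus) overall numbers; the variable names are mine.

```python
# Overall ratings from the final 55SX table (VGA 4bpp = 1).
overall = {"XGA-II 8bpp": 32.8, "XGA-II 16bpp": 23.4}

ratio = overall["XGA-II 16bpp"] / overall["XGA-II 8bpp"]
print(f"16bpp runs at {ratio:.2f} of the 8bpp speed")
```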
The XGA and XGA-II on a really slow machine compare favourably against unaccelerated SVGA on a much faster box.
If your machine has both 16-bit and 32-bit slots, put your XGA-II card in a 32-bit slot if you have one spare, but don't worry too much if you don't. The difference in performance is pretty small.
Many of the tests would run on some configurations but not on others. I excluded them from consideration. This is a shame because there are some pretty interesting results in there, eg 'copy 100x100 n-bit deep plane' manages only 0.6/s on 4bpp VGA, but 31.7/s with 1bpp. (XGA gets 224/s and XGA-II won't run this test.)
Free testing service offered for any new video card donated...