Modifying zlib and libpng for lossy image compression

Abstract

PNG images are pixmaps compressed using Deflate, the same compression used in zip files and by gzip. Deflate has a Lempel-Ziv string-matching stage and then Huffman-codes the results. By changing the LZ stage to allow approximate matches of repeated strings, we can compress PNG images lossily, while the output file remains completely standard PNG format.

Implemented so far

At present my implementation is just a modified zlib whose 'minigzip' program can do lossy compression on PNM images (a simple 'raw' file format). Modifying libpng to generate lossy PNG images is still to do.

The amount of lossiness is controlled by an integer parameter called 'threshold'. When the compressor looks for duplicate strings, it compares the two candidate strings byte by byte, taking the numeric difference between each pair of corresponding bytes. Each difference is squared, to punish one large mismatch more than several small ones, and the squared differences are summed over the whole string. If this total is less than the threshold, the strings are considered to match. Hence a lower threshold means a more faithful output image, but fewer matching strings and so a higher file size. A threshold of zero gives lossless compression and is equivalent to plain 'gzip'.
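
In outline the test looks something like this sketch (illustrative only; the names here are invented and the real patch may differ in detail):

    #include <stddef.h>

    /* Compare two candidate strings byte by byte, summing squared
       differences; return nonzero if they are close enough to count
       as a match.  With threshold 0 this reduces to an exact
       comparison, i.e. the lossless case. */
    static int close_enough(const unsigned char *a, const unsigned char *b,
                            size_t len, unsigned long threshold)
    {
        unsigned long total = 0;
        size_t i;
        for (i = 0; i < len; i++) {
            long d = (long)a[i] - (long)b[i];
            total += (unsigned long)(d * d); /* squaring punishes one large
                                                mismatch more than several
                                                small ones */
            if (total > threshold)
                return 0;                    /* already too far off */
        }
        return 1;
    }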

(For greyscale images, the amount of mismatch from one pixel is 'carried over' to the next, which helps to handle dithered patterns. More sophisticated ways of comparing two strings of pixels may be added in future; I also hope to replace 'threshold' with a 'quality factor' between 0 and 100, like JPEG. No time to explain all the ideas here.)
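
The carry-over works rather like error diffusion: the signed mismatch from one byte is offset against the comparison of the next. A sketch, continuing the invented names above (the real code may differ):

    #include <stddef.h>

    /* As close_enough() above, but the residual error from each byte is
       carried into the next comparison, so a dithered pattern such as
       100,200,100,200 compared with a flat 150,150,150,150 scores 5000
       instead of the 10000 a naive byte-by-byte comparison gives. */
    static int close_enough_dithered(const unsigned char *a,
                                     const unsigned char *b,
                                     size_t len, unsigned long threshold)
    {
        unsigned long total = 0;
        long carry = 0;   /* signed mismatch carried from previous byte */
        size_t i;
        for (i = 0; i < len; i++) {
            long d = (long)a[i] - (long)b[i] + carry;
            total += (unsigned long)(d * d);
            if (total > threshold)
                return 0;
            carry = d;    /* diffuse the residual error forwards */
        }
        return 1;
    }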

Currently the minigzip program knows nothing about the structure of files; it just compresses a stream of bytes. PNM files have a short text header at the top, and it's necessary to stop this being compressed lossily. So I stripped the header off and compressed just the raw pixel data, giving a .gz file of the raw data. To recover an image you decompress this .gz file and add the header back on. The header is just a few bytes, so it does not significantly affect the compressed file size.
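
Finding where the header ends is easy, since a PNM header is just a magic number followed by width, height and maximum pixel value. A simplified sketch (it ignores '#' comment lines, which PNM allows):

    #include <ctype.h>
    #include <stdio.h>

    /* Return the file offset at which the raw pixel data of a P5/P6
       PNM file begins, or -1 on error.  Simplified: no comment lines. */
    static long pnm_data_offset(FILE *f)
    {
        int c, fields = 0;                /* width, height, maxval */
        if (fgetc(f) != 'P' || fgetc(f) == EOF)
            return -1;                    /* not a PNM file */
        while (fields < 3) {
            c = fgetc(f);
            if (c == EOF)
                return -1;
            if (isdigit(c)) {
                while (isdigit(c = fgetc(f)))
                    ;                     /* swallow the rest of the number */
                fields++;
            }
        }
        /* the single whitespace byte separating header from pixel data
           has just been consumed by the loop above */
        return ftell(f);
    }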

Results

I developed the code by testing it against a Dilbert cartoon; this was originally a four-shade greyscale image, but I converted it to 256 shades of grey because the lossy matching assumes it can compare numeric byte values, so perhaps it isn't a particularly good example. I also grabbed a random photograph off the net to test how well the code works with 'real-life' 24-bit images; this photo was previously a JPEG and already has some artefacts from that lossy compression, so it will be interesting to see how it comes out!

The greyscale image was compressed with 'dithering' enabled (DITHER_LOSS set true); for the colour image dithering was off, because it doesn't yet deal correctly with neighbouring bytes that represent different colours (a difference in red intensity cannot be carried across into the neighbouring green byte, but should affect the next pixel's red value). Here are the two original images:

Dilbert (original)
Dartmoor (original)

Yes, I know these two images are not a good representative set of tests, but they are good enough to show how well this early version works. With threshold 10000, we get the following lossy output:

Dilbert, threshold 10000
Dartmoor, threshold 10000

The file size for the lossy Dilbert is 13794 bytes compared with 17434 bytes for the non-lossy PNG of that image, a 21% saving. For Dartmoor, the size is 7942 bytes compared to 41754 bytes, an 81% saving. Of course different thresholds can be tried to trade off quality for space. Here is that same photograph with a threshold of only 1000:

Dartmoor, threshold 1000

This has a size of 20284 bytes - a saving of more than half compared with the 41754-byte lossless PNG - but the loss in quality is hardly noticeable. If anything the quality has improved, since the JPEG artefacts are no longer visible! This isn't a fair test, but you get the idea.

File sizes for Dilbert image

    Threshold       File size in bytes
    0 (lossless)    18411
    1000            18409
    2000            17297
    4000            16020
    8000            14346
    10000           13794
    12000           13229
    14000           12818
    20000           11754
    30000           10579
    40000            9711

File sizes for Dartmoor image

    Threshold       File size in bytes
    0 (lossless)    53348
    1000            20284
    2000            15072
    4000            11774
    8000             8804
    10000            7942
    12000            7195
    14000            6696
    20000            5539
    30000            4904
    40000            4144

Note that for web viewing, I have converted the images into PNG files, because most browsers don't support the simple PNM format (still less some wacky gzipped raw data). The intermediate data files, which are used to give the file sizes above, can be downloaded straight from the images/ directory. In principle you should be able to gunzip them, add the PNM header and view them, but you may find that gzip complains about CRC errors: the stored checksum was computed from the original input, which the lossy compressor no longer reproduces exactly.

Also note something interesting: the PNG files made from the lossy output are themselves quite small! So this software as it stands could do the job of a lossy compressor for PNG files. Indeed, any program which looked for almost-repeating patterns in an image and modified them to repeat exactly would help PNG compress better. The trouble is that if you don't know what zlib's LZ compression is doing, you may make changes to the image which don't help compression at all. It's better to hook directly into the compressor and let it tweak the image only when that would definitely reduce the file size, which is the aim of this project.

Download

zlib_lossy.tar.bz2. This is a hacked version of the zlib source; the interesting program it builds is minigzip. Run minigzip -lN to do lossy compression with threshold N. Then minigzip -d decompresses; any gzip program will work if it can ignore the CRC errors. BTW, the added code uses C99 features so you'll need gcc 3 or some other recent compiler.

To do

Finish this implementation

Generalize the interface for lossy compression so the user specifies the approximate string matcher to use, and (hopefully) get it incorporated into zlib. The code changes needed are actually quite small, so it might even get in. There are a few tricky issues when looking for approximate matches which don't come up when you allow only exact matches - but fortunately I have already come across those issues and dealt with them :-).
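
I haven't settled on the interface yet, but one possible shape (purely speculative; no such names exist in stock zlib) is a callback that scores a candidate match, with the stream accepting any match whose cost is within the threshold:

    #include <zlib.h>

    /* Speculative sketch of a pluggable matcher interface; these names
       are invented and not part of zlib.  'want' is the string being
       compressed, 'found' an earlier candidate in the window. */
    typedef unsigned long (*lossy_cost_func)(const unsigned char *want,
                                             const unsigned char *found,
                                             unsigned len, void *state);

    int deflateSetMatcher(z_streamp strm, lossy_cost_func cost,
                          unsigned long threshold, void *state);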

Then modify libpng to turn on lossiness while writing rows of pixels and turn it off again when writing header type stuff. To do this I need to find out more about the PNG file format.

Finally write a user tool to generate lossy PNG output from existing images. It should have two tunable parameters: quality à la JPEG, and speed à la gzip. The speed parameter will be needed because approximate string matching is inherently very slow - many times slower than gzip, which can use hashes to keep track of likely strings, an option not available when two matching strings need not have the same hash value. Even gzip feels the need for speed options, so we'll definitely want them.

Comparison with JPEG

I don't know how well this approach compares with JPEG lossy compression. Does it give better image quality at a given file size, and if so for which classes of images? I have been pleasantly surprised by how well the code worked on the 'Dartmoor' photograph, so it's not necessarily the case that 'JPEG is for photographs, PNG is for screenshots'. It might depend on how big your output needs to be and how much quality you're prepared to lose.

Comparison with random format X

You can view the resulting images with any PNG viewer. At least, you will be able to when the implementation is finished. That gives this idea a definite advantage over anything which depends on adding support for a new file format to lots of programs. Here, only the writer need be aware of the strange new compression code.

Extend to multi-pixel-per-byte

The approximate matcher at present compares integer values of bytes. But PNG is often used with two, four or eight pixels packed into a byte. The matcher needs to take these apart and compare each pixel value individually; any tricks such as 'dithering' the mismatch from one pixel to the next need to work within a byte too.
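
Unpacking the sub-byte pixels is simple in itself; for example, with four 2-bit pixels to a byte the per-byte cost might be computed like this (invented names, illustrative only):

    /* Sum of squared per-pixel differences between two bytes that each
       pack four 2-bit greyscale pixels.  The same idea extends to one-
       and four-bit depths. */
    static unsigned long packed2_cost(unsigned char a, unsigned char b)
    {
        unsigned long total = 0;
        int shift;
        for (shift = 6; shift >= 0; shift -= 2) {
            int pa = (a >> shift) & 3;    /* one 2-bit pixel from each */
            int pb = (b >> shift) & 3;
            int d = pa - pb;
            total += (unsigned long)(d * d);
        }
        return total;
    }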

Related projects

Author and copying

Ed Avis, ed@membled.com. The extensions to zlib are distributed under the same terms as zlib itself. The test images are copyright of other people and are used here only as research examples. This software is very much 'not released yet'; I just wanted a web page to accompany my Slashdot posting. Please contact me if you're interested in this project, because it's not certain I shall have time to finish it myself.