PNG images are pixmaps compressed using Deflate compression, the same as used in zipfiles and by gzip. This compression has a Lempel-Ziv matching stage and then Huffman codes the results. By changing the LZ compression to allow approximate matches of repeated strings, we can have lossy compression of PNG images, while the output file is completely standard PNG format.
At present my implementation is just a modified zlib whose 'minigzip' program can do lossy compression on PNM images (a simple 'raw' file format). Modifying libpng to generate lossy PNG images is still to do.
The amount of lossiness is controlled by an integer parameter called 'threshold'. When the compressor looks for duplicate strings, it does a byte-by-byte comparison and notes the numeric difference between corresponding bytes in the two strings. Actually this difference is squared to punish one large mismatch more than several small ones. The difference for each pair of pixels (one from each string) is computed and summed, and if this total is less than the threshold then the strings are considered to match. Hence a lower threshold means a more faithful output image, but fewer matching strings and so higher file size. A threshold of zero gives lossless compression and is equivalent to plain 'gzip'.
(For greyscale images, the amount of mismatch from one pixel is 'carried over' to the next, which helps to handle dithered patterns. There may be other more sophisticated ways of comparing two strings of pixels added in future; I also hope to replace 'threshold' with a 'quality factor' between 0 and 100, like JPEG. No time to explain all the ideas here.)
Currently the minigzip program knows nothing about the structure of files, it just compresses a stream of bytes. The PNM files have a short text header at the top and it's necessary to stop this being compressed lossily. So I stripped it off and just compressed the raw pixel data giving a .gz file of the raw data. To recover an image you decompress this .gz file and add the header back on. The header is just a few bytes so it does not significantly affect the compressed file size.
I developed the code by testing it against a Dilbert cartoon; this is a four-greyscale image originally but I converted it to 256 shades of grey because the lossy matching assumes it can compare numeric byte values. So perhaps this isn't a particularly good example. I also grabbed a random photograph off the net to test how well the code works with 'real-life' 24-bit images; this photo was previously a JPEG and it already has some artefacts from that lossy compression, so it will be interesting to see how it comes out! The greyscale image was compressed with 'dithering' enabled (set DITHER_LOSS true); for the colour image dithering was off because it doesn't yet correctly deal with neighbouring bytes representing different colours (a difference in red intensity cannot be carried across into the neighbouring green byte, but should affect the next pixel's red value). Here are the two original images:
Yes, I know these two images are not a good representative set of tests, but they are good enough to show how well this early version works. With threshold 10000, we get the following lossy output:
The file size for the lossy Dilbert is 13794 bytes compared with 17434 bytes for the non-lossy PNG of that image, a 21% saving. For Dartmoor, the size is 7942 bytes compared to 41754 bytes, an 81% saving. Of course different thresholds can be tried to trade off quality for space. Here is that same photograph with a threshold of only 1000:
This has a size of 20284 bytes - a saving of more than half the file size - but the loss in quality is not really noticeable. If anything the quality has improved since the JPEG artefacts are no longer visible! This isn't a fair test but you get the idea.
| Threshold | File size in bytes | |
|---|---|---|
| 0 (lossless) | 18411 | View |
| 1000 | 18409 | View |
| 2000 | 17297 | View |
| 4000 | 16020 | View |
| 8000 | 14346 | View |
| 10000 | 13794 | View |
| 12000 | 13229 | View |
| 14000 | 12818 | View |
| 20000 | 11754 | View |
| 30000 | 10579 | View |
| 40000 | 9711 | View |
| Threshold | File size in bytes | |
|---|---|---|
| 0 (lossless) | 53348 | View |
| 1000 | 20284 | View |
| 2000 | 15072 | View |
| 4000 | 11774 | View |
| 8000 | 8804 | View |
| 10000 | 7942 | View |
| 12000 | 7195 | View |
| 14000 | 6696 | View |
| 20000 | 5539 | View |
| 30000 | 4904 | View |
| 40000 | 4144 | View |
Note that for web viewing, I have converted the images into PNG files because most browsers don't support the simple PNM format (still less some wacky gzipped-raw-data). The intermediate data files, which are used to give the file size count above, can be downloaded straight from the images/ directory. In principle you should be able to gunzip them, add the PNM header and view them, but you may find that gzip complains about CRC errors.
Also note something interesting: the PNG files are also quite small! So this software as it stands could do the job of a lossy compressor for PNG files. Indeed any program which looked for almost-repeating patterns in an image and modified them so they repeated exactly would help PNG compress better. The trouble is that if you don't know what the LZ compression of zlib is doing, you may make changes to the image which don't help in compression. It's better to hook directly into the compressor and allow it to make tweaks to the image when that would definitely reduce the file size, which is the aim of this project.
Generalize the interface for lossy compression so the user specifies the approximate string matcher to use, and (hopefully) get it incorporated into zlib. The code changes needed are actually quite small, so it might even get in. There are a few tricky issues when looking for approximate matches which don't come up when you allow only exact matches - but fortunately I have already come across those issues and dealt with them :-).
Then modify libpng to turn on lossiness while writing rows of pixels and turn it off again when writing header type stuff. To do this I need to find out more about the PNG file format.
Finally write a user tool to generate lossy PNG output of existing images. It should have two tunable parameters: quality à la JPEG, and speed à la gzip. The speed parameter will be needed because this approximate string matching is inherently very slow. It is many times slower than gzip (which is able to use hashes to keep track of likely strings; an option not possible when two matching strings need not have the same hash value), and gzip feels the need for speed options, so we'll definitely want them.
I don't know how well this approach compares with JPEG lossy compression. Does it give better image quality at a given file size, and if so for which classes of images? I have been pleasantly surprised by how well the code worked on the 'Dartmoor' photograph, so it's not necessarily the case that 'JPEG is for photographs, PNG is for screenshots'. It might depend on how big your output needs to be and how much quality you're prepared to lose.
You can view the resulting images with any PNG viewer. At least, you will be able to when the implementation is finished. That gives this idea a definite advantage over anything which depends on adding support for a new file format to lots of programs. Here, only the writer need be aware of the strange new compression code.
The approximate matcher at present compares integer values of bytes. But PNG is often used with two, four or eight pixels packed into a byte. The matcher needs to take these apart and compare each pixel value individually; any tricks such as 'dithering' the mismatch from one pixel to the next need to work within a byte too.
Ed Avis, ed@membled.com. The extensions to zlib are distributed under the same terms as zlib itself. The test images are copyright to some other people and used for research examples only. This software is very much 'not released yet', I just wanted a web page to accompany my Slashdot posting. Please contact me if you're interested in this project because it's not certain I shall have time to finish it myself.