Bug with unzip -L

When a zipfile created on MS-DOS (or another OS with uppercase filenames) is unpacked on Unix (or another OS with case-sensitive filenames), unzip used to convert the case of the filenames stored in the zipfile, so that if it contained README.TXT this would be extracted as readme.txt.

For some reason the trend seems to be away from this and towards extracting the original uppercase names, even when the originating system would have been just as happy with lower case. So recent versions of unzip will extract the original ugly-looking filenames (I believe mtools has also made the same change). But the unzip developers did helpfully provide the -L switch so you can reenable the old case-changing behaviour if wanted.

But if a zipfile was created on a system that really does care about case, any uppercase filenames are probably intentional, so they should be left alone. Accordingly, unzip's -L flag will have an effect only if the zipfile was created on an uppercase-only system. If the zipfile comes from a case-sensitive or case-preserving system, -L will have no effect (it is 'intelligent'). You can use -LL to force the lowercasing.

The problem seems to be with some zipfiles that have mixed-case filenames but which unzip -L decides to rename anyway. For example, jjvm's source code is distributed as a zipfile containing filenames such as jjvm/maximum/jvm/VM.java. When unpacked with unzip -L, this gets changed to jjvm/maximum/jvm/vm.java, which breaks the Java compiler. But clearly the zipfile was created on a case-preserving system, since it has lowercase characters in the name!

My guess is that whatever zip program was used set the system identifier to MS-DOS, even though it actually used Windows' long, case-preserving filenames. This is reasonable if you are using DJGPP for example, which gives long filenames if running in a DOS box inside Win9x. And in fact the DOS version of pkzip itself will use long filenames nowadays. So unzip should not rely on the system identifier alone; it should be a bit more intelligent and realize that if lowercase characters are present in filenames, that's a pretty good indication that the creator was a mixed-case system and so the renaming should not happen.
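The extra check I have in mind could look something like the following sketch. Again this is only an illustration under my own assumptions, not a patch against unzip: the function names and the idea of passing all the archive's filenames as an array are hypothetical. I have chosen to look at the whole archive rather than one name at a time, so that an all-uppercase name sitting next to mixed-case ones is also left alone; a per-file check would be the other obvious choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the name contains at least one lowercase letter. */
static bool has_lowercase(const char *name)
{
    assert(name != NULL);
    for (; *name != '\0'; name++)
        if (islower((unsigned char)*name))
            return true;
    return false;
}

/* True if any of the 'count' filenames in the archive has a lowercase
 * letter, which suggests the creator preserved case. */
static bool archive_looks_mixed_case(const char *const *names, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (has_lowercase(names[i]))
            return true;
    return false;
}

/* Decision for a plain -L: rename only if the system identifier claims an
 * uppercase-only creator AND the names themselves do not contradict it. */
static bool plain_L_should_rename(bool creator_claims_uppercase_only,
                                  const char *const *names, size_t count)
{
    return creator_claims_uppercase_only
        && !archive_looks_mixed_case(names, count);
}
```

For the jjvm zipfile, has_lowercase("jjvm/maximum/jvm/VM.java") is true, so plain_L_should_rename would return false even though the system identifier says MS-DOS, and VM.java would keep its name.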

One file that needs changing appears to be process.c, which sets a flag lcflag depending on the -L command line argument and the system the zipfile was created on. Maybe the check for whether the filenames actually do have mixed case (i.e., some lowercase characters) should be done elsewhere; I haven't had time to investigate. I wrote this web page instead :-).

(BTW, to deal with ugly uppercase filenames more generally, have a look at lcra.)


Edward Avis
Last modified: Tue Oct 23 14:48:48 BST 2001