respell version 0.1

A tool to convert English text from one spelling system to another. At present, there are spelling files for american, british, and canadian spellings.

System requirements

Perl 5, and ispell version 3.2.06.epa1 or later. This is an unofficial release of ispell made to incorporate the new -e5 expansion option: the code will be merged back into the main ispell tree when the maintainer has time.

How it works

Simply having a lookup table from one spelling convention to another is not enough. Often there are two words which, because of differences in meaning or in pronunciation, are spelt differently in one system but the same in another. This is most noticeable when moving from british to american spelling: for example cheque/check -> check, curb/kerb -> curb, and many others. But there are examples in the other direction too, for example vice/vise -> vice, and analyses/analyzes -> analyses (where the difference is in pronunciation as well as meaning).

Instead we create one lookup table for each spelling convention, mapping 'words' to one or more 'spellings' for each word. A 'word' is an uppercase key like ANALYZE or CHEQUE, and two words are separate if there is any spelling convention which assigns them different spellings. Then to convert from one spelling convention to another, we do a reverse lookup in the source convention's table from each character string to its corresponding 'words' (which may be more than one), and then in the target's table we find the most common spelling for each word. When more than one word is involved, and the most common target spellings for these words differ, the user must be asked what the intended meaning (or pronunciation) was.

For example, suppose we wanted to translate 'prophesy' from american to british. Looking up in american reveals two words which could use that spelling:

    PROPHECY:	prophesy
    PROPHESY:	prophesy

Now looking up the two words (PROPHECY and PROPHESY) in british gives:

    PROPHECY:	prophecy
    PROPHESY:	prophesy

So the two possible choices are 'prophecy' and 'prophesy'. It's the user's job to pick between them.
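
The lookup just described can be sketched in a few lines of Python (the tool itself is written in Perl). The two tables hold only the entries from the example above. The function name convert and the dictionary layout are assumptions made for illustration, not the structures that Spelling.pm actually uses.

```python
# Hypothetical miniature tables: 'word' key -> spellings, most common first.
AMERICAN = {"PROPHECY": ["prophesy"], "PROPHESY": ["prophesy"]}
BRITISH = {"PROPHECY": ["prophecy"], "PROPHESY": ["prophesy"]}


def convert(spelling, source, target):
    """Return the candidate target spellings for one source spelling."""
    # Reverse lookup: every 'word' that the source convention spells this way.
    words = [w for w, spellings in source.items() if spelling in spellings]
    candidates = []
    for word in words:
        if word not in target:
            continue
        best = target[word][0]  # assumed: first listed = most common
        if best not in candidates:
            candidates.append(best)
    # Unknown strings pass through unchanged.
    return candidates or [spelling]


print(convert("prophesy", AMERICAN, BRITISH))
```

When the returned list has more than one entry, the conversion is ambiguous and the choice has to be left to the user.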

In fact, the spelling files give several 'words' on each line, by using ispell-style expansion flags. The IspellExpand.pm module runs ispell with the new -e5 option to convert these to several words. This is why version 3.2.06.epa1 or later of ispell is needed.

'Universal' spellings

It is possible to combine two or more spelling files to produce a single spelling which can be converted to any of them without loss of information, a kind of 'universal donor' spelling. For example it would have the prophecy/prophesy distinction, but also analyses/analyzes. It turns out that canadian spelling is fairly close to being 'universal' for English, but it needs some tweaking. The best universal spelling comes from combining canadian, british and american in that order (so the canadian spellings are listed first, where possible) and is generated by 'make' in the file 'ucba'.
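
One way to read 'without loss of information' is that no spelling in the combined table becomes ambiguous when converted to any of the targets. The Python sketch below checks that property. The table layout, the function name ambiguous_spellings, the keys ANALYSES and ANALYZES, and the UNIVERSAL table are all assumptions for illustration; they are not the contents of the real spelling files or of 'ucba'.

```python
def ambiguous_spellings(source, target):
    """Map each source spelling to its target candidates when there are 2+."""
    by_spelling = {}
    for word, spellings in source.items():
        for s in spellings:
            by_spelling.setdefault(s, set())
            if word in target:
                # assumed: first listed spelling is the most common one
                by_spelling[s].add(target[word][0])
    return {s: sorted(c) for s, c in by_spelling.items() if len(c) > 1}


# Hypothetical miniature tables: 'word' key -> spellings.
AMERICAN = {"PROPHECY": ["prophesy"], "PROPHESY": ["prophesy"],
            "ANALYSES": ["analyses"], "ANALYZES": ["analyzes"]}
BRITISH = {"PROPHECY": ["prophecy"], "PROPHESY": ["prophesy"],
           "ANALYSES": ["analyses"], "ANALYZES": ["analyses"]}
# Hypothetical 'universal' table that keeps both distinctions.
UNIVERSAL = {"PROPHECY": ["prophecy"], "PROPHESY": ["prophesy"],
             "ANALYSES": ["analyses"], "ANALYZES": ["analyzes"]}

print(ambiguous_spellings(AMERICAN, BRITISH))
print(ambiguous_spellings(BRITISH, AMERICAN))
print(ambiguous_spellings(UNIVERSAL, AMERICAN), ambiguous_spellings(UNIVERSAL, BRITISH))
```

In this toy data, american and british each lose one distinction that the other keeps, while the combined table converts to both without any ambiguous entry.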

How to use it

Three spelling files are provided: american, british and canadian. These can be loaded by the Spelling.pm module and conversion tables can be built. Then there are two executables:

respell

The program 'respell' is a filter. Give it two spelling files (from and to) and it will convert text from one to the other. When more than one output choice is possible, all the choices are included in the output inside square brackets. For example, [ prophecy prophesy ]. You can disable this, and just pick the most common target spelling, with the -f option.
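
The bracketed output can be illustrated with a small Python sketch. This is not the respell source: the table contents, the function name respell_line and the force parameter (standing in for -f) are assumptions, as is the idea that the first listed candidate is the most common one.

```python
import re

# Hypothetical conversion table: source spelling -> target candidates.
TABLE = {"prophesy": ["prophecy", "prophesy"], "color": ["colour"]}

# Whole lowercase words only; anything else is left untouched.
LOWERCASE_WORD = re.compile(r"(?<![A-Za-z])[a-z]+(?![A-Za-z])")


def respell_line(line, table, force=False):
    """Convert one line, bracketing ambiguous words unless force is set."""
    def replace(match):
        word = match.group(0)
        choices = table.get(word)
        if not choices:
            return word  # no change needed
        if force or len(choices) == 1:
            return choices[0]
        return "[ " + " ".join(choices) + " ]"

    return LOWERCASE_WORD.sub(replace, line)


print(respell_line("they prophesy in color", TABLE))
print(respell_line("they prophesy in color", TABLE, force=True))
```

The first call leaves the ambiguous word as a bracketed list for the user to resolve; the second mimics -f by taking the first candidate.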

Words which don't need changing, and nonword characters, are passed through unchanged. By default, respell will only deal with lowercase words. The -i option tells it to handle Capitalized words, and -I handles UPPERCASE words.
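
A minimal Python sketch of how such case handling could work: look the word up in lowercase, then restore the original case pattern. The function name convert_word, its parameters (standing in for -i and -I) and the one-entry table are assumptions for illustration; respell itself may do this differently.

```python
# Hypothetical table: lowercase source spelling -> target spelling.
TABLE = {"color": "colour"}


def convert_word(word, table, handle_capitalized=False, handle_upper=False):
    """Convert one word, restoring its original case pattern afterwards."""
    if word.islower():
        restore = str.lower
    elif handle_upper and word.isupper() and len(word) > 1:
        restore = str.upper
    elif handle_capitalized and word[:1].isupper() and word[1:].islower():
        restore = str.capitalize
    else:
        return word  # case pattern not handled: pass through unchanged
    target = table.get(word.lower())
    return restore(target) if target else word


print(convert_word("color", TABLE))
print(convert_word("Color", TABLE))
print(convert_word("Color", TABLE, handle_capitalized=True))
print(convert_word("COLOR", TABLE, handle_upper=True))
```

Words with any other mix of cases are passed through unchanged in this sketch.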

Finally, the -q flag suppresses most of the chatter.

respell.cgi

A slightly more sophisticated interface to respelling documents. For speed, this doesn't use the Spelling module but instead prebuilt lookup files. You can build these files with 'make'.

You need to install respell.cgi on your web server together with the data files. There may be a live demo at the website for this project (see below).

Download

If you want anonymous CVS access, ask and I might be able to arrange it.

Demo

Sorry, since the move to a new web server the live demo no longer works. I hope to have it back up soon.

Installing

Currently, there is no 'make install' mechanism. You can either run the programs from the directory where they were unpacked, or copy the executables and .pm files somewhere suitable.

'make' will build some data files needed for respell.cgi. 'make test' will check that the conversion tables are as expected. There are corresponding 'make full' and 'make test_full' targets for an exhaustive set of files converting every possible spelling to every other.

Future plans

Related projects

This tool doesn't handle the various spelling reforms proposed for English, which are much more wide-ranging than the small differences between US and UK spelling. The semi-free program BTRSPL converts between standard English spelling and one of three spelling reform proposals.

The varcon table has the same purpose as this project, but it's inadequate because it doesn't handle one spelling mapping to two or more. It should however be possible to generate a new varcon list from this project's data files.

Author

Ed Avis, ed@membled.com. See the file COPYING for copying conditions.

This project has a web page at http://membled.com/work/apps/respell/.


Edward Avis
Last modified: Mon Dec 2 23:38:41 GMT 2002