Different ways to read XML
One reason I originally went with XML was not having to write a
parser. Well, that's true... up to a point. You still have to
recognize the different element names and attributes, and with a
fairly complex DTD like XMLTV's that's quite a lot of code.
AFAICT there are three styles of interface to the XML parser:
- DOM, or tree-based. This parses the whole
XML document into memory and then you call methods with names
like getAttributeNode() to traverse it. Flexible, but slow
and unscalable because the whole tree must be loaded into
memory. Of course this is what we use :-). Apparently a C++
DOM implementation like Xerces would be faster than the
pure-Perl XML::DOM, but I couldn't get it working.
- Token-based (event-driven), for example SAX. Here you
get one token at a time and have to do most of the work
yourself, but you can read files gradually instead of loading
them whole. Instead of SAX you can get really close to the
metal and talk to expat directly.
- Generate a parser from the DTD. This is ideal but there's
no tool which does exactly what I want. FleXML might come close.
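
To make the difference between the first two styles concrete, here is a minimal sketch. It uses Python's standard library (xml.dom.minidom and xml.sax) rather than the Perl modules XMLTV actually uses, and the sample document only imitates the general shape of a listings file, so treat the element names, attribute values and function names as assumptions for illustration.

```python
import xml.dom.minidom
import xml.sax

# A tiny listings document; names and values are illustrative only.
SAMPLE = """<tv>
  <programme start="200808231700" channel="bbc1.example">
    <title>News</title>
  </programme>
  <programme start="200808231730" channel="bbc1.example">
    <title>Weather</title>
  </programme>
</tv>"""


def titles_dom(text):
    """Tree style: parse the whole document, then walk the tree."""
    doc = xml.dom.minidom.parseString(text)
    result = []
    for prog in doc.getElementsByTagName("programme"):
        title = prog.getElementsByTagName("title")[0]
        result.append((prog.getAttribute("start"), title.firstChild.data))
    return result


class TitleHandler(xml.sax.ContentHandler):
    """Event style: the parser calls us as it reads; we track our own state."""

    def __init__(self):
        super().__init__()
        self.result = []
        self._start = None
        self._text = None  # None means "not currently inside <title>"

    def startElement(self, name, attrs):
        if name == "programme":
            self._start = attrs.get("start")
        elif name == "title":
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "title":
            self.result.append((self._start, "".join(self._text)))
            self._text = None


def titles_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.result
```

Both functions return the same list of (start, title) pairs. The DOM version is shorter, but it holds the whole tree in memory; the SAX version needs the extra bookkeeping in TitleHandler, which is the "most of the work yourself" part.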
The current answer is a layer on top of XML::DOM: XMLTV.pm,
with its table-based DOM handler routine thingies.
You can call parsefile() to read a whole file into
a Perl data structure and write_data() to write it
out again. It should be fairly straightforward to replace the DOM
code with something else. It is also possible to cheat by serializing
Perl data structures to disk.
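
As a rough idea of what a table of handlers over a DOM tree can look like, here is a small sketch in Python. It is not the code in XMLTV.pm: the table layout, the handler names (read_text, read_text_with_lang) and the choice of elements are all hypothetical, picked only to show the technique of looking up each child element in a table instead of writing a long chain of if statements.

```python
import xml.dom.minidom


def read_text(node):
    """Return the concatenated text directly inside an element."""
    return "".join(
        c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE
    )


def read_text_with_lang(node):
    """Return (text, lang), where lang is None if the attribute is absent."""
    return (read_text(node), node.getAttribute("lang") or None)


# Hypothetical handler table: element name -> (reader, multiplicity).
# "one" keeps a single value, "many" collects values into a list.
HANDLERS = {
    "title": (read_text_with_lang, "many"),
    "desc": (read_text_with_lang, "many"),
    "date": (read_text, "one"),
}


def parse_programme(node):
    """Turn one <programme> element into a plain dictionary."""
    prog = {
        "start": node.getAttribute("start"),
        "channel": node.getAttribute("channel"),
    }
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        entry = HANDLERS.get(child.tagName)
        if entry is None:
            continue  # unknown elements are simply skipped in this sketch
        reader, multiplicity = entry
        value = reader(child)
        if multiplicity == "many":
            prog.setdefault(child.tagName, []).append(value)
        else:
            prog[child.tagName] = value
    return prog


def parse_listings(text):
    """Parse a whole document into a list of programme dictionaries."""
    doc = xml.dom.minidom.parseString(text)
    return [parse_programme(n) for n in doc.getElementsByTagName("programme")]
```

Supporting a new element then means adding one row to HANDLERS, and because callers only ever see plain dictionaries and lists, the DOM code underneath could be swapped for a streaming parser without changing them.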
Edward Avis
Last modified: Sat Aug 23 17:21:38 BST 2008