[antlr-interest] ANTLR3 tutorial

Martin Probst mail at martin-probst.com
Thu Aug 3 01:30:14 PDT 2006


Hi,

I'm kind of repeating myself here, but whatever. I consider myself
quite an expert on XML (heck, I'm developing an XML database!), so
here are my 0.02 € on this.

First, you should at all times avoid tricking people into writing a
parser for XML. There is absolutely no reason for this. None
whatsoever. There are XML parsers out there for all languages and a
variety of different profiles: SAX, DOM, XML Pull, whatever. They are
highly optimized, and it's extremely unlikely that you will get
something faster using ANTLR. Plus, you totally spoil the whole XML
thing: documents for your parser may not contain processing
instructions, comments, CDATA, entities, ... Apart from that, you
don't even support Unicode, and there are quite a few errors in that
lexer. XML was invented (among other things) to save people from
having to write their own parser!
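
To illustrate the point, here is a minimal sketch in Python (chosen only
for brevity; the same holds for parsers in other languages). It uses the
standard library's xml.etree.ElementTree. The sample document and the
helper name parse_note are made up for this example. The stock parser
already copes with the encoding declaration, a comment, an entity, a
CDATA section and a character reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: UTF-8 bytes containing a comment, an entity
# reference, a CDATA section and a numeric character reference.
DOC = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!-- a comment the parser handles for us -->\n'
    b'<note lang="de">Gr\xc3\xbc\xc3\x9fe &amp; <![CDATA[<raw> text]]> &#8364;</note>'
)


def parse_note(data: bytes) -> tuple:
    """Return (lang attribute, decoded text) of the root element."""
    root = ET.fromstring(data)
    # Entities, CDATA and character references arrive already resolved
    # and decoded to Unicode in root.text.
    return root.get("lang"), root.text


if __name__ == "__main__":
    print(parse_note(DOC))
```

None of the decoding or unescaping above had to be written by hand, which
is exactly the work a hand-written lexer would have to get right.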

Second, there are appropriate techniques to create bindings from XML
to the language of choice for custom vocabularies, e.g. XMLBeans and
friends, which do all the parsing plus validation and create the
domain-specific objects. Again, that is faster than anything you can
write (in a reasonable amount of time), with fewer errors and with
validation, and some are even language independent (YMMV).

Third, if you really know what you are doing (i.e. you have spent
years on XML and have a very specific case in which everything is
different; this is almost certainly not you, whoever might read
this :-) ), then don't start by writing an XML lexer, but rather use
an existing one, e.g. any SAX parser you like. Use that one as
the lexer, convert its events to tokens, and implement the
appropriate ANTLR interface. This might be even easier if you use XML
Pull as the underlying technology. This way you might get one of the
most important things (encoding & Unicode) right, which will save
your users a lot of pain, and you will also have solved the
escaping/entity/etc. thing. And again, it's going to be a lot faster
than anything ANTLR can generate.
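
A rough sketch of the "SAX parser as lexer" idea, again in Python with the
standard library's xml.sax. The names Token, TokenCollector and
SaxTokenSource, the token types, and the next_token() method are all
assumptions made for this illustration. They only stand in for whatever
token-source interface the ANTLR runtime expects and are not the ANTLR
API itself. For simplicity the sketch collects all tokens up front, which
a pull parser would avoid.

```python
import xml.sax
from typing import List, NamedTuple


class Token(NamedTuple):
    type: str  # hypothetical token types: "START", "END", "TEXT", "EOF"
    text: str


class TokenCollector(xml.sax.ContentHandler):
    """Turns SAX events into a flat list of tokens."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: List[Token] = []

    def startElement(self, name, attrs):
        self.tokens.append(Token("START", name))

    def endElement(self, name):
        self.tokens.append(Token("END", name))

    def characters(self, content):
        # SAX may deliver one run of text in several chunks; merge them.
        if self.tokens and self.tokens[-1].type == "TEXT":
            self.tokens[-1] = Token("TEXT", self.tokens[-1].text + content)
        else:
            self.tokens.append(Token("TEXT", content))


class SaxTokenSource:
    """Stand-in for a parser-facing token source (not the real ANTLR one)."""

    def __init__(self, data: bytes) -> None:
        collector = TokenCollector()
        # The XML parser deals with encoding, entities and escaping.
        xml.sax.parseString(data, collector)
        self._tokens = collector.tokens
        self._pos = 0

    def next_token(self) -> Token:
        if self._pos >= len(self._tokens):
            return Token("EOF", "")
        token = self._tokens[self._pos]
        self._pos += 1
        return token


if __name__ == "__main__":
    source = SaxTokenSource(b"<a>x &amp; y<b/></a>")
    token = source.next_token()
    while token.type != "EOF":
        print(token)
        token = source.next_token()
```

The grammar then only has to describe the structure of the vocabulary in
terms of these tokens; character-level XML details never reach it.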

So please don't tell people how to generate their own XML parser.
Tell them that they have come to the wrong place and should rather use
a pre-built XML parser. It's always the same: the old lex/yacc people
think "I need to parse something" and start off with a compiler
toolkit, trying to solve the same problems over and over again. And
then they fail and complain about XML being complex because they
didn't get Unicode or entities right ...

Martin

On 02.08.2006 at 19:33, Oliver Zeigermann wrote:

> Hi folks!
>
> I finished the first part of my Parsing XML using ANTLR3 tutorial:
>
> http://www.antlr.org/wiki/display/ANTLR3/Parsing+XML
>
> And the first part:
>
> http://www.antlr.org/wiki/display/ANTLR3/Lexer
>
> However, and most frustratingly, the Wiki made a mess of that page that I
> could not even fix after an hour of work :( :( :( I'll keep trying to
> find a solution; any hints are highly appreciated.
>
> Anyway, because of this I have the intro and the first part on lexing
> here as well:
>
> http://zeigermann.de/antlr/Intro.html
> http://zeigermann.de/antlr/Lexing.html
>
> Comments/Improvements on the tutorial itself are highly welcome. Are
> the most important questions answered? Can you even follow? Is this
> complete crap? Let me know :)
>
> Also, stay tuned for the next part, on parsing, which I am already
> working on.
>
> Cheers
>
> Oliver
>


