[antlr-interest] ANTLR3 tutorial
Martin Probst
mail at martin-probst.com
Thu Aug 3 01:30:14 PDT 2006
Hi,
I'm kind of repeating myself in this, but whatever. I consider myself
quite an expert on XML (heck, I'm developing an XML database!) so I
got my 0.02 € on this.
First, you should at all times avoid tricking people into writing a
parser for XML. There is absolutely no reason for this. None
whatsoever. There are XML parsers out there for all languages and a
variety of different profiles, SAX, DOM, XML Pull, whatever. They are
highly optimized and it's extremely unlikely you get something faster
using ANTLR. Plus, you totally spoil the whole XML thing (may not
have processing instructions, comments, CDATA, entitites, ... except
from that you don't even support Unicode and have quite some errors
in that lexer). XML was invented (among other things) to save people
from having to write their own parser!
Second, there are appropriate techniques to create bindings from XML
to the language of choice for custom vocabularies. E.g. XML beans and
friends who do all the parsing plus validation plus create the domain
specific objects. Again, faster than everything you can write (in a
reasonable amount of time), plus less errors, plus validation, plus
some even language independent (YMMV).
Third, and if you really know what you are doing (ie. have spent
years on XML and have a very specific case in which everything is
different. This is almost certainly not you, whoever might read
this :-) ), then don't start by writing an XML lexer, but rather use
an existing one, e.g. any SAX parser you like. Then use that one as
the lexer, convert the events to tokens, and implement the
appropriate ANTLR interface. This might be even easier if you use XML
Pull as the underlying technology. This way you might get one of the
most important things (encoding & Unicode) right, which will save
your users a lot of pain, and you also solved the escaping/entity/
etc. thing. And again, it's going to be a lot faster than anything
ANTLR can generate.
So please don't tell people how to generate their own XML parser.
Tell them that they are at the wrong address and should rather use a
pre-built XML parser. It's always the same - the old lex/yacc people
who think "I need to parse something" and start off with a compiler
toolkit, trying to solve the same problems over and over again. And
then they fail and complain about XML being complex because they
didn't get Unicode or Entities right ...
Martin
Am 02.08.2006 um 19:33 schrieb Oliver Zeigermann:
> Hi folks!
>
> I finished the first part of my Parsing XML using ANTLR3 tutorial:
>
> http://www.antlr.org/wiki/display/ANTLR3/Parsing+XML
>
> And the first part:
>
> http://www.antlr.org/wiki/display/ANTLR3/Lexer
>
> However, and most frustrating the Wiki made a mess of that page that I
> could not even fix after an hour of work :( :( :( I keep trying to
> find a solution, any hints highly appreciated.
>
> Anyway, because of this I have the intro and the first part on lexing
> here as well:
>
> http://zeigermann.de/antlr/Intro.html
> http://zeigermann.de/antlr/Lexing.html
>
> Comments/Improvements on the tutorial itself are highly welcome. Are
> the most important questions answered? Can you even follow? Is this
> complete crap? Let me know :)
>
> Also, stay tuned for the next part parsing which I am already
> working on.
>
> Cheers
>
> Oliver
>
More information about the antlr-interest
mailing list