[antlr-interest] ANTLR3 tutorial

Oliver Zeigermann oliver.zeigermann at gmail.com
Thu Aug 3 01:46:45 PDT 2006


Hey, Martin,

calm down :) I use SAX for my real-life work as well.

This is a tutorial. Nothing more. My simple XML subset is an ideal
candidate to show some ANTLR stuff as it is reasonably tricky and XML
should be known to most people.

If you think this isn't obvious, why don't you add something like a
warning to the front page of the tutorial? I wouldn't mind.

Cheers

Oliver

P.S.: The only case where I really needed my own XML lexer was when I
had to parse XML fragments. Most XML parsers puke when they are fed
fragments.

2006/8/3, Martin Probst <mail at martin-probst.com>:
> Hi,
>
> I'm kind of repeating myself in this, but whatever. I consider myself
> quite an expert on XML (heck, I'm developing an XML database!) so I
> got my 0.02 € on this.
>
> First, you should at all times avoid tricking people into writing a
> parser for XML. There is absolutely no reason for this. None
> whatsoever. There are XML parsers out there for all languages and a
> variety of different profiles, SAX, DOM, XML Pull, whatever. They are
> highly optimized and it's extremely unlikely you get something faster
> using ANTLR. Plus, you totally spoil the whole XML thing (no
> processing instructions, comments, CDATA, entities, ...; apart
> from that, you don't even support Unicode and have quite a few errors
> in that lexer). XML was invented (among other things) to save people
> from having to write their own parser!
>
> Second, there are appropriate techniques to create bindings from XML
> to the language of choice for custom vocabularies, e.g. XML beans and
> friends, which do all the parsing plus validation plus create the
> domain-specific objects. Again, faster than anything you can write (in
> a reasonable amount of time), plus fewer errors, plus validation, plus
> some are even language independent (YMMV).
>
> Third, if you really know what you are doing (i.e. you have spent
> years on XML and have a very specific case in which everything is
> different; this is almost certainly not you, whoever might read
> this :-) ), then don't start by writing an XML lexer, but rather use
> an existing one, e.g. any SAX parser you like. Then use that one as
> the lexer, convert the events to tokens, and implement the
> appropriate ANTLR interface. This might be even easier if you use XML
> Pull as the underlying technology. This way you might get one of the
> most important things (encoding & Unicode) right, which will save
> your users a lot of pain, and you have also solved the
> escaping/entity/etc. thing. And again, it's going to be a lot faster
> than anything ANTLR can generate.
>
> So please don't tell people how to generate their own XML parser.
> Tell them that they are at the wrong address and should rather use a
> pre-built XML parser. It's always the same - the old lex/yacc people
> who think "I need to parse something" and start off with a compiler
> toolkit, trying to solve the same problems over and over again. And
> then they fail and complain about XML being complex because they
> didn't get Unicode or Entities right ...
>
> Martin
>
> Am 02.08.2006 um 19:33 schrieb Oliver Zeigermann:
>
> > Hi folks!
> >
> > I finished the first part of my Parsing XML using ANTLR3 tutorial:
> >
> > http://www.antlr.org/wiki/display/ANTLR3/Parsing+XML
> >
> > And the first part:
> >
> > http://www.antlr.org/wiki/display/ANTLR3/Lexer
> >
> > However, and most frustratingly, the Wiki made such a mess of that
> > page that I could not fix it even after an hour of work :( :( :(
> > I'll keep trying to find a solution; any hints are highly appreciated.
> >
> > Anyway, because of this I have the intro and the first part on lexing
> > here as well:
> >
> > http://zeigermann.de/antlr/Intro.html
> > http://zeigermann.de/antlr/Lexing.html
> >
> > Comments/Improvements on the tutorial itself are highly welcome. Are
> > the most important questions answered? Can you even follow? Is this
> > complete crap? Let me know :)
> >
> > Also, stay tuned for the next part, on parsing, which I am already
> > working on.
> >
> > Cheers
> >
> > Oliver
> >
>
>
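
For readers who want to see what Martin's third suggestion in the quoted
message might look like (use an existing pull parser as the lexer and
convert its events to tokens), here is a minimal sketch in Java. It uses
the StAX pull API from the JDK (javax.xml.stream), so encoding and
entity handling come from the underlying parser. The class name
PullLexerSketch, the Kind enum and the XmlToken class are hypothetical
names made up for this illustration; they are not part of ANTLR or of
the tutorial. Wrapping the resulting tokens in ANTLR's own token
interface is deliberately left out, since that depends on the ANTLR
version in use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: all names here are illustrative, not ANTLR API.
class PullLexerSketch {

    // Hypothetical token kinds a parser grammar could match on.
    enum Kind { START_TAG, END_TAG, TEXT }

    // Minimal token: a kind plus the element name or the character data.
    static final class XmlToken {
        final Kind kind;
        final String text;

        XmlToken(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // Runs the StAX pull parser over the input and maps its events to tokens.
    static List<XmlToken> tokenize(String xml) {
        List<XmlToken> tokens = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Merge adjacent character data (including resolved entities
            // and CDATA) into a single CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    switch (reader.next()) {
                        case XMLStreamConstants.START_ELEMENT:
                            tokens.add(new XmlToken(Kind.START_TAG, reader.getLocalName()));
                            break;
                        case XMLStreamConstants.END_ELEMENT:
                            tokens.add(new XmlToken(Kind.END_TAG, reader.getLocalName()));
                            break;
                        case XMLStreamConstants.CHARACTERS:
                            // Skip whitespace-only text between elements.
                            if (!reader.isWhiteSpace()) {
                                tokens.add(new XmlToken(Kind.TEXT, reader.getText()));
                            }
                            break;
                        default:
                            // Comments, processing instructions etc. are
                            // ignored in this sketch.
                            break;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The entity &amp; is resolved by the underlying parser, not by us.
        System.out.println(tokenize("<a><b>x &amp; y</b></a>"));
    }
}
```

The point of the design is that the hand-written part only maps events
to token kinds; well-formedness checking, character decoding and entity
resolution stay in the pre-built parser. Attributes are not turned into
tokens here; a real adapter would have to decide how to represent them.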

