[antlr-interest] "No Past Experience Required"

Sat Oct 17 11:10:07 PDT 2009

On Fri, Oct 16, 2009 at 5:06 PM, Horst Dehmer <horst.dehmer at inode.at> wrote:

> Hello Peter,
>
> i myself am not too familiar with ANTLR 3 (and parser generators in
> general)
> and for that reason i feel tempted to try to give some answers.
>
> A symbol is a 'word' in a grammar, like a word in a natural language. A
> symbol can be a terminal (symbol) which can no further be 'reduced', like
> reserved words in a programming language: 'for', 'while', 'if'... You get
> the picture. Non-terminals are the left side of a grammar (production)
> rule.
> Non-terminals are reduced during parsing according to the defined grammar
> and the stream of available tokens.
>
> A lexer usually takes a character stream as an input and converts it in a
> stream of tokens. The characters 'f', 'o', 'r', ' ', '(' for example become
> the tokens 'for' and '('. In the old times (and still today with certain
> lexers) lexer rules were defined just by regular expressions, which
> restricts the class of what can be recognized by a parser pretty much
> compared to ANTLR.
>
> The parser takes a token stream as a input and tries to recognize a higher
> level order. Think of it as sentences of a language following a certain
> grammar. The rules which make up a parser grammar are usually much more
> complex than lexer rules. (E)BNF can not be expressed by regular
> expression.
>
> LL(k), LL(*) parsers (top-down, results are build by working their way from
> left to right) are classes of parsers which are not very popular. I might
> be
> wrong (or to old) but LR parsers are much more common. So maybe have a
> quick
> look at lex/yacc. For me that's the father of all parser generators (for
> C).
>
> Predicates (syntactic and semantic) help a LL parser determine which 'way
> to
> go' or which alternatives to ignore. I guess they are quite unique to LL
> parsers which have the "problem" with left-recursion or common prefix,
> which
> really is no problem once you got your head around it.
>
> For what it's worth: i would not start with ANTLR to build my first
> grammars. The documentation is far too sophisticated for my taste and you
> definitivly have to have some experiences to get started. If you don't know
> what to look for even the book is useless. Don't get me wrong ANTLR 3 is a
> very powerful and reliable tool, but before you use it make sure you really
> need it. If you'd like to stick to ANTLR, have a look at the example
> grammars. There are lots of 'em around and they give good hints on how to
> solve promlems.
>
> And last but not least, if you are really interested in things, read the
> Dragon Book: Principles of Compiler Design, by Alfred Aho and Jeffrey D.
> Ullman
>
> Regards,
>
> Horst
>
> PS: For those recognizing my errors and misunderstandings, help Peter to
> get
> going.
>
>
> On 16.10.09 23:28, "Peter Boughton" <boughtonp at gmail.com> wrote:
>
> > Hi all,
> >
> > I'm looking to figure out ANTLR and so I'm reading this page:
> >
> http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No
> > +Past+Experience+Required
> >
> > But there are a number of things I'm stuck on - and so I'm asking here
> > for both a good explanation and so that someone with edit permission
> > can hopefully update the wiki page to actually be "no past experience"
> > and more helpful for future newbies.
> >
> >
> > "Left recursion."
> > Maybe it's me being dim, but it took me a while to figure how "a : a B
> > | C ;" translated to "a : C B*;" - and whilst I get it now, perhaps a
> > quick sentence to explain the transition might be helpful for the wiki
> > page?
> >
> >
> > "syntactic predicates"
> > So, they're an advanced category discussed in a book... yippee! I
> > don't have the book, and the absolute earliest I could get it (if I
> > choose to) is Tuesday, so that's not particularly helpful. If they're
> > going to be mentioned, they should at least have a brief explanation
> > of when they apply, and preferably a link to an online resource that
> > covers them.
> > (I tried reading the Wikipedia article, but that waffles on without
> > being very helpful.)
> >
> > Can anyone provide [a link to] a good, concise definition of syntactic
> > predicates and semantic predicates and any other terms like this - for
> > that matter, is there a good online glossary for all this sort of
> > compiler/parser related stuff?
> >
> >
> > "Difference between lexer and parser rules"
> > This whole paragraph leaves me going "wha?". I don't know what the
> > difference between a lexer and a parser is, so attempting to explain
> > the difference between their rules doesn't mean much yet.
> >
> > There's a vague suggestion that lexers look at literals (whether
> > explicit or combined via multiple rules), and parsers look at
> > everything else, but even then I'm not sure I get the implications of
> > that?
> >
> > What's a symbolic name?
> >
> > In the example, what does the "EQUALS='=';" syntax do? Should that be
> > a colon, or is something else happening?
> > What about IMAGINERY_TOKEN - what's that doing?
> >
> > I'm now confused about tokens and non-terminal symbols - they are
> > obviously different, but how?
> >
> > In the example for "How to build a grammar from a language
> > specification", the FILE rule is not in the tokens section
> > Does that make it a non-terminal symbol?
> > Is this example grammar handled by a lexer or a parser or both?
> >
> >
> > I do have some other questions, but they're not directly related to
> > the wiki page, so I'll leave them for a later email, once I have a
> > better idea of what's what.
> >
> >
> > Thanks,
> >
> > Peter
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>

It would be hard to deny that ANTLR has a significant learning curve.  That
might not not too surprising, given what we customarily expect from machine
translation tools.  After all, the clever behaviors we want must come from
someplace.  That said, as someone raised on lex and yacc, I have to admit
that getting my brain wrapped around those tools was just as hard for me.
(A few people who have read the lex/yacc man pages and O'Reilly books might
agree with me.)  If you believe that writing parsers and grammars has the
potential to be hard, then the choice of tool comes down to questions like
the capability and maturity of the tool, the maintainability of your code,
and the availability of high-quality support.

Several tools -- notably lex and yacc -- excel in capability and maturity.
Personally, I find nontrivial grammars created with those tools to be
tedious to maintain and particular brittle with respect to my mistakes
during maintenance activities.  And the ANTLR community benefits from a
wealth of knowledgable and generous folks, to whom I am indebted for much of
my appreciation of ANTLR.   That's my opinion; everyone has to form his own.

Looking at Ales Teska's recently posted ANTLR3 grammar for python (
http://devel-www.cyber.cz/files/python3grammarC.tar.gz) may help form an
opinion.  If you can more or less follow that, there's a good chance you'll
enjoy using ANTLR productively.  I also found Cedric Cuche's Objective-C
grammar (http://antlr.org/grammar/1212699960054/ObjectiveC2ansi.g) very
helpful.  There are plenty of other fine exampes on antlr.org; these are
just a couple which happen to show how easy it can be to write
rich-yet-maintainable grammars with ANTLR.

As always, your mileage may vary.  Every situation has unique
considerations.  Good luck in any event.

Kyle
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20091017/6b443d79/attachment.html