[antlr-interest] "No Past Experience Required"

Horst Dehmer horst.dehmer at inode.at
Fri Oct 16 17:06:59 PDT 2009


Hello Peter,

i myself am not too familiar with ANTLR 3 (and parser generators in general)
and for that reason i feel tempted to try to give some answers.

A symbol is a 'word' in a grammar, like a word in a natural language. A
symbol can be a terminal (symbol) which can no further be 'reduced', like
reserved words in a programming language: 'for', 'while', 'if'... You get
the picture. Non-terminals are the left side of a grammar (production) rule.
Non-terminals are reduced during parsing according to the defined grammar
and the stream of available tokens.

A lexer usually takes a character stream as an input and converts it in a
stream of tokens. The characters 'f', 'o', 'r', ' ', '(' for example become
the tokens 'for' and '('. In the old times (and still today with certain
lexers) lexer rules were defined just by regular expressions, which
restricts the class of what can be recognized by a parser pretty much
compared to ANTLR.

The parser takes a token stream as a input and tries to recognize a higher
level order. Think of it as sentences of a language following a certain
grammar. The rules which make up a parser grammar are usually much more
complex than lexer rules. (E)BNF can not be expressed by regular expression.

LL(k), LL(*) parsers (top-down, results are build by working their way from
left to right) are classes of parsers which are not very popular. I might be
wrong (or to old) but LR parsers are much more common. So maybe have a quick
look at lex/yacc. For me that's the father of all parser generators (for C).

Predicates (syntactic and semantic) help a LL parser determine which 'way to
go' or which alternatives to ignore. I guess they are quite unique to LL
parsers which have the "problem" with left-recursion or common prefix, which
really is no problem once you got your head around it.

For what it's worth: i would not start with ANTLR to build my first
grammars. The documentation is far too sophisticated for my taste and you
definitivly have to have some experiences to get started. If you don't know
what to look for even the book is useless. Don't get me wrong ANTLR 3 is a
very powerful and reliable tool, but before you use it make sure you really
need it. If you'd like to stick to ANTLR, have a look at the example
grammars. There are lots of 'em around and they give good hints on how to
solve promlems. 

And last but not least, if you are really interested in things, read the
Dragon Book: Principles of Compiler Design, by Alfred Aho and Jeffrey D.
Ullman

Regards,

Horst

PS: For those recognizing my errors and misunderstandings, help Peter to get
going.


On 16.10.09 23:28, "Peter Boughton" <boughtonp at gmail.com> wrote:

> Hi all,
> 
> I'm looking to figure out ANTLR and so I'm reading this page:
> http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No
> +Past+Experience+Required
> 
> But there are a number of things I'm stuck on - and so I'm asking here
> for both a good explanation and so that someone with edit permission
> can hopefully update the wiki page to actually be "no past experience"
> and more helpful for future newbies.
> 
> 
> "Left recursion."
> Maybe it's me being dim, but it took me a while to figure how "a : a B
> | C ;" translated to "a : C B*;" - and whilst I get it now, perhaps a
> quick sentence to explain the transition might be helpful for the wiki
> page?
> 
> 
> "syntactic predicates"
> So, they're an advanced category discussed in a book... yippee! I
> don't have the book, and the absolute earliest I could get it (if I
> choose to) is Tuesday, so that's not particularly helpful. If they're
> going to be mentioned, they should at least have a brief explanation
> of when they apply, and preferably a link to an online resource that
> covers them.
> (I tried reading the Wikipedia article, but that waffles on without
> being very helpful.)
> 
> Can anyone provide [a link to] a good, concise definition of syntactic
> predicates and semantic predicates and any other terms like this - for
> that matter, is there a good online glossary for all this sort of
> compiler/parser related stuff?
> 
> 
> "Difference between lexer and parser rules"
> This whole paragraph leaves me going "wha?". I don't know what the
> difference between a lexer and a parser is, so attempting to explain
> the difference between their rules doesn't mean much yet.
> 
> There's a vague suggestion that lexers look at literals (whether
> explicit or combined via multiple rules), and parsers look at
> everything else, but even then I'm not sure I get the implications of
> that?
> 
> What's a symbolic name?
> 
> In the example, what does the "EQUALS='=';" syntax do? Should that be
> a colon, or is something else happening?
> What about IMAGINERY_TOKEN - what's that doing?
> 
> I'm now confused about tokens and non-terminal symbols - they are
> obviously different, but how?
> 
> In the example for "How to build a grammar from a language
> specification", the FILE rule is not in the tokens section
> Does that make it a non-terminal symbol?
> Is this example grammar handled by a lexer or a parser or both?
> 
> 
> I do have some other questions, but they're not directly related to
> the wiki page, so I'll leave them for a later email, once I have a
> better idea of what's what.
> 
> 
> Thanks,
> 
> Peter
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: 
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address




More information about the antlr-interest mailing list