[antlr-interest] C++ beginner questions

Mon Oct 3 16:07:37 PDT 2005

David Maxwell wrote:
> I've just started using Antlr, and while I appreciate the concepts, I'm
> finding it very frustrating.

Antlr has a pretty steep learning curve :(

> 2) Are there any examples targeting C++ that do NOT use the AST
> functionality?

There should be some simple ones in the distribution (but I guess you
have already found those).

A bigger project is linked below. It has a lexer, a parser, and a bunch
of tree walkers, although it may not be a good example to start with:
the software has grown over time, and the parser is in need of a rewrite.

http://fmt.cs.utwente.nl/tools/motor/

It does cover line information in the AST and error handling, and it has
a decent lexer for a C-ish language. It also has symbol table code, but
that is for a language with some quirks that make it less usable for a
general language.

> The AST tools sound great - but right now I just want a lex/yacc
> replacement with multi-token lookahead. The types of things you want to
> do in the parsing stage are quite different if you're not building ASTs,
> and I can't find examples to get me started. (Yes, I've looked in
> examples/cpp/*)

Actually, things are not that different; you just have fewer options to
'divide and conquer'.

> 3) The documentation plays a bit fast and loose with the term 'antlr'.
> i.e. "Antlr keeps track of the column position of tokens for you" -
> 
> Well, no, but the _Lexer_ antlr produces does so. Now, it's not a great
> example, because we could have an argument about what column position
> should mean in the context of the Parser - but regardless, it would be
> nice to have a list of which Functions are applicable in which contexts.
> Using getColumn() in the Parser led to some wasted time for me, before
> I thought about the Parser/Lexer split in the .g file for a bit.

In the parser, position information is tagged on the tokens. The same
holds in a TreeParser (although you have to use a custom AST class to
get line/column information into the AST).
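
To make the "position lives on the token" idea concrete, here is a
minimal sketch in plain C++. It does not use Antlr's own classes: the
Token struct, tokenize() and describe() are hypothetical names made up
for this illustration. The point is only that the lexer stamps each
token with its line and column, and the parser side reads them back from
the token instead of asking the parser where it is.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type: the lexer stamps each token with its position.
struct Token {
    std::string text;
    int line;    // 1-based line of the token's first character
    int column;  // 1-based column of the token's first character
};

// Toy lexer: splits on whitespace and tracks line/column as it goes.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    int line = 1;
    int column = 1;
    std::size_t i = 0;
    while (i < input.size()) {
        char c = input[i];
        if (c == '\n') {  // newline: next character starts a new line
            ++line;
            column = 1;
            ++i;
            continue;
        }
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++column;
            ++i;
            continue;
        }
        Token t{"", line, column};  // record where the token starts
        while (i < input.size() &&
               !std::isspace(static_cast<unsigned char>(input[i]))) {
            t.text += input[i];
            ++i;
            ++column;
        }
        tokens.push_back(t);
    }
    return tokens;
}

// "Parser" side: it has no position of its own, it asks the token.
std::string describe(const Token& t) {
    assert(t.line >= 1 && t.column >= 1);
    return t.text + " at " + std::to_string(t.line) + ":" +
           std::to_string(t.column);
}
```

For the input "int x\n  = 42", the "=" token produced by this sketch
carries line 2, column 3, and a parser action would report exactly that.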

> 4) Is there any equivalent to the Lex/Yacc documentation 'How to resolve
> shift/reduce conflicts' - for how to address lexical nondeterminisms in
> antlr?

I cannot add much to what Bryan said, although left factoring is more
important, imho, than fiddling with greedy options. Also, some warnings
in Antlr you can never get rid of: they cannot be turned off, even
though Antlr does the right thing. Any book covering top-down parsing
should have a section on left factoring, and there is probably some
material to be found on the net.

Note that there is a tradeoff between the readability of your grammar
and left factoring. Syntactic predicates can help reduce conflicts for a
start, at the expense of performance. Later on you can remove them by
left factoring things out, once you have your grammar running.
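
As a small illustration of left factoring, here is a hand-written
recursive-descent sketch in plain C++. It is not Antlr-generated code,
and the toy grammar and all names (Kind, Stmt, Parser) are assumptions
made up for this example. Two alternatives that both start with ID
cannot be told apart with one token of lookahead; after factoring out
the common ID prefix, a single token is enough to choose.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Original rule: both alternatives start with ID, so one token of
// lookahead cannot pick between them.
//   stmt : ID '=' NUM
//        | ID '(' ')' ;
// Left-factored rule: the common prefix is matched once, then one
// token of lookahead decides.
//   stmt : ID ( '=' NUM | '(' ')' ) ;

enum class Kind { Id, Num, Assign, LParen, RParen, End };
enum class Stmt { Assignment, Call, Error };

class Parser {
public:
    explicit Parser(std::vector<Kind> tokens) : tokens_(std::move(tokens)) {}

    // Parses the left-factored rule with a single token of lookahead.
    Stmt stmt() {
        if (!match(Kind::Id)) return Stmt::Error;  // common prefix
        switch (peek()) {
        case Kind::Assign:                         // ID '=' NUM
            advance();
            return match(Kind::Num) ? Stmt::Assignment : Stmt::Error;
        case Kind::LParen:                         // ID '(' ')'
            advance();
            return match(Kind::RParen) ? Stmt::Call : Stmt::Error;
        default:
            return Stmt::Error;
        }
    }

private:
    // Current token, or End once the input is exhausted.
    Kind peek() const {
        return pos_ < tokens_.size() ? tokens_[pos_] : Kind::End;
    }
    void advance() {
        if (pos_ < tokens_.size()) ++pos_;
    }
    // Consumes the current token only if it has the expected kind.
    bool match(Kind k) {
        if (peek() != k) return false;
        advance();
        return true;
    }

    std::vector<Kind> tokens_;
    std::size_t pos_ = 0;
};
```

The unfactored form would need either two tokens of lookahead or a
syntactic predicate; the factored form needs neither, at the cost of a
rule that reads a little less like the language description.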

> For example, I've noticed that rule order in the Lexer DOES matter, but
> I can't find any documentation about how to order rules to get the
> desired results. I've just been following 'most specific first', but
> I'd appreciate a more precise answer.

If I'm not mistaken, it just handles the rules in the order in which
they appear in the grammar file, so the first matching alternative is
chosen.
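
A rough sketch of what "first matching alternative wins" means, in plain
C++. This is not how the Antlr-generated lexer is actually implemented;
Rule, classify() and the two rule lists are hypothetical names for this
illustration only. It just shows why 'most specific first' gives the
result you want when two rules can match the same text.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Hypothetical lexer rule: a token name plus a predicate that says
// whether the rule matches the whole lexeme.
struct Rule {
    std::string name;
    std::function<bool(const std::string&)> matches;
};

// Matches C-style identifiers: a letter or '_' followed by letters,
// digits, or '_'.
bool isIdentifier(const std::string& s) {
    if (s.empty()) return false;
    unsigned char first = static_cast<unsigned char>(s[0]);
    if (!(std::isalpha(first) || s[0] == '_')) return false;
    for (char c : s) {
        if (!(std::isalnum(static_cast<unsigned char>(c)) || c == '_')) {
            return false;
        }
    }
    return true;
}

// Matches only the keyword "if".
bool isKeywordIf(const std::string& s) { return s == "if"; }

// Tries the rules in the order given; the first one that matches wins.
std::string classify(const std::vector<Rule>& rules,
                     const std::string& lexeme) {
    for (const Rule& rule : rules) {
        if (rule.matches(lexeme)) return rule.name;
    }
    return "UNKNOWN";
}

// Same two rules, listed in different orders.
std::vector<Rule> specificFirst() {
    return {{"IF", isKeywordIf}, {"ID", isIdentifier}};
}
std::vector<Rule> generalFirst() {
    return {{"ID", isIdentifier}, {"IF", isKeywordIf}};
}
```

With specificFirst(), the text "if" is classified as IF; with
generalFirst(), the ID rule matches first and the IF rule is never
reached.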

Cheers,

Ric