[antlr-interest] ANTLR Questions

Gavin Lambert antlr at mirality.co.nz
Wed May 28 01:54:23 PDT 2008


At 11:02 28/05/2008, ANTLR Mailing List wrote:
 >Using this grammar:
 >http://www.antlr.org/pipermail/antlr-interest/attachments/20080526/595e3dfb/attachment-0001.obj
 >
 >I seem to get ambiguity errors, or so I think. The error messages
 >are very ambiguous themselves (Yes, I know, wait until ANTLR 3 is
 >built on ANTLR 3), but I cannot pinpoint the results of them.

A very quick glance over the grammar suggests these might be 
problems:

1. The use of ~IdentifierPart means you're actually consuming the 
following non-IdentifierPart character, which may not be what you 
want.  You should probably use a syntactic predicate instead.

2. Actually, you probably shouldn't do it at all, since 
'IdentifierPart' is not a character set, it is a sequence (it 
contains IdentifierStart, which contains EscapeSequence, which can 
represent a sequence of characters); it's illegal to use ~ on a 
sequence.

3. Your various integer tokens are ambiguous; remember, the lexer 
doesn't have any context, and can't look ahead past a + or * loop 
without an explicit syntactic predicate (or backtracking, which 
doesn't work in the lexer).  You'll need to merge all of these 
into one rule that switches the token type depending on predicates.

4. RegExpLiteral, SingleLineComment, MultiLineComment, and 
DocComment are all ambiguous (RegExpLiteral can match all of 
them).

5. MultiLineCommentInside is just plain illegal, as previously 
mentioned.  To do negated sequences you have to explicitly spell 
out the possibilities; i.e. instead of this:
    ~'*/'
you need to do this:
    (~'*' | '*' ~'/')
Another option is to use ANTLR's automatic non-greedy matching and 
change MultiLineComment to:
   '/*' .* '*/'
(You can't extract a fragment out of that, though; it won't work.)
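To make point 5 concrete, here is a minimal Python sketch (not ANTLR 
code).  The function scan_block_comment is hypothetical; it hand-codes 
roughly the decision that the non-greedy '/*' .* '*/' form makes, 
leaving the loop as soon as the next two characters are */.  The two 
regular expressions only check which strings the spelled-out loop 
accepts; they are a stand-in, since Python's re backtracks and an 
ANTLR lexer does not.  They also expose a caveat: the simple 
(~'*' | '*' ~'/') loop mishandles a comment ending in **/, because 
the second alternative swallows the last '*'.  The usual regex fix 
is to allow a run of stars, as in RUN_OF_STARS below.

```python
import re


def scan_block_comment(text: str, pos: int = 0):
    """Hypothetical helper: return the index just past a /* ... */ comment
    starting at `pos`, or None if there is none or it is unterminated.

    Mirrors non-greedy '/*' .* '*/': leave the loop as soon as the next
    two characters are '*/'.
    """
    if not text.startswith("/*", pos):
        return None
    i = pos + 2
    while i < len(text):
        # Exit decision: two characters of lookahead, '*' then '/'.
        if text.startswith("*/", i):
            return i + 2
        i += 1  # otherwise behave like '.', consuming any character
    return None  # reached end of input without '*/'


# Stand-ins for the spelled-out loop, checked with Python's backtracking re.
SIMPLE = re.compile(r"/\*(?:[^*]|\*[^/])*\*/")          # (~'*' | '*' ~'/')* '*/'
RUN_OF_STARS = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")  # allows '**/' endings

src = "/* a **/ b */"
print(src[:scan_block_comment(src)])    # /* a **/
print(SIMPLE.fullmatch("/* a **/"))     # None
print(RUN_OF_STARS.match(src).group())  # /* a **/
```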

You also need to watch out a bit for over-use of fragments.  Since 
fragments are still treated as rules (they get their own method) 
they unfortunately don't always give the same behaviour as when 
they're inlined.  This is especially true when used with ~.

 > * How would you create a code generator using a tree grammar?

You make the parser output an AST, then create a tree grammar to 
recognise that AST and either output the desired code directly or 
use StringTemplate to do it for you.
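The two-stage idea (the parser builds an AST, a second pass walks it 
and fills in templates) can be sketched in plain Python.  This is not 
ANTLR's generated tree parser or the StringTemplate API: the 
tuple-based AST, the node names and the TEMPLATES table are all 
hypothetical, standing in for a tree grammar's rules and a template 
group.

```python
# Hypothetical AST: ("num", value) leaves and (op, left, right) interior nodes,
# roughly the shape a parser might build for an arithmetic expression.

# One "template" per node kind, standing in for a StringTemplate group.
TEMPLATES = {
    "num": "{value}",
    "+": "({left} + {right})",
    "*": "({left} * {right})",
}


def emit(node: tuple) -> str:
    """Walk the tree recursively, like a tree grammar's rules, and fill in
    the template for each node with the text produced for its children."""
    kind = node[0]
    if kind == "num":
        return TEMPLATES["num"].format(value=node[1])
    left, right = node[1], node[2]
    return TEMPLATES[kind].format(left=emit(left), right=emit(right))


tree = ("+", ("num", 1), ("*", ("num", 2), ("num", 3)))
print(emit(tree))  # (1 + (2 * 3))
```

Swapping TEMPLATES for a different table retargets the output without 
touching the walker, which is the main reason to keep the text 
generation in templates rather than in the tree grammar's actions.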

 > * What would be an efficient system for entering and exiting
 >contexts?

You mean like scopes?  ANTLR provides stackable scopes, which are 
useful for contextual information.
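Scopes of this kind behave like a stack: entering a context pushes a 
fresh set of attributes, leaving it pops them, and outer entries stay 
underneath until then.  A minimal Python sketch of that idea for 
tracking declared names follows; ScopeStack, enter, declare and 
lookup are hypothetical names, not ANTLR's generated API.

```python
from contextlib import contextmanager


class ScopeStack:
    """Hypothetical symbol-scope stack: one dict of names per open context."""

    def __init__(self):
        self._frames = [{}]  # the global context is always present

    @contextmanager
    def enter(self):
        """Push a new context for the duration of a `with` block, then pop
        it, even if an exception is raised inside the block."""
        self._frames.append({})
        try:
            yield self
        finally:
            self._frames.pop()

    def declare(self, name, info):
        self._frames[-1][name] = info  # always goes into the innermost context

    def lookup(self, name):
        """Search from the innermost context outward; None if undeclared."""
        for frame in reversed(self._frames):
            if name in frame:
                return frame[name]
        return None


scopes = ScopeStack()
scopes.declare("x", "global int")
with scopes.enter():
    scopes.declare("x", "local string")  # shadows the outer x
    print(scopes.lookup("x"))            # local string
print(scopes.lookup("x"))                # global int
```

Entering and leaving a context costs one append and one pop, which is 
why a stack is the usual choice; lookup cost grows with nesting depth, 
which is normally shallow.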
