[antlr-interest] Trouble parsing a language where '{' has too many meanings

Tue Jul 10 22:23:51 PDT 2007

Hi Felix,

Sorry for not answering sooner -- I'm slammed with work (preparing for
a conference presentation and upgrading a production system on tight
deadlines) and I wanted to go build a test case.

There are a few options:

1) Start testing, bit by bit, in ANTLRWorks to see where things go wrong.

2) Build up the grammar a bit at a time using test-driven development.
(There's an example of this on the wiki.)

3) I haven't tried using predicates in the lexer, but I wonder if you
could do this:

ML_COMMENT
    : (LT(-1) == RBRACKET && LT(-2) == EQUALS && LT(-3) == LONG_HELP) =>
      '{' ( options {greedy=false;} : . )* '}.'
    ;
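
One caveat on the rule above: as far as I know, LT() inside an ANTLR 3 lexer looks at characters rather than tokens, so the lexer would have to remember the tokens it has already emitted. Here is an untested sketch of that bookkeeping in plain Java. TokenHistory, record() and endsWith() are names I made up for illustration; nothing here comes from the ANTLR runtime.

```java
// Hypothetical helper, not part of ANTLR: remembers the types of the most
// recently emitted tokens so a lexer predicate can inspect its left context.
final class TokenHistory {
    private final int[] recent; // token types, oldest first
    private int size;

    TokenHistory(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        recent = new int[capacity];
    }

    /** Records a newly emitted token type, dropping the oldest one when full. */
    void record(int tokenType) {
        if (size == recent.length) {
            System.arraycopy(recent, 1, recent, 0, size - 1);
            size--;
        }
        recent[size++] = tokenType;
    }

    /** True if the most recent tokens match the given types, listed oldest first. */
    boolean endsWith(int... types) {
        if (types.length > size) {
            return false;
        }
        int offset = size - types.length;
        for (int i = 0; i < types.length; i++) {
            if (recent[offset + i] != types[i]) {
                return false;
            }
        }
        return true;
    }
}
```

My assumption is that you would call record(token.getType()) from an overridden emit(Token) in the generated lexer and test history.endsWith(LONG_HELP, EQUALS, RBRACKET) in a gated semantic predicate ({...}?=>) rather than a syntactic one. Tokens that are skipped or sent to the hidden channel would need some thought, since they may or may not show up in the history.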

4) (experimental) The parser asks the lexer for one token at a time, so
you _might_ be able to pass a flag and use it in predicates in the
lexer. This feels extremely risky to me (e.g., parser lookahead might
flip the flag at inopportune times).

5) What if you had a rule like:
LONG_HELP : 'LongHelp' '=' '{' ( options {greedy=false;} : . )* '}.' ;
and then post-processed the token's text to grab everything inside the
braces? In a sense, you're making the lexer do the parser's job.
I've always looked for ways to make lexer rules unambiguous based on
the left-hand side, so you're in unknown territory for me (using
right-hand tokens for disambiguation). Hopefully, one of these
suggestions will point you toward an answer.

 ...Richard