[antlr-interest] Re: SYNTAX predicates in lexer

Wed Mar 30 20:53:51 PST 2005

I do not do C++ so I am unable to reproduce your example.  But I have
a couple of suggestions.

1) Top-Level Rules (those that you call from outside ANTLR) should
   always end with a reference to the EOF token - and, of course, they
   must be top-level, i.e. no other parser rule(s) refer to them.

   so your rule for c should be:

c: A X EOF;

2) No Token should accept the Empty String. I think your infinite loop
   comes from this: your protected Z token may be empty, so your A
   token may be empty too, and a lexer that matches an empty token
   consumes no input and never advances.

   so your rule for Z should be:

protected Z : ( 'a' )+ ; 

   and the parser adjusted so that any token containing Z as a
   component is optional wherever it could previously have been empty.

   Note, that I previously answered one of your questions regarding
   how to handle a general lexer situation such as:

A : ( a )* b ;
B : ( a )* ;

   with this pseudo fragment:

protected a : ... ;
protected b : ... ;

B : ( a )* ( b { $setType(A); } )? ;

   and this answer also suffers from this same empty-string problem.

   Sorry About That.
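
   For what it's worth, one way to repair that fragment might be to
   require at least one a or b, so the rule can no longer match
   nothing.  This is an untested sketch; it assumes a and b begin
   with different characters so the lexer can choose between the
   alternatives:

B : ( a )+ ( b { $setType(A); } )?   // a+ is a B; a+ b becomes an A
  | b { $setType(A); }               // a lone b is still an A
  ;

   The parser would then have to treat B as optional wherever an
   empty B used to be acceptable.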

3) I would try to avoid any kind of Predicates in the Lexer (and the
   Parser too, for that matter).  The potential of re-parsing the
   entire remaining input (and then backtracking) each time a
   syntactic predicate is invoked is, for me, too big a performance
   penalty.  I'd rather re-work the rules so that the syntactic
   predicate (and subsequent backtracking) is not necessary.  If you
   must use them, analyze their lookahead/backtrack impact *very*
   carefully. (Just my opinion)
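
   As a rough illustration (the rule names stat and list and the
   token ASSIGN are made up, not taken from your grammar), a
   syntactic predicate such as:

stat : ( list ASSIGN ) => list ASSIGN list   // speculate, rewind, match again
     | list
     ;

   scans the first list twice: once speculatively and once for real.
   Left-factoring the common prefix should accept the same input with
   no predicate at all:

stat : list ( ASSIGN list )? ;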

Hope this helps...
   -jbb