[antlr-interest] Re: SYNTAX predicates in lexer
John B. Brodie
jbb at acm.org
Wed Mar 30 20:53:51 PST 2005
I do not use C++, so I am unable to reproduce your example, but I
have a few suggestions.
1) Top-level rules (those that you call from outside ANTLR) should
always end with a reference to the EOF token. Without it, the rule
can succeed after matching only a prefix of the input. And, of
course, they must really be top-level, i.e. no other parser rule
refers to them.
So your rule for c should be:
c: A X EOF;
2) No token should accept the empty string. I think that because
your protected Z token may be empty, your A token may be empty too,
and that is where your infinite loop comes from: the lexer can keep
matching an empty A without consuming any input.
So your rule for Z should be:
protected Z : ( 'a' )+ ;
and the parser should be adjusted so that tokens which contain Z as
a component are optional.
Note that I previously answered one of your questions, about how to
handle a general lexer situation such as:
A : ( a )* b ;
B : ( a )* ;
with this pseudo fragment:
protected a : ... ;
protected b : ... ;
B : ( a )* ( b { $setType(A); } )? ;
That answer suffers from the same empty-string problem: B matches
the empty string when neither a nor b is present.
Sorry about that.
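One way to take the empty-string case out of that fragment, as an
untested sketch in the same pseudo notation, is to require at least
one a whenever b is absent. It assumes that a and b start with
different characters, so the lexer can pick an alternative with one
character of lookahead.

```antlr
protected a : ... ;
protected b : ... ;
B : ( a )+ ( b { $setType(A); } )?   // one or more a, then an optional b
  | b { $setType(A); }               // b on its own is still an A
  ;
```

B can no longer be empty here, so any parser rule that relied on an
empty B has to mark that B as optional instead.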
3) I would try to avoid any kind of predicate in the lexer (and in
the parser too, for that matter). The possibility of re-parsing the
entire remaining input (and then backtracking) each time a
syntactic predicate is invoked is, for me, too big a performance
penalty. I'd rather rework the rules so that the syntactic
predicate (and the subsequent backtracking) is not necessary. If you
must use them, analyze their lookahead/backtrack impact *very*
carefully. (Just my opinion.)
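As an illustration of the kind of rework I mean, here is a sketch
that left-factors a common prefix instead of using a syntactic
predicate. The rule names (DIGIT, INT, REAL, NUM) are made up for the
example and are not from your grammar. I assume REAL is declared in
a tokens section, and that a '.' after the digits is always followed
by at least one more digit.

```antlr
// Predicate version (sketch): the digits are scanned once to test the
// predicate and then again to build the token.
//   NUM : ( ( DIGIT )+ '.' ) => ( DIGIT )+ '.' ( DIGIT )+ { $setType(REAL); }
//       | ( DIGIT )+ { $setType(INT); }
//       ;

// Left-factored version: the common digit prefix is matched once, and
// the token type is changed only if a fraction follows.
protected DIGIT : '0'..'9' ;

INT : ( DIGIT )+ ( '.' ( DIGIT )+ { $setType(REAL); } )? ;
```

The left-factored rule needs no backtracking, because the decision
is made on the single character that follows the digits.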
Hope this helps...
-jbb