[antlr-interest] Recursive lexer rule: strange error message

pganelin ganelin at mail.com
Fri Jul 13 05:34:03 PDT 2007


I need to create a very complex lexer for some DSL.

To give you an idea of what should be considered a single token, here are 
some example tokens, one per line.

()
((a)(a))
(%()
(%(%()
(%)%))
(/* anything here is ignored, including parentheses)))))(((((((((*/)


Brief description: if the parentheses match, you do not need to quote 
them; prefix any unmatched parenthesis with the character ‘%’, which is 
the quoting character of some macro language. You can also embed C-style 
comments directly in the lexeme (yes, really!), and that part is 
ignored. The rest still forms a single token. So the example on the last 
line would result in the token:
()
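
To pin the rules down, here is a minimal Python sketch of a hand-written scanner for one such token. It only illustrates the intended behaviour and is not the ANTLR lexer itself. The function name scan_paren_token is hypothetical, and the sketch assumes that ‘%’ quotes only a parenthesis immediately following it, that the ‘%’ stays in the token text, and that comments are dropped from the token text.

```python
def scan_paren_token(text, start=0):
    """Scan one parenthesised token beginning at text[start].

    Returns (token_text, end_index). Assumptions (not from the DSL spec):
    '%(' and '%)' are kept verbatim and do not affect nesting depth;
    '/* ... */' comments are skipped and left out of the token text.
    """
    if start >= len(text) or text[start] != "(":
        raise ValueError("token must start with '('")
    out = []
    depth = 0
    i = start
    n = len(text)
    while i < n:
        # A C-style comment is ignored entirely, parentheses included.
        if text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 2
            continue
        ch = text[i]
        # A quoted parenthesis does not count towards nesting.
        if ch == "%" and i + 1 < n and text[i + 1] in "()":
            out.append(text[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        out.append(ch)
        i += 1
        if depth == 0:
            # The opening parenthesis has just been closed.
            return "".join(out), i
    raise ValueError("unbalanced parentheses")
```

Under these assumptions each of the example lines above is consumed as one token, and the comment example yields just the two parentheses.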

Following the example on page 108 of the ANTLR 3 book (the rule with 
recursive curly braces), I wrote something like this. I am ignoring ‘%’ for now.



fragment
PARENTH
:
'(' ( PARENTH | ( '/*' ) => COMMENT | '/' | ~( '(' | ')' | '/' ) )* ')'
;

fragment
COMMENT
:
'/*' ( options {greedy=false;} : . )* '*/'
;



When I ran antlr3 on this, I got a strange warning that I do not 
understand:

[antlr3] warning(206): Macro.g:439:68: Alternative 3: after matching 
input such as '/''*''* ''(''(''(''*''(' decision cannot predict what 
comes next due to recursion overflow to PARENTH from PARENTH.


If I remove the COMMENT part I have no warning.

Is it a bug in ANTLR or my error? Any help would be greatly appreciated. 
I am very hesitant to move this functionality to the parser, because it 
is really a single token from a grammar point of view.

Pavel.


