[antlr-interest] Recursive lexer rule: strange error message
pganelin
ganelin at mail.com
Fri Jul 13 05:34:03 PDT 2007
I need to create a very complex lexer for a DSL.
To give you an idea of what should be considered a single token, here are
some example tokens, one per line.
()
((a)(a))
(%()
(%(%()
(%)%))
(/* anything here is ignored, including parentheses)))))(((((((((*/)
Brief description: if the parentheses match, you do not need to quote
anything; any unmatched parenthesis must be prefixed with the character
‘%’, which is the quoting character of a macro language. You can also
embed C-style comments directly in the lexeme (sic!), and that part is
ignored. The rest still forms a single token. So the example on the
last line would result in the token:
()
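To make the intended token boundaries concrete, here is a rough
hand-written sketch in Python. It is only an illustration of the rules
described above, not ANTLR output and not the lexer I am trying to build.
The function name scan_paren_token is hypothetical, and the sketch assumes
that ‘%’ quotes exactly one following character and stays in the token
text, and that comments do not nest.

```python
def scan_paren_token(text, start=0):
    """Scan one parenthesized token beginning at text[start].

    Returns (end, token): end is the index just past the token in the
    input, token is the token text with C-style comments removed.
    """
    if start >= len(text) or text[start] != "(":
        raise ValueError("token must start with '('")
    out = []
    depth = 0
    i = start
    n = len(text)
    while i < n:
        ch = text[i]
        # A comment is dropped entirely, including any parentheses in it.
        if text.startswith("/*", i):
            close = text.find("*/", i + 2)
            if close < 0:
                raise ValueError("unterminated comment")
            i = close + 2
            continue
        # Assumption: '%' quotes the next character, so a quoted
        # parenthesis does not change the nesting depth.
        if ch == "%" and i + 1 < n:
            out.append(text[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        out.append(ch)
        i += 1
        if depth == 0:
            return i, "".join(out)
    raise ValueError("unbalanced parentheses")
```

For the sample inputs above, each whole line is consumed as one token, and
only the last one changes: the comment is dropped and the token text is ().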
Following the example on page 108 of the ANTLR 3 book (with recursive
curlies), I did something like this. I am ignoring ‘%’ for now.
fragment
PARENTH
    :   '(' ( PARENTH | ( '/*' ) => COMMENT | '/' | ~( '(' | ')' | '/' ) )* ')'
    ;
fragment
COMMENT
    :   '/*' ( options {greedy=false;} : . )* '*/'
    ;
When I tried to run antlr3 on this I got a strange error message, which
I do not understand.
[antlr3] warning(206): Macro.g:439:68: Alternative 3: after matching
input such as '/''*''* ''(''(''(''*''(' decision cannot predict what
comes next due to recursion overflow to PARENTH from PARENTH.
If I remove the COMMENT part I have no warning.
Is it a bug in ANTLR or my error? Any help would be greatly appreciated.
I am very hesitant to move this functionality to the parser, because it
is really a single token from a grammar point of view.
Pavel.