[antlr-interest] Recursive lexer rule: strange error message
Micheal J
open.zone at virgin.net
Sat Jul 14 13:25:05 PDT 2007
Pavel,
> I need to create a very complex lexer for some DSL.
>
> To give you an idea of what should be considered a single token, I put
> some tokens below, one per line.
>
> ()
> ((a)(a))
> (%()
> (%(%()
> (%)%))
> (/* anything here is ignored, including parentheses)))))(((((((((*/)
>
>
> Brief description: If you have matching parentheses then you do not need
> to quote; prefix any unmatched parenthesis with the character '%'. The
> latter is a quoting character for some macro language. Also you can
> directly embed C-style comments into the lexeme (so!) and that part would
> be ignored. The rest would still generate a single token. So the example
> on the last line would result in the token:
> ()
Freaky language!
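To pin down the rules as I read them, here is a rough Python sketch (not ANTLR). It assumes that a quoted '%(' or '%)' stays in the token text verbatim and that any other character may appear between the parentheses; the name token_text is made up for this illustration.

```python
def token_text(src: str) -> str:
    """Return src with /* ... */ comments removed.

    Raises ValueError if the unquoted parentheses outside comments
    are not balanced or a comment is never closed.
    """
    out = []
    depth = 0
    i = 0
    while i < len(src):
        # A C-style comment is skipped entirely, parentheses included.
        if src.startswith("/*", i):
            end = src.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 2
            continue
        # '%(' and '%)' are quoted: they do not affect the nesting depth.
        if src[i] == "%" and src[i + 1:i + 2] in ("(", ")"):
            out.append(src[i:i + 2])
            i += 2
            continue
        if src[i] == "(":
            depth += 1
        elif src[i] == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')'")
        out.append(src[i])
        i += 1
    if depth != 0:
        raise ValueError("unmatched '('")
    return "".join(out)
```

With this reading, the last sample line reduces to "()" and the '%' samples come back unchanged.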
> Following the example of page 108 of ANTLR3 book (with recursive
> curlies) I did something like this. I am ignoring '%' for now.
>
>
>
> fragment
> PARENTH
> :
> '(' ( PARENTH | ( '/*' ) => COMMENT | '/' | ~( '(' | ')' | '/' ) )* ')' ;
>
> fragment
> COMMENT
> :
> '/*' ( options {greedy=false;} : . )* '*/'
> ;
This illustrates one possible solution (one that also handles '%(' and '%)'):
grammar PavelGanelin;

program : COMPLEX_TOKEN EOF ;

COMPLEX_TOKEN
    : ( QUOTED_LPAREN | QUOTED_RPAREN )* BALANCED ( QUOTED_LPAREN | QUOTED_RPAREN )*
    ;

fragment BALANCED
    : '(' ( IDENT | BALANCED | COMMENT | QUOTED_LPAREN | QUOTED_RPAREN )* ')'
    ;

fragment COMMENT       : '/*' ( options {greedy=false;} : . )* '*/' ;
fragment QUOTED_LPAREN : '%(' ;
fragment QUOTED_RPAREN : '%)' ;
fragment IDENT         : ( 'a'..'z' | 'A'..'Z' )+ ;
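If it helps to see what the rules accept without running ANTLR, the grammar can be mirrored by a small hand-written recognizer. This is only a sketch in Python, with one function per lexer rule; the function names are invented for the illustration and it does not model ANTLR's prediction or error recovery.

```python
def match_quoted(s, i):
    """QUOTED_LPAREN | QUOTED_RPAREN: index after '%(' or '%)', else None."""
    return i + 2 if s[i:i + 2] in ("%(", "%)") else None


def match_comment(s, i):
    """COMMENT: '/*' up to the first '*/' (non-greedy), else None."""
    if not s.startswith("/*", i):
        return None
    end = s.find("*/", i + 2)
    return None if end == -1 else end + 2


def match_ident(s, i):
    """IDENT: one or more ASCII letters, else None."""
    j = i
    while j < len(s) and ("a" <= s[j] <= "z" or "A" <= s[j] <= "Z"):
        j += 1
    return j if j > i else None


def match_balanced(s, i):
    """BALANCED: '(' ( IDENT | BALANCED | COMMENT | quoted )* ')'."""
    if not s.startswith("(", i):
        return None
    i += 1
    while True:
        for rule in (match_ident, match_balanced, match_comment, match_quoted):
            j = rule(s, i)
            if j is not None:
                i = j
                break
        else:
            break  # no alternative matched, so the loop body is finished
    return i + 1 if s.startswith(")", i) else None


def match_complex_token(s):
    """COMPLEX_TOKEN: quoted* BALANCED quoted*, consuming all of s."""
    i = 0
    while (j := match_quoted(s, i)) is not None:
        i = j
    i = match_balanced(s, i)
    if i is None:
        return False
    while (j := match_quoted(s, i)) is not None:
        i = j
    return i == len(s)
```

All six sample tokens from the original post pass match_complex_token. Note that, like the grammar as written, it rejects anything between the parentheses that is not a letter, a comment, a quoted parenthesis or a nested group, so IDENT would need widening for real input.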
Micheal
-----------------------
The best way to contact me is via the list/forum. My time is very limited.