[antlr-interest] Recursive lexer rule: strange error message

Micheal J open.zone at virgin.net
Sat Jul 14 13:25:05 PDT 2007


Pavel,

> I need to create a very complex lexer for some DSL.
> 
> To give you an idea of what should be considered a single token, I put
> some tokens below, one per line.
> 
> ()
> ((a)(a))
> (%()
> (%(%()
> (%)%))
> (/* anything here is ignored, including parentheses)))))(((((((((*/)
> 
> 
> Brief description: If you have matching parentheses then you do not need
> to quote; prefix any unmatched parenthesis with the character '%'. The
> latter is a quoting character for some macro language. Also you can
> directly embed C style comments into the lexeme (so!) and that part would
> be ignored. The rest would still generate a single token. So the example
> on the last line would result in the token:
> ()

Freaky language!

> Following the example of page 108 of ANTLR3 book (with recursive 
> curlies) I did something like this. I am ignoring '%' for now.
> 
> 
> 
> fragment
> PARENTH
> :
> '(' ( PARENTH | ( '/*' ) => COMMENT | '/' | ~( '(' | ')' | '/' ) )* ')' ;
> 
> fragment
> COMMENT
> :
> '/*' ( options {greedy=false;} : . )* '*/'
> ;

This illustrates one possible solution (one that also handles '%(' and '%)'):

grammar PavelGanelin;

program : COMPLEX_TOKEN EOF;

COMPLEX_TOKEN : ( QUOTED_LPAREN | QUOTED_RPAREN )* BALANCED ( QUOTED_LPAREN | QUOTED_RPAREN )* ;

fragment BALANCED : '(' ( IDENT | BALANCED | COMMENT | QUOTED_LPAREN | QUOTED_RPAREN )* ')' ;
fragment COMMENT : '/*' (options {greedy=false;} : .)* '*/' ;
fragment QUOTED_LPAREN : '%(' ;
fragment QUOTED_RPAREN : '%)' ;
fragment IDENT : ( 'a'..'z' | 'A'..'Z' )+ ;
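
To see which inputs the lexer rules above are meant to accept, here is a rough hand-written recognizer in Python that mirrors them one function per rule. It is only a sketch for checking examples, not what ANTLR generates: the function names are made up for illustration, and it tests whether a whole string is one COMPLEX_TOKEN rather than scanning a longer input for the next token.

```python
# Illustrative sketch only: a hand-written mirror of the grammar rules above.
# Each match_* function returns the index just past the match, or None.

def match_quoted(s, i):
    """QUOTED_LPAREN | QUOTED_RPAREN : '%(' | '%)'"""
    if s.startswith("%(", i) or s.startswith("%)", i):
        return i + 2
    return None


def match_comment(s, i):
    """COMMENT : '/*' ... '*/' (non-greedy: stops at the first '*/')"""
    if not s.startswith("/*", i):
        return None
    end = s.find("*/", i + 2)
    return None if end == -1 else end + 2


def match_ident(s, i):
    """IDENT : ( 'a'..'z' | 'A'..'Z' )+"""
    j = i
    while j < len(s) and ("a" <= s[j] <= "z" or "A" <= s[j] <= "Z"):
        j += 1
    return j if j > i else None


def match_balanced(s, i):
    """BALANCED : '(' ( IDENT | BALANCED | COMMENT | quoted paren )* ')'"""
    if i >= len(s) or s[i] != "(":
        return None
    i += 1
    while i < len(s):
        if s[i] == ")":
            return i + 1
        for matcher in (match_ident, match_balanced, match_comment, match_quoted):
            nxt = matcher(s, i)
            if nxt is not None:
                break
        else:
            return None  # character not allowed inside BALANCED
        i = nxt
    return None  # ran out of input before the closing ')'


def skip_quoted(s, i):
    """( QUOTED_LPAREN | QUOTED_RPAREN )*"""
    while True:
        nxt = match_quoted(s, i)
        if nxt is None:
            return i
        i = nxt


def is_complex_token(s):
    """True if the whole string is a single COMPLEX_TOKEN."""
    i = match_balanced(s, skip_quoted(s, 0))
    if i is None:
        return False
    return skip_quoted(s, i) == len(s)
```

All six of Pavel's sample tokens pass is_complex_token, while an unbalanced input such as "(()" does not. Note that IDENT only covers letters, so an input like "(a b)" is rejected by this grammar as written; the real DSL may well need a wider character set there.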

Micheal

-----------------------
The best way to contact me is via the list/forum. My time is very limited.