[antlr-interest] Need help with lexer rules

Mon Aug 15 14:01:16 PDT 2005

Thank you, John, that's it!
...a protected RG rule and this:
TERMINATING_RG:
  RG ( WS )* ( LG | ( VOCAB ( VOCAB | WS )* LG ) { $setType(TEXT); } )?
;
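
As a rough illustration of how that rule decides between the two token types, here is a hand-written Java sketch (not ANTLR output). It assumes RG is '\u00bb' and LG is '\u00ab', that WS is only space and tab, and that VOCAB is every other character in the '\3'..'\u00ff' vocabulary. The class and method names are made up for the example.

```java
public class TerminatingRgSketch {
    // Assumed code points; the thread itself is unsure of the exact values.
    static final char RG = 0xbb;
    static final char LG = 0xab;

    // Assumption: WS is only space and tab (the grammar elides the rest).
    static boolean isWs(char c) {
        return c == ' ' || c == '\t';
    }

    // VOCAB: anything in the lexer vocabulary that is not WS, RG or LG.
    static boolean isVocab(char c) {
        return c >= 3 && c <= 0xff && !isWs(c) && c != RG && c != LG;
    }

    // Mimics TERMINATING_RG on the token starting at index 0.
    // Returns "TERMINATING_RG", "TEXT", or "NO_MATCH".
    static String classify(String in) {
        if (in.isEmpty() || in.charAt(0) != RG) {
            return "NO_MATCH";
        }
        int i = 1;
        // ( WS )*
        while (i < in.length() && isWs(in.charAt(i))) {
            i++;
        }
        if (i == in.length()) {
            return "TERMINATING_RG"; // optional part skipped at end of input
        }
        char c = in.charAt(i);
        if (c == LG) {
            return "TERMINATING_RG"; // first alternative: type is left alone
        }
        if (isVocab(c)) {
            // VOCAB ( VOCAB | WS )* LG, then the type becomes TEXT
            i++;
            while (i < in.length()
                    && (isVocab(in.charAt(i)) || isWs(in.charAt(i)))) {
                i++;
            }
            if (i < in.length() && in.charAt(i) == LG) {
                return "TEXT";
            }
            return "NO_MATCH"; // a generated lexer would report an error here
        }
        return "TERMINATING_RG"; // next char starts another token
    }
}
```

Under those assumptions, a lone RG (with or without trailing whitespace) stays TERMINATING_RG, while RG followed by text and a closing LG comes back as TEXT.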
Sorry for going off the list,
Sven

John B. Brodie wrote:

>Sorry, in my last suggestion, I forgot to handle the trailing LG
>character in the TEXT token.
>
>Try this:
>
>class SvensLexer extends Lexer;
>
>options { charVocabulary = '\3' .. '\u00ff'; }
>
>tokens { TEXT; TERMINATING_RG; }
>
>protected RG : '\u00bb'; // or whatever unicode it is, i deleted orig email
>protected LG : '\u00ab'; // and have forgotten the codes....
>
>protected WS : ' ' | '\t' { tab(); } | ...whatever else... ;
>
>SKIPPED_WS : ( WS )+ { $setType(Token.SKIP); } ;
>
>// note ~( WS | RG | LG ) is what we want but isn't allowed,
>// so must enumerate those characters again here...
>protected VOCAB :
>   ~( ' ' | '\t' | ...whatever else...
>      | ....the unicode for RG goes here
>      | ....the unicode for LG goes here
>    ) ;
>
>TERMINATING_RG :
>   RG ( WS )* ( LG | ( VOCAB ( VOCAB | WS )* LG ) { $setType(TEXT); } )?
>   ; // the above seems goofy but i think is necessary to avoid
>     // non-determinism on WS characters.
>  
>