[antlr-interest] Re: Recommendation for Lexer

Thu Feb 9 12:54:20 PST 2006

> A typical rule might look like this:
> 
> <STATEX> "foo" (("." {Digits}) | 
>                 ({Digits} ("." [0-9]*)?)) [eE] [+-]? {Digits}
>          { setState(STATEY); return token(FOO); }
> 
> Now my problem is how to access certain parts of the match 
> without re-parsing the string in the Java code part (e.g. 
> ideally I'd like no indexOf(), substring() stuff but rather 
> something like $1 or \1 to get the capturing groups).

No general solution, but you can use lexer states to extract a contiguous
substring (much like we do to remove the '@' from verbatim strings and
identifiers). Not that you want any more states...
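
To make the state-switching idea above concrete, here is a rough hand-written
sketch in plain Java. It is not the output of any lexer generator and not the
code we actually use; the class name, the two states, and the identifiers()
helper are all made up for illustration. The '@' prefix is consumed while
entering a second state, so the token text never contains it and no
substring() call is needed afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a tiny hand-rolled scanner, not a generated lexer.
final class LexerStateSketch {
    // Hypothetical states; a real lexer spec would declare its own.
    private enum State { DEFAULT, IDENT }

    private static boolean isIdentPart(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Returns the identifier texts found in input, with any leading '@' left out. */
    static List<String> identifiers(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        State state = State.DEFAULT;
        for (int i = 0; i <= input.length(); i++) {
            // NUL acts as an end-of-input sentinel so the last token is flushed.
            char c = i < input.length() ? input.charAt(i) : '\0';
            switch (state) {
                case DEFAULT:
                    if (c == '@') {
                        // Consume the prefix here; it never reaches the buffer.
                        state = State.IDENT;
                    } else if (Character.isLetter(c) || c == '_') {
                        text.append(c);
                        state = State.IDENT;
                    }
                    break;
                case IDENT:
                    if (isIdentPart(c)) {
                        text.append(c);
                    } else {
                        if (text.length() > 0) {
                            tokens.add(text.toString());
                            text.setLength(0);
                        }
                        state = State.DEFAULT;
                        i--; // re-examine this character in the DEFAULT state
                    }
                    break;
            }
        }
        return tokens;
    }
}
```

With this sketch, identifiers("@class = foo1") yields the two texts "class" and
"foo1". The cost is the one mentioned above: every substring you want to
isolate this way needs its own state, and it only works when the part you
want is contiguous.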

Micheal