[antlr-interest] Re: Recommendation for Lexer

Thu Feb 9 04:21:36 PST 2006

On Thu, 2006-02-09 at 06:56 +0000, Micheal J wrote:
> > JFlex looks good at the moment. It doesn't impose any class 
> > inheritance on you and the generated lexer is completely 
> > standalone, so it should be easy to integrate with ANTLR. 
> 
> Have a look at our KSCParse sample on the ANTLR site. It's for C# targets
> but includes a CsFlex (Jflex for C#) lexer that demonstrates ANTLR
> integration [with Jflex-style lexers]. 
> 
> Although you don't *need* it, Kunle also added an ANTLR mode to CsFlex
> (patch code in the CsFlex site on SourceForge) that you could port to Jflex
> [and submit to the Jflex project if you desire] to make ANTLR integration
> even easier.

I actually get along with it quite well at the moment; I was just
describing to Xue Yong Zhi how JFlex solves some of my problems better
than ANTLR lexers do.

> > Plus it brings native support for the issues I have. The only 
> > thing I'm missing is a deeper control about what parts of the 
> > token end up in the tokens text, but maybe I've just not 
> > found that yet.
> 
> The sample should help in that regard too. You get to decide what is in the
> tokens you return (or perhaps I haven't quite appreciated the complexity of
> your lexer).

A typical rule might look like this:

<STATEX> "foo" (("." {Digits}) | 
                ({Digits} ("." [0-9]*)?)) [eE] [+-]? {Digits}
         { setState(STATEY); return token(FOO); }

Now my problem is how to access individual parts of the match from the
Java action code without re-parsing the string there. Ideally I'd like
to avoid indexOf()/substring() calls and instead use something like $1
or \1 to refer to capturing groups.
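
To make the wish concrete, here is a minimal sketch of the kind of
group access meant above, done with plain java.util.regex on the text
the lexer has already matched. It is still a second pass over the
string, so it is only a fallback, not the native $1-style support asked
for. The class name FooTokenParts, the method split(), and the expansion
of {Digits} to [0-9]+ are assumptions made for this illustration; none
of them come from JFlex or from the grammar above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JFlex or ANTLR.
public final class FooTokenParts {
    // Mirrors the JFlex rule above. Assumption: {Digits} expands to [0-9]+.
    // Group 1 = mantissa, group 2 = exponent sign (may be empty),
    // group 3 = exponent digits.
    private static final Pattern FOO = Pattern.compile(
        "foo((?:\\.[0-9]+)|(?:[0-9]+(?:\\.[0-9]*)?))[eE]([+-]?)([0-9]+)");

    /** Returns {mantissa, sign, exponent}, or null if text is not a FOO token. */
    public static String[] split(String text) {
        Matcher m = FOO.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // In a lexer action the argument would be the matched token text.
        String[] parts = split("foo1.5e+10");
        System.out.println(String.join(" | ", parts));
    }
}
```

In a lexer action this would be called on the matched text (for
example, the result of yytext()), with the pattern compiled once as a
static field so the extra pass stays cheap.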

Martin