[antlr-interest] Re: Recommendation for Lexer
Martin Probst
mail at martin-probst.com
Thu Feb 9 04:21:36 PST 2006
On Thu, 2006-02-09 at 06:56 +0000, Micheal J wrote:
> > JFlex looks good at the moment. It doesn't impose any class
> > inheritance on you and the generated lexer is completely
> > standalone, so it should be easy to integrate with ANTLR.
>
> Have a look at our KSCParse sample on the ANTLR site. It's for C# targets
> but includes a CsFlex (Jflex for C#) lexer that demonstrates ANTLR
> integration [with Jflex-style lexers].
>
> Although you don't *need* it, Kunle also added an ANTLR mode to CsFlex
> (patch code in the CsFlex site on SourceForge) that you could port to Jflex
> [and submit to the Jflex project if you desire] to make ANTLR integration
> even easier.
I actually get along with it quite well at the moment; I was just
describing to Xue Yong Zhi how JFlex solves some of my problems better
than ANTLR lexers do.
> > Plus it brings native support for the issues I have. The only
> > thing I'm missing is a deeper control about what parts of the
> > token end up in the tokens text, but maybe I've just not
> > found that yet.
>
> The sample should help in that regard too. You get to decide what is in the
> tokens you return (or perhaps I haven't quite appreciated the complexity of
> your lexer).
A typical rule might look like this:
  <STATEX> "foo" (("." {Digits}) |
                  ({Digits} ("." [0-9]*)?)) [eE] [+-]? {Digits}
      { setState(STATEY); return token(FOO); }
Now my problem is how to access individual parts of the match without
re-parsing the string in the Java action code. Ideally I'd like to avoid
indexOf() and substring() calls and instead use something like $1 or \1
to refer to the capturing groups.
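To make the idea of capturing groups concrete, here is a minimal sketch of
the kind of access I mean, done with java.util.regex on the matched text.
Note that this still re-scans the token text in Java, so it is only a
workaround and not the built-in lexer feature I'm asking about. The class
name FooTokenParts and the method split() are hypothetical, and I'm
assuming {Digits} is defined as [0-9]+.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits the text of a FOO token into its parts
// using capturing groups instead of indexOf()/substring().
final class FooTokenParts {
    // Mirrors the lexer rule above, assuming {Digits} is [0-9]+.
    // Group 1: the mantissa, group 2: the signed exponent.
    private static final Pattern FOO = Pattern.compile(
        "foo(\\.[0-9]+|[0-9]+(?:\\.[0-9]*)?)[eE]([+-]?[0-9]+)");

    // Returns {mantissa, exponent}, or null if the text does not match.
    static String[] split(String tokenText) {
        Matcher m = FOO.matcher(tokenText);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("foo1.5e-3");
        System.out.println("mantissa=" + parts[0] + " exponent=" + parts[1]);
    }
}
```

In a lexer action this would be called on the matched text (yytext() in
JFlex), which is exactly the second pass over the input that native group
support would make unnecessary.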
Martin