[antlr-interest] Recommendation for Lexer

Wed Feb 8 06:17:25 PST 2006

Hi all,
it's me again, the guy with the problem called XQuery. My current
situation is as follows:
- we have an ANTLR 2.x parser that works well, even if the code has
  become a bit cryptic by now
- we have a handwritten Lexer that is fast, but sucks wrt
  maintainability
We originally went for a handwritten Lexer because my predecessor
couldn't get ANTLR to produce anything usable for a stateful language
like XQuery. The Lexer works, but has had quite a lot of bugs in the
seldom-used code paths, as it's a huge pain to maintain (think: endless
switches over characters, with matching, guessing etc. all done by hand).
Adding keywords and states requires a lot of work and testing.

Now I'm going to add several language extensions and I'm ready to dump
the handwritten Lexer. The problem is: I can't go with ANTLR the way it
currently is - the language has no reserved keywords and in addition
requires several lexer states (~16). Switching lexers after each token
is not an option, plus we also need stackable states.
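
To make the "stackable states" requirement concrete, here is a minimal
sketch in Python (not in the syntax of any lexer generator). Everything
in it is an assumption made for illustration: the two state names EXPR
and CONTENT, the token kinds, and the tokenize helper cover only a tiny
XQuery-like fragment, not the real grammar and not our actual Lexer. It
shows a word like "for" coming out as plain text inside element content
but as a name inside an enclosed expression, with the state stack
deciding which rule set applies.

```python
import re

# Hypothetical rule table: state -> list of (pattern, token kind, action).
# An action is ("push", state), ("pop", None), or None for "stay".
RULES = {
    "EXPR": [
        (re.compile(r"\s+"), None, None),                       # skip whitespace
        (re.compile(r"<[A-Za-z_]\w*>"), "START_TAG", ("push", "CONTENT")),
        (re.compile(r"\}"), "RBRACE", ("pop", None)),           # leave enclosed expr
        (re.compile(r"\$[A-Za-z_]\w*"), "VAR", None),
        (re.compile(r"[A-Za-z_]\w*"), "NAME", None),            # no reserved words here
        (re.compile(r"\d+"), "INT", None),
        (re.compile(r"[-+*=(),]"), "OP", None),
    ],
    "CONTENT": [
        (re.compile(r"</[A-Za-z_]\w*>"), "END_TAG", ("pop", None)),
        (re.compile(r"<[A-Za-z_]\w*>"), "START_TAG", ("push", "CONTENT")),
        (re.compile(r"\{"), "LBRACE", ("push", "EXPR")),        # enter enclosed expr
        (re.compile(r"[^<{]+"), "TEXT", None),
    ],
}


def tokenize(text, start_state="EXPR"):
    """Return (kind, lexeme) pairs, switching rule sets via a state stack."""
    stack = [start_state]
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the rules of the state on top of the stack, in order.
        for pattern, kind, action in RULES[stack[-1]]:
            match = pattern.match(text, pos)
            if match:
                break
        else:
            raise SyntaxError(
                f"unexpected {text[pos]!r} at offset {pos} in state {stack[-1]}"
            )
        if kind is not None:
            tokens.append((kind, match.group(0)))
        if action is not None:
            op, target = action
            if op == "push":
                stack.append(target)
            elif len(stack) > 1:
                stack.pop()
            else:
                raise SyntaxError(f"nothing to pop at offset {pos}")
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("<a>for {for} </a>"):
        print(token)
```

Because states are pushed and popped rather than set directly, nested
constructors such as <a><b>x</b></a> return to the right enclosing
state without the rules having to know how deep they are.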

I tricked Terence into doing the language islands feature for ANTLR 3,
but unfortunately I need a new lexer long before the summer (and ANTLR 3
will only be in beta in the summer, no?).

So can anyone recommend an alternative? I'm currently toying around
with JFlex, which looks good (feature-wise; the syntax less so). There
seems to be an abundance of tools; does anyone have experience with them?

Thanks,
Martin