[antlr-interest] Re: Recommendation for Lexer

Wed Feb 8 09:42:41 PST 2006

> 
> Now I'm going to add several language extensions and I'm ready to dump
> the handwritten Lexer. The problem is: I can't go with ANTLR the way it
> currently is - the language is keyword less and in addition to that
> requires several states (~16). Switching lexers after each token is not
> an option, plus we also need stackable states.
> 
> I tricked Terence into doing the language islands feature for ANTLR 3,
> but unfortunately I need a new lexer long before the summer (and ANTLR 3
> will only be in beta in the summer, no?).
> 

I do not know your particular problem, but whatever tool you choose, no 
one can give you an off-the-shelf solution for the state problem. The 
logic for maintaining state has to be implemented by you. Other tools 
may provide better syntax for this, but I think ANTLR is very good as well.

I am using ANTLR to parse Ruby, which also requires lots of states in the 
lexer. I find the problem manageable: for simple things you can use 
semantic predicates; for more complicated ones you can override the 
generated lexer and use all the traditional OO techniques.

Here is a simple example: in Ruby, '/' can be the DIVIDE operator or the 
start of a regular expression (same syntax as in Perl), so you can have 
the following lexer rule:
DIV_OR_REGEX
: {exprect_div()}? '/' {$setType(DIV);}
| '/' REGEX_CONTENT '/' {$setType(REGEX);}
;

It is still very readable. Sure, you need to implement exprect_div(), but 
as I said, this is something you have to do anyway.
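
To make the predicate idea concrete, here is a minimal stand-alone Java sketch 
of one way such a check could work. It is not ANTLR-generated code and not 
necessarily how exprect_div() is implemented in my grammar: the class 
SlashLexer, the method expectDiv() and the token types are all hypothetical 
names for illustration. The assumption is that '/' means division only when 
the previous token could end an operand (an identifier or a number); real 
Ruby has more cases (closing brackets, keywords, escaped slashes in a regex).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer; only illustrates the state check.
final class SlashLexer {
    enum Type { IDENT, NUMBER, DIV, REGEX, OTHER }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    private final String src;
    private int pos = 0;
    private Type last = null; // type of the previously returned token

    SlashLexer(String src) { this.src = src; }

    // Plays the role of the semantic predicate: true when '/' should be
    // read as division. Simplified assumption: only after IDENT or NUMBER.
    boolean expectDiv() {
        return last == Type.IDENT || last == Type.NUMBER;
    }

    // Returns the next token, or null at end of input.
    Token next() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
        if (pos >= src.length()) return null;
        int start = pos;
        char c = src.charAt(pos);
        Type type;
        if (Character.isDigit(c)) {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            type = Type.NUMBER;
        } else if (Character.isLetter(c)) {
            while (pos < src.length() && Character.isLetter(src.charAt(pos))) pos++;
            type = Type.IDENT;
        } else if (c == '/' && !expectDiv()) {
            // Regex literal: scan to the closing '/' (no escape handling here).
            int end = src.indexOf('/', pos + 1);
            if (end < 0) throw new IllegalStateException("unterminated regex");
            pos = end + 1;
            type = Type.REGEX;
        } else if (c == '/') {
            pos++;
            type = Type.DIV;
        } else {
            pos++;
            type = Type.OTHER;
        }
        last = type;
        return new Token(type, src.substring(start, pos));
    }

    // Convenience helper: renders every token as "TYPE:text".
    static List<String> tokenize(String src) {
        SlashLexer lx = new SlashLexer(src);
        List<String> out = new ArrayList<>();
        for (Token t = lx.next(); t != null; t = lx.next()) {
            out.add(t.type + ":" + t.text);
        }
        return out;
    }
}
```

With this sketch, SlashLexer.tokenize("a / 2") treats the slash as DIV, while 
SlashLexer.tokenize("x = /ab/") yields a single REGEX token for "/ab/", 
because the token before the slash cannot end an operand.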

HereDoc is more complicated, so I override nextToken() and match heredoc 
content based on the current state.
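
As a rough illustration of the override approach, the following stand-alone 
Java sketch uses a tiny hand-written BaseLexer as a stand-in for the 
generated lexer and a subclass that overrides nextToken(). Everything here 
(BaseLexer, HeredocLexer, the "KIND:text" token strings, the simplified 
"<<ID" heredoc form) is an assumption made for the example, not the actual 
code from my Ruby grammar and not the ANTLR API.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a generated lexer: words, "<<ID" markers, and newlines.
class BaseLexer {
    protected final String src;
    protected int pos = 0;

    BaseLexer(String src) { this.src = src; }

    // Returns "KIND:text", or null at end of input.
    String nextToken() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
        if (pos >= src.length()) return null;
        if (src.charAt(pos) == '\n') { pos++; return "NEWLINE:"; }
        int start = pos;
        while (pos < src.length() && src.charAt(pos) != ' ' && src.charAt(pos) != '\n') pos++;
        String text = src.substring(start, pos);
        boolean marker = text.startsWith("<<") && text.length() > 2;
        return (marker ? "HEREDOC_START:" : "WORD:") + text;
    }
}

// Overrides nextToken() and switches behavior based on the current state.
class HeredocLexer extends BaseLexer {
    private String terminator = null;    // set after a "<<ID" marker is seen
    private boolean bodyPending = false; // true once the marker's line has ended

    HeredocLexer(String src) { super(src); }

    @Override
    String nextToken() {
        if (bodyPending) return readBody();
        String tok = super.nextToken();
        if (tok == null) return null;
        if (tok.startsWith("HEREDOC_START:")) {
            terminator = tok.substring("HEREDOC_START:<<".length());
        } else if (tok.equals("NEWLINE:") && terminator != null) {
            bodyPending = true; // the heredoc body starts on the next line
        }
        return tok;
    }

    // Consumes raw lines up to (and including) the terminator line.
    private String readBody() {
        StringBuilder body = new StringBuilder();
        while (pos < src.length()) {
            int eol = src.indexOf('\n', pos);
            if (eol < 0) eol = src.length();
            String line = src.substring(pos, eol);
            pos = Math.min(eol + 1, src.length());
            if (line.equals(terminator)) break;
            body.append(line).append('\n');
        }
        terminator = null;
        bodyPending = false;
        return "HEREDOC_BODY:" + body;
    }

    // Convenience helper: collects all tokens of the input.
    static List<String> tokenize(String src) {
        HeredocLexer lx = new HeredocLexer(src);
        List<String> out = new ArrayList<>();
        for (String t = lx.nextToken(); t != null; t = lx.nextToken()) out.add(t);
        return out;
    }
}
```

The point of the design is that the ordinary rules stay in the base class, 
and only the state-dependent part lives in the override, so the rest of the 
lexer does not need to know about heredocs at all.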

Using multiple lexers is another good choice.

-- 
Xue Yong Zhi
http://seclib.blogspot.com