[antlr-interest] lexer: compound keywords with a twist

Susan Jolly easjolly at ix.netcom.com
Sun Aug 19 20:42:56 PDT 2007


This might be a case where you want to take advantage of the ability to emit
more than one token per lexer rule as explained in the ANTLR book starting
on page 95.

If you use a lexer rule similar to 

KEYWORD : ('a'..'z' | 'A'..'Z' | ' ' | '$')+ ;

it will match all of your "compound" keywords plus, of course, other
sequences of letters, spaces, and '$' characters that are not keywords.

Then you include your own emit() method in your lexer that emits this token
unchanged if the token text actually is a keyword.  If it is not, emit() uses
a custom "mini-lexer" to rescan the token text and emits the correct sequence
of tokens instead. Of course, you wouldn't want to do this unless the
"mini-lexer" is very simple.

I had a situation where the "mini-lexer" simply had to emit each character
as a separate token so this strategy worked really well.
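
Here is a rough sketch of the rescanning step in plain Java, with no ANTLR
dependency so it can be run on its own. Everything in it is made up for
illustration: the class name, the Tok type, the token type numbers, and the
keyword table are all hypothetical, and the exact-match keyword test is an
assumption (a real lexer might want to normalize case or whitespace first).
In an actual ANTLR lexer this logic would be called from the overridden
emit() method, with the resulting tokens buffered as the book describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of ANTLR or of any generated lexer.
class KeywordRescanner {
    // Hypothetical token types; a generated lexer would supply its own constants.
    static final int KEYWORD = 1;
    static final int CHAR = 2;

    /** Minimal stand-in for a lexer token: a type plus the matched text. */
    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    // Illustrative keyword table only; substitute the real compound keywords.
    private static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("END IF", "GO TO", "SYS$OUTPUT"));

    /**
     * Decide what to emit for text matched by the broad KEYWORD rule:
     * one KEYWORD token if the text is in the table (exact match),
     * otherwise one CHAR token per character (the "mini-lexer").
     */
    static List<Tok> rescan(String text) {
        List<Tok> out = new ArrayList<>();
        if (KEYWORDS.contains(text)) {
            out.add(new Tok(KEYWORD, text));
        } else {
            for (int i = 0; i < text.length(); i++) {
                out.add(new Tok(CHAR, String.valueOf(text.charAt(i))));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the token sequence for a keyword and for a non-keyword.
        for (String s : new String[] {"END IF", "ab c"}) {
            for (Tok t : rescan(s)) {
                System.out.println(t.type + " '" + t.text + "'");
            }
        }
    }
}
```

The per-character fallback is the simplest possible "mini-lexer"; anything
much more involved is probably better handled in the grammar itself.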



