[antlr-interest] Writing (for now) just a lexer

Wed Feb 11 00:52:15 PST 2009

Hi all,

I'm teaching a compilers class at University of Wisconsin-Madison. 
Traditionally the class has followed a fairly classic sequence of 
projects ('write a lexer', 'write a parser', etc.), and in the past has 
used either JLex/Java CUP or Flex/Bison for the lexer and parser 
generator. This is my first time teaching this class, and I'm writing 
these assignments assuming the use of ANTLR instead. I don't really want 
to make major changes to the class, so I want to keep these assignments 
separate, but the combined nature of ANTLR grammars has thrown a couple 
oddities into the way this works. Anyway, this is the one I'm not really 
sure how to deal with, as I'm also new to ANTLR.

The question is this: how do I store additional information in a token? 
(E.g. for the token corresponding to an int literal, how would I store 
the value as an int?)

Using something like Flex/Bison, I know how to do this: just add another 
member to the union that holds the token's semantic value. But under ANTLR, I'm 
not sure. I see "How do I use a custom token type?" 
(http://www.antlr.org/wiki/pages/viewpage.action?pageId=1844), but this 
isn't quite what I want, as I want to be able to return a subclass of 
CommonToken for just a couple particular rules.
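
To make this concrete, here is roughly the shape of what I'm after, 
written as plain Java with no ANTLR dependency. The names Token, 
IntLiteralToken, INT_LITERAL, and fromText are all made up for 
illustration; in a real grammar the base class would presumably be 
CommonToken, and how to get the lexer to construct the subclass for 
just one rule is exactly the part I don't know.

```java
// Hypothetical stand-in for a generic token; this is NOT ANTLR's CommonToken.
class Token {
    final int type;
    final String text;

    Token(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Hypothetical subclass that also carries the converted value of an int literal.
class IntLiteralToken extends Token {
    static final int INT_LITERAL = 1; // made-up token type number

    final int value;

    private IntLiteralToken(String text, int value) {
        super(INT_LITERAL, text);
        this.value = value;
    }

    // Converts the matched text while lexing. Assumes the lexer rule only
    // matches decimal digits, so a parse failure means the literal does
    // not fit in an int.
    static IntLiteralToken fromText(String text) {
        try {
            return new IntLiteralToken(text, Integer.parseInt(text));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "integer literal is too large: " + text);
        }
    }
}
```

With something like this, later phases could read the value field 
directly instead of re-parsing the token text, and an out-of-range 
literal would be reported at the point where it is lexed.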

The couple of grammars I've looked at (for Java) don't do this, presumably 
leaving the string->integer conversion for later, but this doesn't make 
a whole lot of sense to me, to be honest. There are potentially multiple 
contexts where this sort of thing would need to be done later, while 
doing it in lexing seems cleaner. It also keeps things consistent with 
my having used "an integer literal is too large" as an example of an 
error that could arise during lexing. 
(Not that you *couldn't* do it later.)

Evan Driscoll