[antlr-interest] Writing (for now) just a lexer
Evan Driscoll
driscoll at cs.wisc.edu
Wed Feb 11 00:52:15 PST 2009
Hi all,
I'm teaching a compilers class at University of Wisconsin-Madison.
Traditionally the class has followed sort of a classic sequence of
projects ('write a lexer', 'write a parser', etc.) and in the past has
used either JLex/Java CUP or Flex/Bison for the lexer and parser
generator. This is my first time teaching this class, and I'm writing
these assignments assuming the use of ANTLR instead. I don't really want
to make major changes to the class, so I want to keep these assignments
separate, but the combined nature of ANTLR grammars has thrown a couple
of oddities into the way this works. Anyway, this is the one I'm not really
sure how to deal with, as I'm also new to ANTLR.
The question is this: how do I store additional information in a token?
(E.g. for the token corresponding to an int literal, how would I store
the value as an int?)
Using something like Flex, I know how to do this; just add an additional
option in the union representing the token type. But under ANTLR, I'm
not sure. I see "How do I use a custom token type?"
(http://www.antlr.org/wiki/pages/viewpage.action?pageId=1844), but this
isn't quite what I want, as I want to be able to return a subclass of
CommonToken for just a couple particular rules.
The couple of grammars I've looked at (for Java) don't do this, presumably
leaving the string->integer conversion for later, but this doesn't make
a whole lot of sense to me, to be honest. There are potentially multiple
contexts where this sort of thing would need to be done later, while
doing it in lexing seems cleaner. It also allows me to keep better
consistency with the fact that I've been giving "an integer literal is
too large" as an example of an error that could arise during lexing.
(Not that you *couldn't* do it later.)
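A minimal sketch of the idea, in plain Java with no ANTLR dependency. BaseToken is a hypothetical stand-in for CommonToken, and IntLiteralToken, IntLiteralDemo, lexIntLiteral and INT_LITERAL are made-up names for illustration only; none of this is ANTLR's API. It assumes the lexer rule has already matched a run of digits, so the only way the conversion can fail is overflow.

```java
// Hypothetical stand-in for ANTLR's CommonToken; NOT the ANTLR API.
class BaseToken {
    final int type;
    final String text;

    BaseToken(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Token subclass that also carries the literal's numeric value.
class IntLiteralToken extends BaseToken {
    final int value;

    IntLiteralToken(int type, String text, int value) {
        super(type, text);
        this.value = value;
    }
}

class IntLiteralDemo {
    static final int INT_LITERAL = 1; // hypothetical token type code

    // Converts the matched digits at lexing time. Assumes 'digits' contains
    // only decimal digits, so a NumberFormatException can only mean overflow.
    static IntLiteralToken lexIntLiteral(String digits) {
        try {
            int value = Integer.parseInt(digits);
            return new IntLiteralToken(INT_LITERAL, digits, value);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "integer literal is too large: " + digits);
        }
    }

    public static void main(String[] args) {
        System.out.println(lexIntLiteral("42").value);
        try {
            lexIntLiteral("99999999999");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Later phases could then read the value field directly instead of re-parsing the token text in each context where it is needed.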
Evan Driscoll