[antlr-interest] Writing (for now) just a lexer

Wed Feb 11 04:46:14 PST 2009

> Hi all,
> 
> I'm teaching a compilers class at University of Wisconsin-Madison. 
> Traditionally the class has followed sort of a classic sequence of 
> projects 'write a lexer', 'write a parser', etc., and in the past has 
> used either JLex/Java CUP or Flex/Bison for the lexer and parser 
> generator. This is my first time teaching this class, and I'm writing 
> these assignments assuming the use of ANTLR instead. I don't really want 
> to make major changes to the class, so I want to keep these assignments 
> separate, but the combined nature of ANTLR grammars has thrown a couple 
> oddities into the way this works.

You can specify separate lexer and parser grammars by starting them with "lexer grammar Test;" and "parser grammar Test;" respectively. Beware that ANTLR uses both the rule order (the first matching rule wins) and the length of the matched input (the longer match wins) to determine the winning rule. Furthermore, tokens that share a common prefix need special treatment, as shown on <http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs>.
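To make the two criteria concrete, here is a simplified model in Java of how a lexer can choose among rules: the longest match wins, and among matches of equal length the rule declared first wins. This is only an approximation for illustration. ANTLR actually generates prediction code from the grammar rather than trying regular expressions one by one, and the rule names IF, ID, INT and WS are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleSelection {
    // A hypothetical lexer rule: a name plus a regular expression.
    record Rule(String name, Pattern pattern) {}

    // Rules in declaration order; on equal match length the earlier rule wins.
    static final List<Rule> RULES = List.of(
        new Rule("IF", Pattern.compile("if")),
        new Rule("ID", Pattern.compile("[a-z]+")),
        new Rule("INT", Pattern.compile("[0-9]+")),
        new Rule("WS", Pattern.compile("[ \\t]+"))
    );

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule r : RULES) {
                Matcher m = r.pattern().matcher(input);
                m.region(pos, input.length());
                // Strictly greater: a later rule only wins with a LONGER match.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = r;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            // Whitespace is matched but not reported as a token.
            if (!best.name().equals("WS")) {
                out.add(best.name() + "(" + input.substring(pos, pos + bestLen) + ")");
            }
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if iffy 42")); // prints [IF(if), ID(iffy), INT(42)]
    }
}
```

With this ordering, "if" lexes as the keyword IF (it ties with ID, and IF is listed first), while "iffy" lexes as ID because the longer match beats rule order.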

> Anyway, this is the one I'm not really 
> sure how to deal with, as I'm also new to ANTLR.
> 
> The question is this: how do I store additional information in a token? 
> (E.g. for the token corresponding to an int literal, how would I store 
> the value as an int?)
> 
> Using something like Flex, I know how to do this; just add an additional 
> option in the union representing the token type. But under ANTLR, I'm 
> not sure. I see "How do I use a custom token type?" 
> (http://www.antlr.org/wiki/pages/viewpage.action?pageId=1844), but this 
> isn't quite what I want, as I want to be able to return a subclass of 
> CommonToken for just a couple particular rules.

I suppose you could use a flag to decide whether the current context requires a custom token.
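A minimal, self-contained sketch of that idea in plain Java, with no ANTLR runtime involved: SimpleToken stands in for CommonToken, and IntLiteralToken, makeToken and LexError are hypothetical names. The token type acts as the flag, so only INT gets the subclass carrying the parsed value, and an out-of-range literal is reported as a lexing error, matching the "integer literal is too large" example from the question. In a real ANTLR 3 Java lexer I would expect the equivalent to be an action in the INT rule that builds a CommonToken subclass and hands it to the lexer's emit(Token) method, but treat that as an assumption to check against the runtime documentation.

```java
public class IntTokenDemo {
    static final int INT = 1;
    static final int ID = 2;

    // Stand-in for CommonToken: a token type plus the matched text.
    static class SimpleToken {
        final int type;
        final String text;
        SimpleToken(int type, String text) { this.type = type; this.text = text; }
    }

    // Subclass used only for integer literals; carries the converted value.
    static class IntLiteralToken extends SimpleToken {
        final int value;
        IntLiteralToken(String text, int value) { super(INT, text); this.value = value; }
    }

    // Error raised during lexing, e.g. for an integer literal that is too large.
    static class LexError extends RuntimeException {
        LexError(String msg) { super(msg); }
    }

    // The "flag" is the token type: only INT tokens get the richer subclass.
    static SimpleToken makeToken(int type, String text) {
        if (type != INT) {
            return new SimpleToken(type, text);
        }
        try {
            return new IntLiteralToken(text, Integer.parseInt(text));
        } catch (NumberFormatException e) {
            // The INT rule only matches digits, so overflow is the only failure here.
            throw new LexError("integer literal is too large: " + text);
        }
    }

    public static void main(String[] args) {
        SimpleToken t = makeToken(INT, "42");
        if (t instanceof IntLiteralToken lit) {
            System.out.println(lit.value + 1); // prints 43
        }
        try {
            makeToken(INT, "99999999999");
        } catch (LexError e) {
            System.out.println(e.getMessage()); // prints integer literal is too large: 99999999999
        }
    }
}
```

Code further down the pipeline can then check for IntLiteralToken and read value directly instead of converting the text again in every context that needs the number.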

Johannes
> 
> The couple of grammars I've looked at (for Java) don't do this, presumably 
> leaving the string->integer conversion for later, but this doesn't make 
> a whole lot of sense to me, to be honest. There are potentially multiple 
> contexts where this sort of thing would need to be done later, while 
> doing it in lexing seems cleaner. It also allows me to keep better 
> consistency with the fact that I've been giving "an integer literal is 
> too large" as an example of an error that could arise during lexing. 
> (Not that you *couldn't* do it later.)
> 
> Evan Driscoll
