[antlr-interest] unicode 16bit versus new 21bit stuff

Sun Jun 20 12:50:06 PDT 2004

On Jun 20, 2004, at 12:15 PM, John D. Mitchell wrote:
> Ah, I see.  So rather than having a passive lexer eating input and
> providing it to the parser, the lexer is actually directed by the 
> parser.

Yep.  This makes nasty things like the C++ template vs ">>" 
token problem simply disappear.  I.e., when lexing

List<List<int>> a;

you'll see that the nested template ends with ">>".  The lexer, without 
context, cannot know whether to match one ">>" token or two ">" tokens.  
Only the parser knows that it expects ">" followed by ">", not a ">>" 
token. :)
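
To make the problem concrete, here is a rough sketch in Python (not ANTLR notation) of a context-free lexer that always takes the longest match.  The regex and the tokenize() helper are made up for illustration; they are not part of ANTLR.

```python
import re

# Hypothetical context-free lexer: ">>" is listed before ">" so the
# longest operator always wins ("maximal munch"), regardless of context.
TOKEN_RE = re.compile(r"\s*(>>|[<>;]|[A-Za-z_]\w*)")

def tokenize(text):
    """Split text into tokens with no knowledge of what the parser expects."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

tokens = tokenize("List<List<int>> a;")
print(tokens)
```

The two closing angle brackets come out as a single ">>" token, which is exactly what a template-aware parser does not want to see here.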

The beauty of this is that the new lexer mechanism will deal with this 
easily.  Just create a lexer rule for each parser context, where each 
rule is the set of token alternatives valid in that context.  The parser calls 
lexer.mySpecificContextTokenSet() rather than lexer.nextToken().  There 
are some issues such as ID vs literals and any other set of 
ambiguous/subset tokens, but I won't go into that at the moment.  
Whitespace can be tricky too.
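
A minimal sketch of the idea, again in Python rather than ANTLR's own notation.  The class DirectedLexer and its methods template_close() and expression_operator() are hypothetical stand-ins for the per-context rules described above, and the whitespace handling is deliberately naive.

```python
class DirectedLexer:
    """Hypothetical lexer whose entry point is chosen by the parser."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def _skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def _match_one_of(self, candidates):
        # Longest candidate first, but only among the tokens that are
        # valid in the context the parser asked for.
        self._skip_ws()
        for lit in sorted(candidates, key=len, reverse=True):
            if self.text.startswith(lit, self.pos):
                self.pos += len(lit)
                return lit
        raise SyntaxError(f"expected one of {candidates} at offset {self.pos}")

    def template_close(self):
        # Context: closing a template argument list. ">>" is not offered.
        return self._match_one_of([">"])

    def expression_operator(self):
        # Context: inside an expression. ">>" is a legal shift operator.
        return self._match_one_of([">>", ">", "<"])

# Same input characters, two different parser contexts.
in_template = DirectedLexer(">> a;")
closes = [in_template.template_close(), in_template.template_close()]

in_expr = DirectedLexer(">> a;")
op = in_expr.expression_operator()

print(closes, op)
```

The same two characters yield two ">" tokens when the parser asks for a template close and one ">>" token when it asks for an expression operator.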

This also seems to be a reasonable solution for keywords that can also 
be used as variable names.  When the parser must see it as a keyword, 
it asks for the keyword, not an ID. :)
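
Here is a small sketch of the keyword case, assuming a made-up SQL-like statement "select <ID> from <ID>" in which the words select and from may also be used as names.  KeywordAwareLexer and its methods are hypothetical, not ANTLR API.

```python
import re

class KeywordAwareLexer:
    """Hypothetical lexer: the parser says whether it wants a keyword or an ID."""

    WORD_RE = re.compile(r"[A-Za-z_]\w*")

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def _next_word(self):
        # Skip whitespace, then match one word without consuming it yet.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        m = self.WORD_RE.match(self.text, self.pos)
        if m is None:
            raise SyntaxError(f"expected a word at offset {self.pos}")
        return m

    def keyword(self, kw):
        # The parser requires this exact keyword here.
        m = self._next_word()
        if m.group() != kw:
            raise SyntaxError(f"expected keyword {kw!r}, got {m.group()!r}")
        self.pos = m.end()
        return ("KEYWORD", kw)

    def identifier(self):
        # The parser requires a name here, so any word is an ID,
        # even one that is spelled like a keyword.
        m = self._next_word()
        self.pos = m.end()
        return ("ID", m.group())

# Parser for the assumed rule: "select" ID "from" ID
lexer = KeywordAwareLexer("select select from from")
result = [
    lexer.keyword("select"),
    lexer.identifier(),
    lexer.keyword("from"),
    lexer.identifier(),
]
print(result)
```

Each word is classified by what the parser asked for at that point, so the second "select" and the second "from" come back as IDs.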

Woohoo!

This will be attacked after the core system is working as it is a 
direct extension and doesn't affect the underlying engine.

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing
