[antlr-interest] unicode 16bit versus new 21bit stuff
Terence Parr
parrt at cs.usfca.edu
Sun Jun 20 12:50:06 PDT 2004
On Jun 20, 2004, at 12:15 PM, John D. Mitchell wrote:
> Ah, I see. So rather than having a passive lexer eating input and
> providing it to the parser, the lexer is actually directed by the
> parser.
Yep. This makes nasty things like the C++ template vs ">>" token
problem simply disappear. I.e., when lexing
    List<List<int>> a;
you'll see that the nested template ends in ">>". The lexer, without
context, cannot know which to pick. Only the parser knows that it
expects ">" followed by ">", not a ">>" token. :)
The beauty of this is that the new lexer mechanism will deal with this
easily. Just create one lexer rule per parser context, where each rule
is the set of token alternatives valid in that context. The parser calls
lexer.mySpecificContextTokenSet() rather than lexer.nextToken(). There
are some issues, such as ID vs literals and any other set of
ambiguous/subset tokens, but I won't go into that at the moment.
Whitespace can be tricky too.
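
To make the idea concrete, here is a minimal sketch in Python of a lexer that only tries the token set its caller hands it. This is not ANTLR code: ContextLexer, match(), parse_type() and the token-set names are all hypothetical, and whitespace is simply skipped before every match, which glosses over the trickiness mentioned above.

```python
import re


class ContextLexer:
    """Hypothetical lexer: the caller chooses which tokens are candidates."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def _skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def match(self, token_set):
        """Try each (name, regex) pair in order at the current position."""
        self._skip_ws()
        for name, pattern in token_set:
            m = re.compile(pattern).match(self.text, self.pos)
            if m:
                self.pos = m.end()
                return (name, m.group())
        names = [name for name, _ in token_set]
        raise SyntaxError(f"expected one of {names} at offset {self.pos}")


# One token set per parser context (illustrative names, not from ANTLR).
EXPR_OPS = [("SHIFT_RIGHT", r">>"), ("GT", r">")]  # expression context
TEMPLATE_CLOSE = [("GT", r">")]                    # closing a template
ID = [("ID", r"[A-Za-z_]\w*")]
LT = [("LT", r"<")]


def parse_type(lexer):
    """type : ID ('<' type '>')? ;  returns a name or a (name, arg) tuple."""
    name = lexer.match(ID)[1]
    saved = lexer.pos
    try:
        lexer.match(LT)
    except SyntaxError:
        lexer.pos = saved  # no template argument list follows
        return name
    arg = parse_type(lexer)
    # The parser knows it wants a single '>' here, so ">>" is never tried.
    lexer.match(TEMPLATE_CLOSE)
    return (name, arg)
```

With this sketch, parse_type(ContextLexer("List<List<int>> a;")) consumes the two closing ">" one at a time, while a caller in an expression context would pass EXPR_OPS and get the longer ">>" first.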
This also seems to be a reasonable solution for keywords that can be
variables. When the parser must see it as a keyword, it asks for the
keyword not an ID. :)
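
A small Python sketch of the keyword case, under the assumption that the parser passes the expected pattern to the lexer at each step. The toy grammar, expect() and parse_if() are made up for illustration and are not part of ANTLR.

```python
import re

# Patterns the parser can ask for (hypothetical toy language).
KEYWORD_IF = r"if\b"
KEYWORD_THEN = r"then\b"
IDENT = r"[A-Za-z_]\w*"


def expect(text, pos, pattern):
    """Skip whitespace, then match `pattern` at `pos`; return (lexeme, new_pos)."""
    ws = re.compile(r"\s*").match(text, pos)
    m = re.compile(pattern).match(text, ws.end())
    if m is None:
        raise SyntaxError(f"expected /{pattern}/ at offset {ws.end()}")
    return m.group(), m.end()


def parse_if(text):
    """stmt : 'if' ID 'then' ID ;  the parser decides keyword vs ID per position."""
    pos = 0
    _, pos = expect(text, pos, KEYWORD_IF)    # must be the keyword here
    cond, pos = expect(text, pos, IDENT)      # any identifier, even "if"
    _, pos = expect(text, pos, KEYWORD_THEN)  # must be the keyword here
    body, pos = expect(text, pos, IDENT)      # any identifier, even "then"
    return {"cond": cond, "body": body}
```

Here parse_if("if if then then") treats the first "if" and the first "then" as keywords and the other two as plain identifiers, because the parser asked for different things at each position.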
Woohoo!
This will be attacked after the core system is working as it is a
direct extension and doesn't affect the underlying engine.
Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing