[antlr-interest] "back to the future" (merged lexer / parser spec)
Terence Parr
parrt at cs.usfca.edu
Fri Nov 19 10:00:47 PST 2004
Howdy,
One of the things I liked about old PCCTS 15 years ago (it was different
at the time, though I doubt I invented the idea) was that you could
have one spec. You didn't need both a lexer spec and a parser spec,
which, as we all know, can cause lots of mismatches between token types, etc.
So, ANTLR 3 will allow you to reference char and string literals in
rules and also to define lexer rules all in the same spec. For
example,
grammar foo;
a : ID "while" '.' ;
ID : ('a'..'z')+ ;
Results in this parser:
public void a()
{
    match(ID);
    match(2);
    match(3);
}
where the literal references are replaced with an appropriate token
type reference. ANTLR automatically builds this:
lexer grammar fooLexer;
T2 : "while" ;
T3 : '.' ;
ID : ('a'..'z')+ ;
and generates code (all without writing it to the disk). I just use a
StringTemplate:
/** For merged lexer/parsers, we must construct a separate lexer spec.
 *  This is the template for lexer; put the literals first then the
 *  regular rules.
 */
protected StringTemplate lexerGrammarST =
    new StringTemplate(
        "lexer grammar <name>Lexer;\n" +
        "\n" +
        "<literals:{T<it.type> : <it.literal> ;\n}>\n" +
        "<rules>",
        AngleBracketTemplateLexer.class
    );
to generate the new lexer string and then do a "new
Grammar(lexerGrammarString)" to get the new grammar object. :) Gotta
love StringTemplate for code gen!
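To make the literal-to-token-type step concrete, here is a rough sketch in plain Java. It is not the ANTLR 3 code: the class name MergedSpecSketch, both method names, and the choice to start literal types at 2 (taken only from the T2/T3 example above) are assumptions for illustration, and a StringBuilder stands in for the StringTemplate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the merged-spec idea; not ANTLR 3's implementation. */
final class MergedSpecSketch {
    /** Assumed first literal token type, chosen to match the T2/T3 example. */
    static final int FIRST_LITERAL_TYPE = 2;

    /** Give each distinct literal a token type, in order of first appearance. */
    static Map<String, Integer> assignTypes(String... literals) {
        Map<String, Integer> types = new LinkedHashMap<>();
        for (String lit : literals) {
            if (!types.containsKey(lit)) {
                types.put(lit, FIRST_LITERAL_TYPE + types.size());
            }
        }
        return types;
    }

    /** Build the lexer grammar text: literal rules first, then the regular rules. */
    static String lexerGrammar(String name, Map<String, Integer> literalTypes, String rules) {
        StringBuilder sb = new StringBuilder();
        sb.append("lexer grammar ").append(name).append("Lexer;\n\n");
        for (Map.Entry<String, Integer> e : literalTypes.entrySet()) {
            sb.append("T").append(e.getValue())
              .append(" : ").append(e.getKey()).append(" ;\n");
        }
        sb.append("\n").append(rules);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Literals referenced by rule "a" in grammar foo.
        Map<String, Integer> types = assignTypes("\"while\"", "'.'");
        System.out.println(lexerGrammar("foo", types, "ID : ('a'..'z')+ ;"));
    }
}
```

Putting the literal rules ahead of the regular rules mirrors the template comment above. Repeated references to the same literal reuse the type assigned the first time.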
You can still have separate specs, but you usually won't need to do
that.
With this merged spec, ultimately I would like to do context-sensitive
(goal-oriented) lexing so we could handle things like the C++ nested
template lexing issue. With input "List<List<int>>", the lexer cannot
tell without context whether the final ">>" is two '>' tokens or one
'>>' (shift) token.
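Here is a toy sketch of what I mean by goal-oriented lexing, in plain Java. Nothing here is ANTLR 3 code: the class GoalOrientedLexSketch, the inTemplateArgs flag, and the tiny rule type : ID ('<' type '>')? are all made up for illustration. The only point is that the parser tells the lexer what it is looking for, and the lexer uses that to decide whether ">>" is one token or two.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of context-sensitive lexing; not ANTLR 3's implementation. */
final class GoalOrientedLexSketch {
    private final String input;
    private int pos = 0;

    GoalOrientedLexSketch(String input) { this.input = input; }

    /** Return the next token; inTemplateArgs is the context hint from the parser. */
    String nextToken(boolean inTemplateArgs) {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return "<EOF>";
        char c = input.charAt(pos);
        if (Character.isLetterOrDigit(c)) {
            int start = pos;
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
            return input.substring(start, pos);
        }
        // Only outside template arguments may ">>" be the shift operator.
        if (c == '>' && !inTemplateArgs && input.startsWith(">>", pos)) {
            pos += 2;
            return ">>";
        }
        pos++;
        return String.valueOf(c);
    }

    /** type : ID ('<' type '>')? ; collects the tokens it consumes into out. */
    private void type(List<String> out, int depth) {
        out.add(nextToken(depth > 0));          // ID
        int mark = pos;
        String t = nextToken(depth > 0);
        if (t.equals("<")) {
            out.add(t);
            type(out, depth + 1);
            out.add(nextToken(true));           // the parser wants a closing '>'
        } else {
            pos = mark;                         // not a template; rewind the peek
        }
    }

    /** Lex a type such as List<List<int>> with parser-supplied context. */
    static List<String> lexType(String input) {
        List<String> out = new ArrayList<>();
        new GoalOrientedLexSketch(input).type(out, 0);
        return out;
    }

    /** Lex an expression, where ">>" is always the shift operator. */
    static List<String> lexExpression(String input) {
        GoalOrientedLexSketch lexer = new GoalOrientedLexSketch(input);
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(false); !t.equals("<EOF>"); t = lexer.nextToken(false)) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lexType("List<List<int>>"));
        System.out.println(lexExpression("a >> b"));
    }
}
```

The same two characters come out as two '>' tokens from lexType and as one '>>' token from lexExpression, and the lexer never has to guess.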
Oh, another thing. With the rules for tokens in the parser spec, you
can specify the kind of tree node to create for each token with an
option:
INT
    options {
        AST=IntNode;
    }
    : ('0'..'9')+
    ;
Handy. :)
Actually 3.0 does the merged spec right now; took me an hour or two.
The new code base is SOooooo sweet.
Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!