[antlr-interest] "back to the future" (merged lexer / parser spec)

Terence Parr parrt at cs.usfca.edu
Fri Nov 19 10:00:47 PST 2004


Howdy,

One of the things that I liked about old PCCTS 15 years ago (it was 
unusual at the time, though I doubt I invented the idea) was that you 
could have one spec.  You didn't need separate lexer and parser specs, 
which we all know can cause lots of mismatches between token types etc...

So, ANTLR 3 will allow you to reference char and string literals in 
rules and also to define lexer rules all in the same spec.  For 
example,

grammar foo;

a : ID "while" '.' ;

ID : ('a'..'z')+ ;

results in this parser method:

     public void a()
     {
         match(ID);
         match(2);
         match(3);
     }

where each literal reference is replaced with the appropriate token 
type.  ANTLR automatically builds this lexer grammar:

lexer grammar fooLexer;

T2 : "while" ;
T3 : '.' ;
ID:('a'..'z')+;

and generates code from it (all without writing the lexer grammar to 
disk).  I just use a StringTemplate:

	/** For merged lexer/parsers, we must construct a separate lexer spec.
	 *  This is the template for lexer; put the literals first then the
	 *  regular rules.
	 */
	protected StringTemplate lexerGrammarST =
		new StringTemplate(
			"lexer grammar <name>Lexer;\n" +
			"\n" +
			"<literals:{T<it.type> : <it.literal> ;\n}>\n" +
			"<rules>",
			AngleBracketTemplateLexer.class
		);

to generate the new lexer string and then do a "new 
Grammar(lexerGrammarString)" to get the new grammar object. :)  Gotta 
love StringTemplate for code gen!
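
For anyone who wants to see the shape of that step without pulling in 
StringTemplate, here is a minimal sketch in plain Java.  It is not the 
ANTLR 3 code: the class and method names (MergedSpecSketch, assignTypes, 
buildLexerGrammar) are made up for illustration, and starting literal 
token types at 2 is an assumption taken only from the T2/T3 names in the 
example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: mimics the layout of the lexer template with a StringBuilder. */
class MergedSpecSketch {
    /** Assumption: literal token types start at 2, as the T2/T3 example suggests. */
    static final int FIRST_LITERAL_TYPE = 2;

    /** Give each distinct literal a token type, in order of first appearance. */
    static Map<String, Integer> assignTypes(String... literals) {
        Map<String, Integer> types = new LinkedHashMap<>();
        int next = FIRST_LITERAL_TYPE;
        for (String literal : literals) {
            if (!types.containsKey(literal)) {
                types.put(literal, next++);
            }
        }
        return types;
    }

    /** Emit the header, then one T<type> rule per literal, then the regular rules. */
    static String buildLexerGrammar(String name, Map<String, Integer> literalTypes, String rules) {
        StringBuilder sb = new StringBuilder();
        sb.append("lexer grammar ").append(name).append("Lexer;\n\n");
        for (Map.Entry<String, Integer> e : literalTypes.entrySet()) {
            sb.append("T").append(e.getValue())
              .append(" : ").append(e.getKey()).append(" ;\n");
        }
        sb.append("\n").append(rules);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Literals are passed with their quotes, exactly as they appear in the parser rule.
        Map<String, Integer> types = assignTypes("\"while\"", "'.'");
        System.out.println(buildLexerGrammar("foo", types, "ID : ('a'..'z')+ ;"));
    }
}
```

The literals go first so that "while" gets its own token type ahead of 
the more general ID rule, which is the ordering the template comment 
calls for.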

You can still have separate specs, but you usually won't need to do 
that.

With this merged spec, ultimately I would like to do context-sensitive 
(goal-oriented) lexing so we could handle things like the C++ nested 
template lexing issue.  With input "List<List<int>>", the lexer cannot 
tell without context whether the final ">>" is two '>' tokens or one 
'>>' (shift) token.
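
To make the ">>" problem concrete, here is a rough Java sketch of the 
idea, not anything ANTLR does today.  Everything in it is hypothetical: 
the class GoalOrientedLexSketch, the closeAngleExpected flag, and 
especially the use of a simple '<' nesting counter as a stand-in for real 
parser context (a real parser would know whether it is inside a template 
argument list; this counter would be fooled by a less-than operator).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a lexer whose choice for ">>" depends on a context hint. */
class GoalOrientedLexSketch {
    /**
     * Return the token text starting at pos.  closeAngleExpected is the
     * hypothetical hint saying a single '>' is what the parser wants next.
     */
    static String nextToken(String input, int pos, boolean closeAngleExpected) {
        char c = input.charAt(pos);
        if (c == '>') {
            boolean doubled = pos + 1 < input.length() && input.charAt(pos + 1) == '>';
            // Only produce the shift token when no template close is pending.
            return (doubled && !closeAngleExpected) ? ">>" : ">";
        }
        if (Character.isLetter(c)) {
            int end = pos;
            while (end < input.length() && Character.isLetter(input.charAt(end))) {
                end++;
            }
            return input.substring(pos, end);
        }
        return String.valueOf(c);
    }

    /** Tokenize, using '<' nesting depth as a crude substitute for parser context. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int depth = 0;
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            String tok = nextToken(input, pos, depth > 0);
            if (tok.equals("<")) {
                depth++;
            } else if (tok.equals(">") && depth > 0) {
                depth--;
            }
            tokens.add(tok);
            pos += tok.length();
        }
        return tokens;
    }
}
```

With this sketch, tokenize("List<List<int>>") ends in two separate ">" 
tokens, while tokenize("a >> b") yields a single ">>" token.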

Oh, another thing.  With the rules for tokens in the parser spec, you 
can specify the kind of tree node to create for each token with an 
option:

INT
options {
	AST=IntNode;
}
	:	('0'..'9')+
	;

Handy. :)

Actually 3.0 does the merged spec right now; took me an hour or two.  
The new code base is SOooooo sweet.

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!





 