[antlr-interest] Difference between tokens {...} and ordinary lexerrules
Kenneth Domino
kenneth.domino at domemtech.com
Fri Feb 29 16:25:43 PST 2008
Hi Felix,
Placing the literal definitions for your tokens in the token section has the
effect of adding productions for those literals to a lexer grammar. For
example, ANTLR will generate for the combined lexer and parser grammar
input:
grammar test;
tokens {
CLINTON = 'clinton';
OBAMA = 'obama';
}
stuff: (CLINTON|OBAMA)*;
the file test__.g. This file is a lexer grammar and it contains the
translation of the token list in terms of productions:
lexer grammar test;
CLINTON : 'clinton' ;
OBAMA : 'obama' ;
ANTLR can read this file and generate the lexer for your target. This means
that they are equivalent.
However, if the grammar is a lexer grammar, then ANTLR does not accept nor
add equivalent productions. For example, the grammar
lexer grammar test;
tokens {
CLINTON = 'clinton';
OBAMA;
}
OBAMA: 'obama';
is not accepted. Antlr outputs:
ANTLR Parser Generator Version 3.0.1 (August 13, 2007) 1989-2007
error(108): test.g:3:12: literals are illegal in lexer tokens{} section:
'clinton'
If you tried to only use the tokens section to define your lexer
productions, e.g.,
lexer grammar test;
tokens {
CLINTON = 'clinton';
OBAMA = 'obama';
}
ANTLR will produce something different entirely:
ANTLR Parser Generator Version 3.0.1 (August 13, 2007) 1989-2007
error(100): test.g:6:1: syntax error: antlr: test.g:6:1: unexpected token:
null
error(10): internal error: test.g : java.lang.NullPointerException
org.antlr.tool.Grammar.setGrammarContent(Grammar.java:524)
org.antlr.tool.Grammar.<init>(Grammar.java:456)
org.antlr.Tool.getGrammar(Tool.java:331)
org.antlr.Tool.process(Tool.java:267)
org.antlr.Tool.main(Tool.java:70)
It seems to me the safest thing to do is to place the defintions as
productions in your grammar. You also have the capability to add predicates
to the productions in order to fine tune the lexer. But, you probably don't
want to add literals to your grammar like this anyways if you have a lot of
them. It seems that the lexer generated may become too large to compile
(e.g., Java and a 64K code size limit).
--Ken Domino
More information about the antlr-interest
mailing list