[antlr-interest] Tokenizing question

Gavin Lambert antlr at mirality.co.nz
Mon Feb 11 23:51:26 PST 2008


At 10:39 12/02/2008, Mark Volkmann wrote:
 >I think Shmuel Siegel provided a solution in the thread on "Lexer
 >ambiguities". The trick is to make the most general of your
 >conflicting rules be a lexer rule and make the other, more
 >specific rules be parser rules.
[...]
 >declaration_command: timescale; // omitted other alternatives
 >timescale: '$timescale' NUMBER time_unit '$end';
 >time_unit: 's' | 'ms' | 'us' | 'ns' | 'ps' | 'fs';
 >simulation_command: value_change; // omitted other alternatives
 >value_change: scalar_value_change;
 >scalar_value_change: value IDENTIFIER;
 >
 >value: '0' | '1' | 'x' | 'X' | 'z' | 'Z';
 >NUMBER: DIGIT+;
 >fragment DIGIT: '0'..'9';
 >
 >// An IDENTIFIER cannot begin with a digit.
 >IDENTIFIER: ('!'..'/' | ':'..'~') ('!'..'~')*;

You have to be careful with this sort of thing too :)  Any time 
you use a quoted literal string in a parser rule, it secretly 
creates a new lexer rule for that literal, which wins over 
IDENTIFIER whenever the input is exactly that text.  This means 
that "ms" can now never be lexed as an IDENTIFIER, and neither 
can "x".

In other words, the "value" rule defined above is effectively 
equivalent to this:

value: T401 | T402 | T403 | T404 | T405 | T406;
T401: '0';
T402: '1';
T403: 'x';
T404: 'X';
T405: 'z';
T406: 'Z';

It won't generate duplicates, though, so you can "add back" the 
ones you want via another parser rule that reuses the same 
literals.  For example, to permit "x x" as a scalar_value_change 
(the first being the value, the second an identifier), and to let 
"0" and "1" still count as numbers:

scalar_value_change: value identifier;
value: '0' | '1' | 'x' | 'X' | 'z' | 'Z';
identifier: IDENTIFIER | 'x' | 'X' | 'z' | 'Z' | time_unit;
number: NUMBER | '0' | '1';

I usually prefer to avoid using literal strings in parser rules at 
all (it helps to remind me of this effect, and it makes the 
generated code easier to understand), but some people find that 
using them makes the grammar easier to read.  As long as you 
don't forget how it works underneath, either is fine -- it's just 
a matter of taste.
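
For comparison, here is a rough sketch of the literal-free style 
applied to the rules above.  The token names ZERO, ONE, X and Z are 
made up for illustration and do not appear in the original grammar, 
and the sketch assumes the lexer settles an exact tie between two 
rules in favour of the one listed first, so the specific tokens are 
placed ahead of NUMBER and IDENTIFIER.  The time_unit and 
'$timescale' literals would need the same treatment.

scalar_value_change: value identifier;
value: ZERO | ONE | X | Z;
identifier: IDENTIFIER | X | Z;  // add back the letters that X and Z take
number: NUMBER | ZERO | ONE;     // add back the digits that ZERO and ONE take

// Hypothetical named tokens replacing the implicit T401..T406 rules.
ZERO: '0';
ONE: '1';
X: 'x' | 'X';
Z: 'z' | 'Z';
NUMBER: DIGIT+;
fragment DIGIT: '0'..'9';
IDENTIFIER: ('!'..'/' | ':'..'~') ('!'..'~')*;

Written this way, the overlap between X and IDENTIFIER is visible in 
the grammar itself rather than hidden in generated token names.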


