[antlr-interest] Tokenizing question
Gavin Lambert
antlr at mirality.co.nz
Mon Feb 11 23:51:26 PST 2008
At 10:39 12/02/2008, Mark Volkmann wrote:
>I think Shmuel Siegel provided a solution in the thread on "Lexer
>ambiguities". The trick is to make the most general of your
>conflicting rules be a lexer rule and make the other, more
>specific rules be parser rules.
[...]
>declaration_command: timescale; // omitted other alternatives
>timescale: '$timescale' NUMBER time_unit '$end';
>time_unit: 's' | 'ms' | 'us' | 'ns' | 'ps' | 'fs';
>simulation_command: value_change; // omitted other alternatives
>value_change: scalar_value_change;
>scalar_value_change: value IDENTIFIER;
>
>value: '0' | '1' | 'x' | 'X' | 'z' | 'Z';
>NUMBER: DIGIT+;
>fragment DIGIT: '0'..'9';
>
>// An IDENTIFIER cannot begin with a digit.
>IDENTIFIER: ('!'..'/' | ':'..'~') ('!'..'~')*;
You have to be careful with this sort of thing too :) Any time
you use a quoted literal string in a parser rule, ANTLR implicitly
creates a new lexer rule for it, and that rule wins over your own
lexer rules when both match the same text. This means that "ms" on
its own can now never be an IDENTIFIER, and neither can "x".
In other words, the "value" rule defined above is effectively
equivalent to this:
value: T401 | T402 | T403 | T404 | T405 | T406;
T401: '0';
T402: '1';
T403: 'x';
T404: 'X';
T405: 'z';
T406: 'Z';
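
To make the effect concrete, here is a rough Python sketch of how
single words end up classified once the literals have become tokens
of their own. It is only a toy model, not ANTLR's real lexer: it
assumes the input has already been split on whitespace, and the
names LITERALS and token_type are made up for this illustration.

```python
import re

# Every quoted literal used in the parser rules becomes its own token.
LITERALS = {"$timescale", "$end",
            "s", "ms", "us", "ns", "ps", "fs",
            "0", "1", "x", "X", "z", "Z"}

# NUMBER: DIGIT+;
NUMBER = re.compile(r"[0-9]+")
# IDENTIFIER: ('!'..'/' | ':'..'~') ('!'..'~')*;
IDENTIFIER = re.compile(r"[!-/:-~][!-~]*")


def token_type(word):
    """Classify one whitespace-delimited word (simplified model)."""
    # The implicit literal tokens are checked first, so they shadow
    # NUMBER and IDENTIFIER whenever the whole word is a literal.
    if word in LITERALS:
        return "'" + word + "'"
    if NUMBER.fullmatch(word):
        return "NUMBER"
    if IDENTIFIER.fullmatch(word):
        return "IDENTIFIER"
    raise ValueError("no token matches " + repr(word))


for word in ["x", "ms", "xy", "0", "42"]:
    print(word, "->", token_type(word))
```

In this model "x", "ms" and "0" come back as literal tokens, while
"xy" is still an IDENTIFIER and "42" is still a NUMBER.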
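ANTLR won't generate duplicate token rules when the same literal
appears more than once, though, so you can "add back" the ones you
want via another parser rule. For example, the rules below permit
"x x" as a scalar_value_change (the first being the value, the
second an identifier), and let a lone "0" or "1" still count as a
number: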
scalar_value_change: value identifier;
value: '0' | '1' | 'x' | 'X' | 'z' | 'Z';
identifier: IDENTIFIER | 'x' | 'X' | 'z' | 'Z' | time_unit;
number: NUMBER | '0' | '1';
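
As a sanity check on the idea, the same rules can be modelled in a
few lines of Python. Again this is just a sketch under simplifying
assumptions (whitespace-separated input, literals always winning
over NUMBER and IDENTIFIER); the helper names lex, is_value,
is_identifier, is_number and is_scalar_value_change are
hypothetical and only mirror the parser rules above.

```python
import re

VALUE = {"0", "1", "x", "X", "z", "Z"}
TIME_UNIT = {"s", "ms", "us", "ns", "ps", "fs"}
LITERALS = VALUE | TIME_UNIT | {"$timescale", "$end"}
# Literals that the identifier rule explicitly allows again.
IDENTIFIER_LITERALS = {"x", "X", "z", "Z"} | TIME_UNIT

NUMBER = re.compile(r"[0-9]+")
IDENTIFIER = re.compile(r"[!-/:-~][!-~]*")


def lex(word):
    """Return a (kind, text) token; literal tokens take priority."""
    if word in LITERALS:
        return ("LITERAL", word)
    if NUMBER.fullmatch(word):
        return ("NUMBER", word)
    if IDENTIFIER.fullmatch(word):
        return ("IDENTIFIER", word)
    raise ValueError("no token matches " + repr(word))


def is_value(tok):
    # value: '0' | '1' | 'x' | 'X' | 'z' | 'Z';
    return tok[0] == "LITERAL" and tok[1] in VALUE


def is_identifier(tok):
    # identifier: IDENTIFIER | 'x' | 'X' | 'z' | 'Z' | time_unit;
    return tok[0] == "IDENTIFIER" or (
        tok[0] == "LITERAL" and tok[1] in IDENTIFIER_LITERALS)


def is_number(tok):
    # number: NUMBER | '0' | '1';
    return tok[0] == "NUMBER" or (
        tok[0] == "LITERAL" and tok[1] in {"0", "1"})


def is_scalar_value_change(line):
    # scalar_value_change: value identifier;
    toks = [lex(w) for w in line.split()]
    return len(toks) == 2 and is_value(toks[0]) and is_identifier(toks[1])


print(is_scalar_value_change("x x"))
print(is_scalar_value_change("1 ms"))
print(is_scalar_value_change("x 0"))
```

Note that "x 0" is still rejected here, because '0' was not added
back to the identifier rule; only the literals you list explicitly
get a second meaning.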
I usually prefer to avoid using literal strings in parser rules at
all (it helps to remind me of this effect, and it makes the
generated code easier to understand), but some people find that
literals make the grammar easier to read. As long as you don't
forget how it works underneath, either is fine -- it's just a
matter of taste.