[antlr-interest] how to resolve ambiguity considering identifiers vs mathematical constants
Gerard van de Glind
g.vandeglind at beinformed.nl
Tue May 27 08:24:18 PDT 2008
>I am writing a grammar (using antlr 3) that should be able to
>parse identifiers and mathematical constants (they may be used in
>more complex constructs). For example, the mathematical constants
>PI and E. Depending on the context they can also be identifiers.
>How can I best resolve this ambiguity?
>
>Thanks in advance!
>
>Gerard
>
>
>Here is a simplified grammar:
>
>expression
> : MATH_CONSTANT | IDENTIFIER;
>
>MATH_CONSTANT
> : 'E' | 'PI';
>
>IDENTIFIER
> : ('a'..'z'|'A'..'Z')
> ('0'..'9'|'a'..'z'|'A'..'Z'|'_'|'.')*;
What you've already got there should work. (You might get an
ambiguity warning, but you should be able to just ignore it.) The
important thing is to list your most specific rules (keywords and
the like, such as MATH_CONSTANT) before the more general rules
(like IDENTIFIER), so that when both match the same text the
earlier rule wins.
(Having said that, sometimes the lexer lookahead generation seems
to get a bit confused, and you need to do more work to help it
out. So write lots of unit tests.)
If there's another case at the parser level where you want to
treat a MATH_CONSTANT as if it were an IDENTIFIER, then you just
need to put in a parser rule (similar to expression above) which
recognises both. If you're constructing an AST, you can also
convert the MATH_CONSTANT token to an IDENTIFIER one if you want
to at the same time.
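A rough sketch of what that could look like in ANTLR 3, assuming the grammar builds an AST (output=AST). The grammar name "Names" and the rule name "name" are made up for illustration and are not part of the original grammar; the two lexer rules are copied from the question.

```antlr
grammar Names;             // hypothetical grammar name (file would be Names.g)

options { output = AST; }  // assumption: the parser constructs an AST

// Hypothetical parser rule: use it wherever a plain name is expected,
// so that PI and E are accepted there as ordinary identifiers.
name
    : IDENTIFIER
      // Build an IDENTIFIER node from the MATH_CONSTANT token,
      // keeping its text and position.
    | MATH_CONSTANT -> IDENTIFIER[$MATH_CONSTANT]
    ;

// Most specific lexer rule first, the general one after it.
MATH_CONSTANT
    : 'E' | 'PI';

IDENTIFIER
    : ('a'..'z'|'A'..'Z')
      ('0'..'9'|'a'..'z'|'A'..'Z'|'_'|'.')*;
```

Rules that really do want the constant (such as expression above) keep referring to MATH_CONSTANT directly, so the lexer can stay as it is and the context decision is made in the parser.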
Thanks Gavin,
You directed me to a solution to my problem. I changed my grammar to the
following (I use a # to tell my grammar it is dealing with a math
constant):
expression
 : MATH_CONSTANT | IDENTIFIER;

MATH_CONSTANT
 : ('E' | 'PI') '#';

IDENTIFIER
 : ('a'..'z'|'A'..'Z')
 ('0'..'9'|'a'..'z'|'A'..'Z'|'_'|'.')*;