[antlr-interest] how to resolve ambiguity considering identifiers vs mathematical constants

Tue May 27 08:24:18 PDT 2008

>I am writing a grammar (using ANTLR 3) that should be able to
>parse identifiers and mathematical constants (they may be used in
>more complex constructs), for example the mathematical constants
>PI and E. Depending on the context they can also be identifiers.
>How can I best resolve this ambiguity?
>
>Thanks in advance!
>
>Gerard
>
>
>Here is a simplified grammar:
>
>expression
>    : MATH_CONSTANT | IDENTIFIER ;
>
>MATH_CONSTANT
>    : 'E' | 'PI' ;
>
>IDENTIFIER
>    : ('a'..'z'|'A'..'Z') ('0'..'9'|'a'..'z'|'A'..'Z'|'_'|'.')* ;

What you've already got there should work.  (You might get an
ambiguity warning, but you should be able to just ignore it.)  The
important thing is to list your most specific rules (keywords and
the like, such as MATH_CONSTANT) before the more general rules (such
as IDENTIFIER), so that when both match the same text the lexer
picks the more specific one.

(Having said that, sometimes the lexer lookahead generation seems 
to get a bit confused, and you need to do more work to help it 
out.  So write lots of unit tests.)
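
As a rough sketch of one way to help the lexer out (this is an
assumption about what the extra work could look like, using the Java
target; the grammar name ConstDemo is made up for illustration): match
everything with IDENTIFIER and change the token type in an action when
the text is one of the constants, so the lexer never has to choose
between two rules.

```antlr
grammar ConstDemo;   // hypothetical grammar name

// MATH_CONSTANT has no lexer rule of its own here; it is only a token type.
tokens { MATH_CONSTANT; }

expression
    : MATH_CONSTANT | IDENTIFIER ;

// One rule matches all names. After matching, the action (Java target
// assumed) retypes the token if its text is exactly E or PI.
IDENTIFIER
    : ('a'..'z'|'A'..'Z') ('0'..'9'|'a'..'z'|'A'..'Z'|'_'|'.')*
      { if ($text.equals("E") || $text.equals("PI")) $type = MATH_CONSTANT; }
    ;
```

With this shape, an input such as PIE or E2 stays an IDENTIFIER because
the whole name is matched before the text is compared.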

If there's another case at the parser level where you want to 
treat a MATH_CONSTANT as if it were an IDENTIFIER, then you just 
need to put in a parser rule (similar to expression above) which 
recognises both.  If you're constructing an AST, you can also 
convert the MATH_CONSTANT token to an IDENTIFIER one if you want 
to at the same time.
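
A minimal sketch of such a rule, assuming output=AST and the lexer
rules from the original grammar (the grammar name NameDemo and the rule
name "name" are made up for illustration):

```antlr
grammar NameDemo;    // hypothetical grammar name

options { output = AST; }

// Accepts either token wherever a plain name is expected.
name
    : IDENTIFIER
      // Rewrite: build an IDENTIFIER node from the matched MATH_CONSTANT
      // token, so later tree stages only ever see IDENTIFIER here.
    | MATH_CONSTANT -> IDENTIFIER[$MATH_CONSTANT]
    ;

// Specific rule first, general rule second.
MATH_CONSTANT
    : 'E' | 'PI' ;

IDENTIFIER
    : ('a'..'z'|'A'..'Z') ('0'..'9'|'a'..'z'|'A'..'Z'|'_'|'.')* ;
```

Rules that really do need the constant would keep referring to
MATH_CONSTANT directly; only the contexts that want a name go through
the extra rule.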

Thanks Gavin,

You pointed me to a solution to my problem. I changed my grammar to the
following (I append a # to tell the grammar it is dealing with a math
constant, so E# and PI# are constants while plain E and PI remain
identifiers):

expression
    : MATH_CONSTANT | IDENTIFIER ;

MATH_CONSTANT
    : ('E' | 'PI') '#' ;

IDENTIFIER
    : ('a'..'z'|'A'..'Z') ('0'..'9'|'a'..'z'|'A'..'Z'|'_'|'.')* ;