[antlr-interest] adding Unicode identifiers confuses grammar

Mon Sep 14 11:29:52 PDT 2009

On 09/14/2009 08:36 AM, David J. Biesack wrote:
> I'm working on a grammar for an AMPL-like language (see an extracted simplified
> version below). It works fine (ANTLR 3.1.3) when I use the following token
> definition for identifiers:
>
> ID
>    :
>    ('a'..'z'|'A'..'Z'|'_'|'$') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
>    ;
>
> but when I copy the token fragments for Unicode identifiers from
> http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g
> and change my ID rule to use them:
>
> ID
>    :
>    IdentifierStart IdentifierPart*
>    ;
>
> I get many warnings and disabled tokens, and an error. Here are some (full errors listed below):
>
>      warning(209): AMPL.g:140:1: Multiple token rules can match input such as "'o'": OR, ORDERED, ID
>
>      As a result, token(s) ORDERED,ID were disabled for that input
>      ...
>      warning(209): AMPL.g:75:1: Multiple token rules can match input such as "':'": ASSIGN, COLON
>    
If you are sure that the warnings are spurious and your lexer rules
are not actually ambiguous, then you probably need to increase the
NFA-to-DFA conversion timeout (the value is in milliseconds):

-Xconversiontimeout 30000

If the warnings persist even with the longer timeout, then there really
is a conflict in your rules.

Jim