[antlr-interest] adding Unicode identifiers confuses grammar
Jim Idle
jimi at temporal-wave.com
Mon Sep 14 11:29:52 PDT 2009
On 09/14/2009 08:36 AM, David J. Biesack wrote:
> I'm working on a grammar for an AMPL-like language (see an extracted simplified
> version below). It works fine (ANTLR 3.1.3) when I use the following token
> definition for identifiers:
>
> ID
> :
> ('a'..'z'|'A'..'Z'|'_'|'$') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
> ;
>
> but when I copy the token fragments for Unicode identifiers from
> http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g
> and change my ID rule to use them:
>
> ID
> :
> IdentifierStart IdentifierPart*
> ;
>
> I get many warnings and disabled tokens, and an error. Here are some (full errors listed below):
>
> warning(209): AMPL.g:140:1: Multiple token rules can match input such as "'o'": OR, ORDERED, ID
>
> As a result, token(s) ORDERED,ID were disabled for that input
> ...
> warning(209): AMPL.g:75:1: Multiple token rules can match input such as "':'": ASSIGN, COLON
>
If you are sure that the warnings are wrong and the lexer rules are not
ambiguous, then you probably need to increase the conversion timeout:
-Xconversiontimeout 30000
If that does not work, then there is a real conflict in your rules.
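
For reference, the option goes on the ANTLR Tool command line, ahead of
the grammar file. This is only a sketch: the jar name is an assumption
(use whatever you have on your classpath), and AMPL.g is the grammar
file named in the warnings quoted above.

```sh
# Assumed jar name; the classpath must provide org.antlr.Tool and its dependencies.
java -cp antlr-3.1.3.jar org.antlr.Tool -Xconversiontimeout 30000 AMPL.g
```

If the warnings go away with the larger timeout, the lexer rules were
most likely fine and the analysis was simply running out of time.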
Jim