[antlr-interest] lexical nondeterminism

John B. Brodie jbb at acm.org
Tue Aug 22 13:02:18 PDT 2006


>follow up the previous email, I changed the rules abit as shown:
>=========================================================================
>protected ANYSTRING	:  (~('\n'|'\r'))* ('\n'|'\r');
>protected WS : (  ' ' | '\t' );
>
>PROPERTYNAME	: '%' ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|
>SPECIALCHAR)* ;
>COMMENT : "//" ANYSTRING;
>ABSTRACT	:	("ABSTRACT" (WS)+) => ("ABSTRACT" (WS)+) ANYSTRING
>	|	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|SPECIALCHAR)*
>{ $setType(VARIABLE_NAME); } ;
>=========================================================================
>
>then I got the following warning message:
>
>1	lexical nondeterminism upon k==1:'\t',' ' k==2:'\u0003'..'\u00ff'
>k==3:<end-of-token>,'\u0003'..'\u00ff' between alt 1 and exit branch of
>block	
>
>
>anyone can help?

The problem is with the (WS)+ phrase before the ANYSTRING.

Consider the input "ABSTRACT  ", the second blank could either be part
of the (WS)+ or be the first character of the ANYSTRING, thus the
non-determinism.
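
To make the ambiguity concrete, here is a minimal Python sketch (not
ANTLR; the function name splits is made up for illustration). It
assumes the input starts with the keyword and lists every way the
original phrase "ABSTRACT" (WS)+ ANYSTRING could divide the leading
blanks between (WS)+ and ANYSTRING:

```python
def splits(text, keyword="ABSTRACT"):
    """List every (ws, anystring) division the original rule allows."""
    rest = text[len(keyword):]
    # Number of leading blanks/tabs after the keyword.
    n = len(rest) - len(rest.lstrip(" \t"))
    # (WS)+ must take at least one blank; the original ANYSTRING accepts
    # any character up to the newline, so it may begin with a blank too.
    return [(rest[:i], rest[i:]) for i in range(1, n + 1)]


for ws, tail in splits("ABSTRACT  some text\n"):
    print(repr(ws), repr(tail))
```

With two blanks after the keyword there are two valid divisions, which
is the choice the lexer cannot make deterministically.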

I assume you want ANYSTRING to start with the first non-blank
character, so just add ~(' '|'\t') to the front of the ANYSTRING rule
(and, of course, adjust any other rule that uses ANYSTRING).
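
As a rough illustration of why this helps, the Python sketch below
(again not ANTLR, and splits_fixed is a hypothetical helper) keeps only
the divisions in which ANYSTRING begins with a character other than
blank or tab, assuming the input starts with the keyword:

```python
def splits_fixed(text, keyword="ABSTRACT"):
    """List the (ws, anystring) divisions allowed once ANYSTRING
    must begin with a non-blank, non-tab character."""
    rest = text[len(keyword):]
    # Number of leading blanks/tabs after the keyword.
    n = len(rest) - len(rest.lstrip(" \t"))
    candidates = [(rest[:i], rest[i:]) for i in range(1, n + 1)]
    # Discard any division whose ANYSTRING part starts with a blank or tab.
    return [(ws, tail) for ws, tail in candidates
            if tail and tail[0] not in " \t"]


print(splits_fixed("ABSTRACT  some text\n"))
```

Only the division in which (WS)+ consumes every leading blank survives,
so the choice is forced.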

Also you do not really need the predicate since you have a fixed size
lookahead. e.g. k=9 will distinguish "ABSTRACT " from "ABSTRACTION". I
always work really hard to avoid predicates because they involve
backtracking with the possibility of scanning the input text multiple
times.
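
The idea of fixed lookahead can be sketched in Python as follows. This
only mimics the effect of looking at the next 9 characters; it is not
how ANTLR computes or applies its lookahead sets, and the function name
is an assumption made for the example:

```python
def looks_like_abstract(text, k=9):
    """Decide from the first k characters alone whether the input
    begins with "ABSTRACT" followed by a blank or tab."""
    window = text[:k]
    return (
        len(window) == k
        and window[:8] == "ABSTRACT"
        and window[8] in " \t"
    )


print(looks_like_abstract("ABSTRACT some text\n"))  # True
print(looks_like_abstract("ABSTRACTION"))           # False
```

No backtracking is needed: the ninth character settles the question, so
the input is never scanned twice.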

Anyway, here is a lexer that gets no complaints from the antlr.Tool
(did not actually try to test it any further):

//=========================================================================
class L extends Lexer;

options {
    k = 9;
    charVocabulary = '\3'..'\377';
}

protected SPECIALCHAR : '_';

protected ANYSTRING	:  ~(' '|'\t') (~('\n'|'\r'))* ('\n'|'\r');
protected WS : (  ' ' | '\t' );

protected NAME
    : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|SPECIALCHAR)* ;

PROPERTYNAME : '%' NAME ;

COMMENT : "//" (WS)+ ANYSTRING;

ABSTRACT : "ABSTRACT" (WS)+ ANYSTRING ;

VARIABLE_NAME : NAME;
//=========================================================================

Hope this helps...

   -jbb

