[antlr-interest] Lexical nondeterminism

John B. Brodie jbb at acm.org
Wed Jan 11 10:16:56 PST 2006


Gabriel Radu asked:
>I am trying to write a antler grammar and I am getting a following result:
>
>ANTLR Parser Generator   Version 2.7.5 (20050128)   1989-2005 jGuru.com
>ServiceCompiler.g: warning:lexical nondeterminism between rules
>INT_or_FLOAT_or_MACADR_or_VERSIONSTRING and DEFAULT upon
>AuvitranServiceCompiler.g:     k==1:'D','d'
>AuvitranServiceCompiler.g:     k==2:'E','e'
>AuvitranServiceCompiler.g:     k==3:'F','f'
>AuvitranServiceCompiler.g:     k==4:'A','a'
>AuvitranServiceCompiler.g:     k==5:'U','u'
>AuvitranServiceCompiler.g:     k==6:'L','l'
>AuvitranServiceCompiler.g:     k==7:'T','t'
>AuvitranServiceCompiler.g:     k==8:<end-of-token>
>AuvitranServiceCompiler.g:     k==9:<end-of-token>
>AuvitranServiceCompiler.g:     k==10:<end-of-token>
>
>The interesting parts of the lexer are:
>
>...lots of informative stuff snipped...

You have:

>protected INT
>  :	(HEXDIG)+
>;

and

>protected VERSIONSTRING_L
>  : ( DIGIT )+ DOT ( DIGIT )+ DOT ( DIGIT )+ ('A'..'Z'|'a'..'z')?
>;
>
>protected VERSIONSTRING_S
>  : ( DIGIT )+ DOT ( DIGIT )+ ('A'..'Z'|'a'..'z')
>;
>
>protected VERSIONSTRING : ;
>
>INT_or_FLOAT_or_MACADR_or_VERSIONSTRING
>
>   : ( DIGIT (DIGIT)? DOT DIGIT ( DIGIT (DIGIT)? )? DOT )
>          => VERSIONSTRING_L { $setType( VERSIONSTRING ); }
>
>   | ( DIGIT (DIGIT)? DOT DIGIT ( DIGIT (DIGIT)? )? ('A'..'Z'|'a'..'z') )
>          => VERSIONSTRING_S { $setType( VERSIONSTRING ); }
>
>   | ( ( DIGIT )+ DOT ) => FLOAT { $setType( FLOAT ); }
>
>   | ( HEXDIG HEXDIG MACADRSEPARATOR ) => MACADR { $setType( MACADR ); }
>
>   | ( ( DIGIT )+ ) => INT { $setType( INT ); }
>
>;

and

>DEFAULT:
>    ('D' | 'd')
>    ('E' | 'e')
>    ('F' | 'f')
>    ('A' | 'a')
>    ('U' | 'u')
>    ('L' | 'l')
>    ('T' | 't')
>;

i believe that your ambiguity arises from INT being a sequence of
HEXDIG (despite the predicate in the INT_or_FLOAT_...whatever rule).

thus the input string `default` could be a DEFAULT, or an INT (`defa`)
followed by NONTOCLITs.
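
to make the two readings concrete, here is a rough Python sketch (plain
regexes, not ANTLR). the patterns are hand transcriptions of HEXDIG,
NONTOCLIT and DEFAULT from the grammar in this message, and the helper
name two_readings is made up for illustration:

```python
import re

# Assumed regex transcriptions of the lexer rules (not ANTLR output).
HEXDIGS = re.compile(r"[0-9a-fA-F]+")        # protected INT : (HEXDIG)+
NONTOCLIT = re.compile(r"[g-ux-zG-UX-Z]")    # NONTOCLIT character ranges
DEFAULT = re.compile(r"default", re.IGNORECASE)

def two_readings(text):
    """Report both ways `text` could be tokenized.

    Returns (is_default_keyword, int_prefix, rest, rest_is_all_nontoclit).
    """
    is_keyword = DEFAULT.fullmatch(text) is not None
    m = HEXDIGS.match(text)
    int_prefix = m.group(0) if m else ""
    rest = text[len(int_prefix):]
    rest_ok = all(NONTOCLIT.fullmatch(c) for c in rest)
    return is_keyword, int_prefix, rest, rest_ok

print(two_readings("default"))   # (True, 'defa', 'ult', True)
```

the first four characters `defa` are all hex digits, so only the fifth
character (`u`) tells the two readings apart.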

while your k=10 lookahead would seem to be plenty to disambiguate this
(you only need to look at the first 5 symbols), it has been my
experience that lookahead is not considered when one of the
alternatives is expressed as a loop (i.e. either ()+ or ()*). that is,
ANTLR will not try to do the 5 symbol lookahead before entering the
INT loop.

so if an INT really is a sequence of HEXDIG then you will need to add
another predicated alternative to your INT_or_...whatever rule.

on the other hand, if an INT is really a sequence of DIGIT, then just
fix the protected INT rule and set k=3, and (I think, not tested)
you will have fixed this ambiguity.


on another issue, which you did not (yet) ask about: you should be
really careful with your syntactic predicates. consider the input
string "11.22.33.44.55.66". it would seem that this should scan as a
MACADR, yet your predicate for VERSIONSTRING_L will match this string
and you will end up scanning it as a VERSIONSTRING ("11.22.33")
followed by DOT followed by another VERSIONSTRING (i think).
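
the predicate problem can be reproduced outside ANTLR with a small
Python sketch. the two regexes are hand transcriptions of the original
VERSIONSTRING_L predicate and rule, and scan_original is a hypothetical
helper, so treat this as an approximation of what the lexer does rather
than a trace of it:

```python
import re

# Assumed transcription of the predicate:
#   ( DIGIT (DIGIT)? DOT DIGIT ( DIGIT (DIGIT)? )? DOT ) =>
PREDICATE = re.compile(r"\d\d?\.\d(?:\d\d?)?\.")
# Assumed transcription of the rule:
#   ( DIGIT )+ DOT ( DIGIT )+ DOT ( DIGIT )+ ( letter )?
VERSIONSTRING_L = re.compile(r"\d+\.\d+\.\d+[A-Za-z]?")

def scan_original(text):
    """Return (predicate_matched, token_text, leftover_input)."""
    if PREDICATE.match(text) is None:
        return False, "", text
    token = VERSIONSTRING_L.match(text)
    if token is None:
        return True, "", text
    return True, token.group(0), text[token.end():]

print(scan_original("11.22.33.44.55.66"))
# (True, '11.22.33', '.44.55.66')
```

the predicate only looks as far as the second DOT, so it cannot see that
three more groups follow; the MACADR alternative is never tried.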

attached is a version of your scanner that addresses this issue.

hope this helps...

//--------------------------begin attachment--------------------------

//----------------------------------------------------------------------
// Lexer
//----------------------------------------------------------------------

class ServiceLexer extends Lexer;

//----------------------------------------------------------------------
// White space:

WS_ : (' ' | '\t') { $setType(SKIP); } ;

NEWLINE
    : '\n' ( '\r' )?
    | '\r' ( '\n' )?
;


//----------------------------------------------------------------------
// Chars:

NONTOCLIT
    :   'g'..'u' | 'x'..'z'
    |   'G'..'U' | 'X'..'Z'
;

protected LETTER : 'A'..'Z' | 'a'..'z' ;



//----------------------------------------------------------------------
// Numbers:

protected DIGIT
	:	'0'..'9'
;

protected HEXLIT
  : 'a'..'f' | 'A'..'F'
;

protected HEXDIG
  : ( DIGIT | HEXLIT )
;

protected INT
  :	( HEXDIG )+
;

protected FLOAT
  : ( DIGIT )+ DOT ( DIGIT )+
;

protected MACADRSEPARATOR
  : DOT
;

protected MACADR
  :
    HEXDIG HEXDIG MACADRSEPARATOR
    HEXDIG HEXDIG MACADRSEPARATOR
    HEXDIG HEXDIG MACADRSEPARATOR
    HEXDIG HEXDIG MACADRSEPARATOR
    HEXDIG HEXDIG MACADRSEPARATOR
    HEXDIG HEXDIG
;

protected VERSIONSTRING
  : ( DIGIT )+ DOT ( DIGIT )+ ( ( DOT ( DIGIT )+ ( LETTER )? ) | LETTER )
;

INT_or_FLOAT_or_MACADR_or_VERSIONSTRING_or_DEFAULT
    : ( DEFAULT ) => ( DEFAULT { $setType( DEFAULT ); } )
    | ( MACADR ) => ( MACADR { $setType( MACADR ); } )
    | ( VERSIONSTRING ) => ( VERSIONSTRING { $setType( VERSIONSTRING ); } )
    | ( FLOAT ) => ( FLOAT { $setType( FLOAT ); } )
    | ( INT ) => ( INT { $setType( INT ); } )
;



//----------------------------------------------------------------------
// Punctuation:

DOT:    '.' ;

COMMA:	',' ;

COLON:	':' ;

SCOLON:	';' ;



//[ some more text]



//----------------------------------------------------------------------
protected DEFAULT:
    ('D' | 'd')
    ('E' | 'e')
    ('F' | 'f')
    ('A' | 'a')
    ('U' | 'u')
    ('L' | 'l')
    ('T' | 't')
;


//---------------------------end attachment---------------------------
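
the ordering of the predicated alternatives in the attached rule can be
sanity-checked with a rough Python model: try each pattern in the same
order and take the first that matches a prefix of the input. the regexes
are hand transcriptions of the attached protected rules and classify is
a made-up name. regex backtracking is not identical to how ANTLR
evaluates syntactic predicates, so this only approximates the lexer:

```python
import re

HEX2 = r"[0-9a-fA-F]{2}"

# Same order as the alternatives in
# INT_or_FLOAT_or_MACADR_or_VERSIONSTRING_or_DEFAULT.
RULES = [
    ("DEFAULT", re.compile(r"default", re.IGNORECASE)),
    ("MACADR", re.compile(HEX2 + r"(?:\." + HEX2 + r"){5}")),
    ("VERSIONSTRING", re.compile(r"\d+\.\d+(?:\.\d+[A-Za-z]?|[A-Za-z])")),
    ("FLOAT", re.compile(r"\d+\.\d+")),
    ("INT", re.compile(r"[0-9a-fA-F]+")),
]

def classify(text):
    """Return (token_type, matched_text) for the first rule that matches."""
    for name, pattern in RULES:
        m = pattern.match(text)
        if m:
            return name, m.group(0)
    return None

for sample in ["default", "11.22.33.44.55.66", "1.2.3", "1.2a", "1.5", "ff"]:
    print(sample, "->", classify(sample))
```

with MACADR tried before VERSIONSTRING, "11.22.33.44.55.66" comes out as
a single MACADR, and "default" is claimed by DEFAULT before the hex INT
alternative is reached.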



