[antlr-interest] Lexical nondeterminism
Gabriel Radu
gabriel.adrian.radu at googlemail.com
Fri Jan 13 02:51:51 PST 2006
Dear John,
What you suggested worked just fine, apart from
"WS_ : (' ' | '\t') { $setType(SKIP); } ;": when generating a C++
parser, SKIP needs to be preceded by its namespaces.
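
For the archive, a minimal sketch of what that looks like, assuming the
stock ANTLR 2 C++ runtime, where SKIP is a member of the Token class in
the antlr namespace and the ANTLR_USE_NAMESPACE macro is available
(neither is spelled out in John's attachment):

```antlr
// Whitespace rule for the C++ target: SKIP is qualified with the
// runtime's namespace and Token class so the generated lexer compiles.
WS_ : (' ' | '\t') { $setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP); } ;
```

Writing antlr::Token::SKIP directly should be equivalent when the
runtime is built with namespaces enabled.
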
Thank you for your help!
Kind regards,
Gabriel
On 11/01/06, John B. Brodie <jbb at acm.org> wrote:
>
> Gabriel Radu asked:
> >I am trying to write an ANTLR grammar and I am getting the following result:
> >
> >ANTLR Parser Generator Version 2.7.5 (20050128) 1989-2005 jGuru.com
> >ServiceCompiler.g: warning:lexical nondeterminism between rules
> >INT_or_FLOAT_or_MACADR_or_VERSIONSTRING and DEFAULT upon
> >AuvitranServiceCompiler.g: k==1:'D','d'
> >AuvitranServiceCompiler.g: k==2:'E','e'
> >AuvitranServiceCompiler.g: k==3:'F','f'
> >AuvitranServiceCompiler.g: k==4:'A','a'
> >AuvitranServiceCompiler.g: k==5:'U','u'
> >AuvitranServiceCompiler.g: k==6:'L','l'
> >AuvitranServiceCompiler.g: k==7:'T','t'
> >AuvitranServiceCompiler.g: k==8:<end-of-token>
> >AuvitranServiceCompiler.g: k==9:<end-of-token>
> >AuvitranServiceCompiler.g: k==10:<end-of-token>
> >
> >The interesting parts of the lexer are:
> >
> >...lots of informative stuff snipped...
>
> You have:
>
> >protected INT
> > : (HEXDIG)+
> >;
>
> and
>
> >protected VERSIONSTRING_L
> > : ( DIGIT )+ DOT ( DIGIT )+ DOT ( DIGIT )+ ('A'..'Z'|'a'..'z')?
> >;
> >
> >protected VERSIONSTRING_S
> > : ( DIGIT )+ DOT ( DIGIT )+ ('A'..'Z'|'a'..'z')
> >;
> >
> >protected VERSIONSTRING : ;
> >
> >INT_or_FLOAT_or_MACADR_or_VERSIONSTRING
> >
> > : ( DIGIT (DIGIT)? DOT DIGIT ( DIGIT (DIGIT)? )? DOT )
> > => VERSIONSTRING_L { $setType( VERSIONSTRING ); }
> >
> > | ( DIGIT (DIGIT)? DOT DIGIT ( DIGIT (DIGIT)? )? ('A'..'Z'|'a'..'z') )
> > => VERSIONSTRING_S { $setType( VERSIONSTRING ); }
> >
> > | ( ( DIGIT )+ DOT ) => FLOAT { $setType( FLOAT ); }
> >
> > | ( HEXDIG HEXDIG MACADRSEPARATOR ) => MACADR { $setType( MACADR ); }
> >
> > | ( ( DIGIT )+ ) => INT { $setType( INT ); }
> >
> >;
>
> and
>
> >DEFAULT:
> > ('D' | 'd')
> > ('E' | 'e')
> > ('F' | 'f')
> > ('A' | 'a')
> > ('U' | 'u')
> > ('L' | 'l')
> > ('T' | 't')
> >;
>
> i believe that your ambiguity arises from INT being a sequence of
> HEXDIG (despite the predicate in the INT_or_FLOAT_...whatever rule).
>
> thus the input string `default` could be a DEFAULT or an INT (`defa`,
> since d, e, f and a are all hex digits) followed by NONTOCLITs
> (`u`, `l`, `t`).
>
> while your k=10 lookahead would seem to be plenty to disambiguate this
> (you only need to look at the first 5 symbols), it has been my
> experience that lookahead is not considered when one of the items being
> considered is expressed as a loop (e.g. either ()+ or ()*). that is, ANTLR
> will not try to do the 5 symbol lookahead before entering the INT loop.
>
> so if an INT really is a sequence of HEXDIG then you will need to add
> another predicated alternative to your INT_or_...whatever rule.
>
> on the other hand, if an INT is really a sequence of DIGIT, then just
> fix the protected INT rule and set k=3, and (I think, not tested)
> you will have fixed this ambiguity.
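>
> a minimal, untested sketch of that second option, assuming the
> lookahead is set in the lexer's options section and that nothing else
> in the grammar needs INT to accept hex digits:
>
> ```antlr
> class ServiceLexer extends Lexer;
> options { k = 3; }   // lookahead depth suggested above; not tested
>
> protected DIGIT : '0'..'9' ;
>
> // decimal digits only, so 'd', 'e', 'f' and 'a' can no longer begin an INT
> protected INT : ( DIGIT )+ ;
>
> // ...remaining rules unchanged...
> ```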
>
>
> on another issue, which you did not (yet) ask about: you should be
> really careful with your syntactic predicates. consider the input string
> "11.22.33.44.55.66". it would seem that this should scan as a MACADR,
> yet your predicate for VERSIONSTRING_L will match this string and you
> will end up scanning it as a VERSIONSTRING ("11.22.33") followed by DOT
> followed by another VERSIONSTRING (i think).
>
> attached is a version of your scanner that addresses this issue.
>
> hope this helps...
>
> //--------------------------begin attachment--------------------------
>
> //----------------------------------------------------------------------
> // Lexer
> //----------------------------------------------------------------------
>
> class ServiceLexer extends Lexer;
>
> //----------------------------------------------------------------------
> // White space:
>
> WS_ : (' ' | '\t') { $setType(SKIP); } ;
>
> NEWLINE
> : '\n' ( '\r' )?
> | '\r' ( '\n' )?
> ;
>
>
> //----------------------------------------------------------------------
> // Chars:
>
> NONTOCLIT
> : 'g'..'u' | 'x'..'z'
> | 'G'..'U' | 'X'..'Z'
> ;
>
> protected LETTER : 'A'..'Z' | 'a'..'z' ;
>
>
>
> //----------------------------------------------------------------------
> // Numbers:
>
> protected DIGIT
> : '0'..'9'
> ;
>
> protected HEXLIT
> : 'a'..'f' | 'A'..'F'
> ;
>
> protected HEXDIG
> : ( DIGIT | HEXLIT )
> ;
>
> protected INT
> : ( HEXDIG )+
> ;
>
> protected FLOAT
> : ( DIGIT )+ DOT ( DIGIT )+
> ;
>
> protected MACADRSEPARATOR
> : DOT
> ;
>
> protected MACADR
> :
> HEXDIG HEXDIG MACADRSEPARATOR
> HEXDIG HEXDIG MACADRSEPARATOR
> HEXDIG HEXDIG MACADRSEPARATOR
> HEXDIG HEXDIG MACADRSEPARATOR
> HEXDIG HEXDIG MACADRSEPARATOR
> HEXDIG HEXDIG
> ;
>
> protected VERSIONSTRING
> : ( DIGIT )+ DOT ( DIGIT )+ ( ( DOT ( DIGIT )+ ( LETTER )? ) | LETTER )
> ;
>
> INT_or_FLOAT_or_MACADR_or_VERSIONSTRING_or_DEFAULT
> : ( DEFAULT ) => ( DEFAULT { $setType( DEFAULT ); } )
> | ( MACADR ) => ( MACADR { $setType( MACADR ); } )
> | ( VERSIONSTRING ) => ( VERSIONSTRING { $setType( VERSIONSTRING ); } )
> | ( FLOAT ) => ( FLOAT { $setType( FLOAT ); } )
> | ( INT ) => ( INT { $setType( INT ); } )
> ;
>
>
>
> //----------------------------------------------------------------------
> // Punctuation:
>
> DOT: '.' ;
>
> COMMA: ',' ;
>
> COLON: ':' ;
>
> SCOLON: ';' ;
>
>
>
> //[ some more text]
>
>
>
> //----------------------------------------------------------------------
> protected DEFAULT:
> ('D' | 'd')
> ('E' | 'e')
> ('F' | 'f')
> ('A' | 'a')
> ('U' | 'u')
> ('L' | 'l')
> ('T' | 't')
> ;
>
>
> //---------------------------end attachment---------------------------
>
>
>