[antlr-interest] Lexer ambiguities

Johannes Luber JALuber at gmx.de
Wed Feb 18 03:42:35 PST 2009


> Hi,
> 
> thanks for the pointer - very interesting.
...
> 
> The example illustrates that it can be done, but it would mean a half 
> rewrite of the current lexer.
> 
> Thanks,
> Paul

The deeper problem is that the algorithm ANTLR uses is not powerful enough to correctly tokenize, in all cases, input that a human would consider unambiguous. I'm not sure whether changing the algorithm would help here too, but it would at least simplify the common cases. Unfortunately, it isn't clear when Ter will finally do a rewrite there.
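
To illustrate the "set the type later" idea from the wiki page linked below, here is a minimal sketch in Python rather than ANTLR grammar syntax. It first grabs the raw text to the right of the '=' and only afterwards decides which token type it is, so '0 ' still comes out as an Integer while anything unrecognized falls back to an unquoted string. The names `CLASSIFIERS`, `classify_value`, `lex_property` and the token type "Unquoted" are hypothetical, and the String pattern is a simplified stand-in for the rule in the grammar, so treat this as an assumption-laden sketch and not as what the actual lexer should look like.

```python
import re

# Hypothetical classifiers, tried in priority order. The patterns mirror the
# simplified Integer and Ipaddress rules from this thread; the String pattern
# is a simplified stand-in (single- or double-quoted, backslash escapes).
CLASSIFIERS = [
    ("Integer", re.compile(r"[+-]?[0-9]+")),
    ("Ipaddress", re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")),
    ("String", re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")),
]


def classify_value(raw):
    """Match first, decide the token type later.

    Surrounding blanks and tabs are dropped, then the text is tested against
    each known token pattern. Only if none matches the whole text is it
    treated as an unquoted string (which may contain inner blanks).
    """
    text = raw.strip(" \t")
    for token_type, pattern in CLASSIFIERS:
        if pattern.fullmatch(text):
            return token_type, text
    return "Unquoted", text


def lex_property(line):
    """Split a 'key = value' line and classify the value part."""
    key, sep, raw = line.partition("=")
    if not sep:
        raise ValueError("expected key '=' value")
    # An unquoted string may not contain a newline, so drop the line ending.
    return key.strip(), classify_value(raw.rstrip("\n"))
```

With this, `lex_property("port = 0 \n")` classifies the value as an Integer and `lex_property("motd = hello world\n")` as an unquoted string. In an ANTLR lexer the same decision would be made inside one rule's action instead of in separate competing rules.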

Johannes
> 
> Johannes Luber schrieb:
> > Paul Bouché (NSN) schrieb:
> >   
> >> Hi,
> >>
> >> I have a lexer which already recognizes valid tokens of different types,
> >> e.g. an integer will generate an integer token, a quoted string a string
> >> token, an IP address an ipaddress token, etc.
> >> E.g.:
> >>
> >> property : key '=' value;
> >> key : Name;
> >> value : Integer | String | Ipaddress;
> >> Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+;
> >> Integer : ('+'|'-')? ('0'..'9')+;
> >> Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+;
> >> // simplified; the actual grammar correctly allows at most three digits per group
> >> String :  ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
> >>          | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"'
> >>          );
> >> WHITESPACE
> >>        :
> >>        ( ' ' | '\t' | '\n' )+
> >>        { skip(); }
> >>        ;
> >>
> >> All works fine. Now I need to include unquoted strings with blanks. The
> >> problem is '0 ' (zero blank, without quotes of course). I cannot get
> >> the lexer to match this as an Integer as before. Basically I want a rule
> >> which says: if the input is none of the previous tokens, check whether it
> >> is an unquoted string. Of course an unquoted string may not contain newlines.
> >> Any hints on how to achieve this?
> >> I tried everything and several times ran into "code too large" exceptions
> >> because the actual grammar is much more complex (there are more unquoted
> >> values which are recognized by certain prefix characters such as <, 0x,
> >> ::, etc.).
> >>
> >> Thanks a bunch!
> >> Paul
> >>
> >>     
> > Try to set the appropriate type later like it is done here:
> >
> > <http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs>
> >
> > Johannes
> >   
