[antlr-interest] Lexer ambiguities

Tue Feb 17 22:38:45 PST 2009

Paul Bouché (NSN) wrote:
> Hi,
> 
> I have a lexer which already recognizes valid tokens of different types:
> an integer generates an Integer token, a quoted string a String token,
> an IP address an Ipaddress token, and so on.
> For example:
> 
> property : key '=' value;
> key : Name;
> value : Integer | String | Ipaddress;
> Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+;
> Integer : ('+'|'-')? ('0'..'9')+;
> Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+;
> // simplified; the actual grammar correctly allows at most three digits per part
> String :  ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
>          | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"'
>          );
> WHITESPACE
>        :
>        ( ' ' | '\t' | '\n' )+
>        { skip(); }
>        ;
> 
> All works fine. Now I need to include unquoted strings with blanks. The
> problem is '0 ' (a zero followed by a blank, without the quotes of course):
> I can no longer get the lexer to match this as an Integer as before.
> Basically I want a rule which says: if the input is none of the previous
> tokens, try whether it is an unquoted string. Of course an unquoted string
> may not contain newlines.
> Any hints on how to achieve this?
> I tried everything and several times ran into "code too large" exceptions
> because the actual grammar is much more complex (there are more unquoted
> values which are recognized by certain prefix characters such as < 0x
> :: etc.).
> 
> Thanks a bunch!
> Paul
> 
Try setting the appropriate token type later, as is done here:
<http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs>
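
A minimal sketch of that idea, written in plain Python rather than as ANTLR
lexer actions: match the whole raw value first, then decide the token type
from the text that was matched. The names classify_value, INTEGER, IPADDRESS
and UNQUOTED are hypothetical and only mirror the rules in the quoted grammar;
quoted strings and the prefixed values (<, 0x, ::) are left out for brevity.

```python
import re

# Hypothetical token type names, mirroring the rules in the quoted grammar.
INTEGER = "Integer"
IPADDRESS = "Ipaddress"
UNQUOTED = "Unquoted"

# Optional sign followed by digits, as in the Integer rule.
_INTEGER_RE = re.compile(r"[+-]?[0-9]+")
# Four dot-separated groups of one to three digits (assumed from the
# "max of three digits" remark; value ranges are not checked).
_IPADDRESS_RE = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")


def classify_value(raw):
    """Match broadly first, then pick the token type from the matched text."""
    # Surrounding blanks are not significant, so '0 ' is judged as '0'.
    text = raw.strip(" \t")
    if "\n" in text or "\r" in text:
        raise ValueError("an unquoted value may not contain a newline")
    if _INTEGER_RE.fullmatch(text):
        return (INTEGER, text)
    if _IPADDRESS_RE.fullmatch(text):
        return (IPADDRESS, text)
    # Nothing more specific matched: fall back to an unquoted string.
    return (UNQUOTED, text)
```

Because the specific types are tested in order and the unquoted string is
only the fallback, '0 ' still comes out as an Integer, while a value such as
'hello world' becomes an unquoted string. In a lexer the same decision would
sit in the action of the one general rule that matches the value.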

Johannes