[antlr-interest] Lexer ambiguities
Johannes Luber
jaluber at gmx.de
Tue Feb 17 22:38:45 PST 2009
Paul Bouché (NSN) wrote:
> Hi,
>
> I have a lexer which already recognizes valid tokens of different types,
> e.g. an integer will generate an Integer token, a quoted string a String
> token, an IP address an Ipaddress token, etc.
> E.g:
>
> property : key '=' value;
> key : Name;
> value : Integer | String | Ipaddress;
> Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+;
> Integer : ('+'|'-')? ('0'..'9')+;
> Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+;
> // simplified; the actual grammar correctly allows a maximum of three digits
> String : ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
> | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"'
> );
> WHITESPACE
> :
> ( ' ' | '\t' | '\n' )+
> { skip(); }
> ;
>
> All works fine. Now I need to include unquoted strings with blanks. The
> problem is '0 ' (zero followed by a blank, without the quotes of course).
> I can no longer get the lexer to match this as an Integer as before.
> Basically I want a rule which says: if the input is none of the previous
> tokens, try whether it is an unquoted string. Of course, an unquoted
> string may not contain newlines.
> Any hints on how to achieve this?
> I tried everything and ran several times into "code too large" errors
> because the actual grammar is much more complex (there are more unquoted
> values which are recognized by certain prefix characters such as < 0x
> :: etc.).
>
> Thanks a bunch!
> Paul
>
Try setting the appropriate token type later, as is done here:
<http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs>
Johannes
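
A minimal sketch of the "set the type later" idea, written as plain Java
rather than as an ANTLR grammar: one broad match takes the whole unquoted
value, and only afterwards is it decided whether the text is an Integer, an
Ipaddress, or a plain unquoted string. The class LateTokenType, its Type enum
and the classify method are hypothetical names made up for this illustration;
they are not part of ANTLR or of the grammar quoted above. In an ANTLR 3
lexer the same decision would presumably sit in a rule action that assigns
$type.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: decides the token type after a
// broad "unquoted value" match, similar to what a lexer action could do.
final class LateTokenType {
    enum Type { INTEGER, IPADDRESS, UNQUOTED_STRING }

    // Same shapes as the simplified Integer and Ipaddress rules quoted above.
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");
    private static final Pattern IPADDRESS =
        Pattern.compile("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+");

    static Type classify(String lexeme) {
        // Surrounding blanks do not change the type, so "0 " stays an integer.
        String text = lexeme.trim();
        if (INTEGER.matcher(text).matches()) return Type.INTEGER;
        if (IPADDRESS.matcher(text).matches()) return Type.IPADDRESS;
        // Anything that is not a more specific token falls through to here.
        return Type.UNQUOTED_STRING;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0 ", "-42", "10.0.0.1", "hello world"}) {
            System.out.println("'" + s + "' -> " + classify(s));
        }
    }
}
```

Because the specific checks run first and the unquoted string is only the
fallback, the ordering mirrors the "if it is none of the previous tokens"
requirement without a separate lexer rule competing for every value.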