[antlr-interest] Lexer ambiguities
"Paul Bouché (NSN)"
paul.bouche at nsn.com
Tue Feb 17 11:53:02 PST 2009
Hi,
I have a lexer which already recognizes valid tokens of different types,
e.g. an integer generates an Integer token, a quoted string a String
token, an IP address an Ipaddress token, etc.
E.g.:
property : key '=' value;
key : Name;
value : Integer | String | Ipaddress;
Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+ ;
Integer : ('+'|'-')? ('0'..'9')+;
Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ ;
// simplified; the actual grammar correctly limits each group to three digits
String : ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
| '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"'
);
WHITESPACE
:
( ' ' | '\t' | '\n' )+
{ skip(); }
;
All works fine. Now I need to include unquoted strings with blanks. The
problem is '0 ' (a zero followed by a blank, without the quotes of course):
I can no longer get the lexer to match this as an Integer as before.
Basically I want a rule which says: if the input is none of the previous
tokens, try whether it is an unquoted string. Of course an unquoted string
may not contain newlines.
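To make the intended behaviour concrete, here is a rough sketch in Python (not ANTLR, and not a solution to the lexer problem itself). It assumes the value is everything after '=' up to the end of the line. The names classify_value and "Unquoted" are made up for illustration, and the quoted-string patterns are a guess, since STRING_ is not shown above.

```python
import re

# Guessed stand-ins for the String rule (STRING_ is not shown in the grammar).
SINGLE_QUOTED = r"'(?:[^'\\\n]|\\.)*'"
DOUBLE_QUOTED = r'"(?:[^"\\\n]|\\.)*"'

# Specific token types, tried in this order before the fallback.
SPECIFIC = [
    ("Integer", re.compile(r"[+-]?[0-9]+")),
    ("Ipaddress", re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")),
    ("String", re.compile(SINGLE_QUOTED + "|" + DOUBLE_QUOTED)),
]


def classify_value(text):
    """Return (token_type, lexeme) for the text that follows '=' on one line."""
    # Leading and trailing blanks are skipped, as WHITESPACE does in the grammar.
    value = text.strip(" \t")
    if not value or "\n" in value:
        raise ValueError("a value must be non-empty and stay on one line")
    # A specific type wins only if it covers the whole value.
    for name, pattern in SPECIFIC:
        if pattern.fullmatch(value):
            return name, value
    # Nothing specific matched, so fall back to an unquoted string.
    return "Unquoted", value
```

With this, classify_value("0 ") gives ("Integer", "0"), while classify_value("0 is fine") falls through to ("Unquoted", "0 is fine").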
Any hints on how to achieve this?
I tried everything and several times ran into "code too large" errors
because the actual grammar is much more complex (there are more unquoted
values, which are recognized by certain prefix characters such as <, 0x,
::, etc.).
Thanks a bunch!
Paul