[antlr-interest] Lexer ambiguities

Tue Feb 17 19:25:12 PST 2009

It looks like you are trying to do things in the lexer that really belong
in the parser. Try keeping the lexer to the bare minimum and moving the rest
of the parsing logic into the parser.

Can you post a small sample input you are trying to parse?
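
As a rough illustration of that split (written in Python rather than ANTLR,
and not taken from the original grammar): the lexing step below produces only
coarse tokens, and a separate parsing step decides that blanks inside a value
belong to the value. The token names, `tokenize`, and `parse_property` are
hypothetical, and the sketch assumes a value ends at a comma or a newline.

```python
import re

# Coarse token kinds only. The lexer does not try to tell integers,
# names, or unquoted strings apart (all names here are hypothetical).
TOKEN_RE = re.compile(r"""
    (?P<EQ>=)
  | (?P<COMMA>,)
  | (?P<NEWLINE>\n)
  | (?P<BLANKS>[ \t]+)
  | (?P<WORD>[^=,\s]+)
""", re.VERBOSE)


def tokenize(text):
    """Yield (kind, text) pairs for the input."""
    for m in TOKEN_RE.finditer(text):
        yield m.lastgroup, m.group()


def parse_property(line):
    """Parse 'key = value' and return (key, value), keeping inner blanks."""
    tokens = list(tokenize(line))
    pos = 0

    def skip_blanks():
        nonlocal pos
        while pos < len(tokens) and tokens[pos][0] == "BLANKS":
            pos += 1

    skip_blanks()
    if pos >= len(tokens) or tokens[pos][0] != "WORD":
        raise ValueError("expected a key")
    key = tokens[pos][1]
    pos += 1

    skip_blanks()
    if pos >= len(tokens) or tokens[pos][0] != "EQ":
        raise ValueError("expected '='")
    pos += 1

    # Everything up to the next delimiter is the value, blanks included.
    parts = []
    while pos < len(tokens) and tokens[pos][0] not in ("COMMA", "NEWLINE"):
        parts.append(tokens[pos][1])
        pos += 1

    # Only leading and trailing blanks are dropped, here in the parser.
    return key, "".join(parts).strip()
```

With this sketch, `parse_property("size = 3 ")` gives `("size", "3")` and
`parse_property("label = 3 a, next")` gives `("label", "3 a")`; deciding
whether a value is an integer or a string is left to a later step.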

- Indhu

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Paul Bouché
Sent: Wednesday, February 18, 2009 5:21 AM
To: Sidharth Kuruvila
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Lexer ambiguities

Hi,

That does not work. The problem arises when I define a rule for unquoted
strings like this (where a comma is a delimiter):

Ustring : Integer ' '+ ~('\n' | '{' | ',') |  Name ' '+ ~('\n' | '{' | ',')
| ~(' ' | '\n' | ',')+;

The lexer matches >>3<< as an Integer, but >>3 << now causes an error,
whereas before it was fine. Of course, how should the lexer know that in one
case a blank is whitespace and in another it is part of the value, as in
>>3 a<<?
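
One possible way around this, sketched in Python rather than ANTLR and under
the assumption that a value always ends at a comma or a newline: take the raw
value text, trim the surrounding blanks, and only then decide its type,
falling back to an unquoted string. The patterns are simplified stand-ins for
the grammar's rules, and `classify_value` is a hypothetical helper.

```python
import re

# Simplified stand-ins for the token rules, tried in priority order.
# Escape sequences inside quoted strings are omitted for brevity.
PATTERNS = [
    ("Integer", re.compile(r"[+-]?[0-9]+")),
    ("Ipaddress", re.compile(r"[0-9]{1,3}(\.[0-9]{1,3}){3}")),
    ("String", re.compile(r"'[^'\n]*'")),
    ("String", re.compile(r'"[^"\n]*"')),
]


def classify_value(raw):
    """Return (kind, text) for one raw value, e.g. the text after '='."""
    text = raw.strip(" \t")
    for kind, pattern in PATTERNS:
        # fullmatch: the whole trimmed value must fit the pattern.
        if pattern.fullmatch(text):
            return kind, text
    # Nothing more specific matched: treat it as an unquoted string,
    # which may contain blanks but no newline.
    if text and "\n" not in text:
        return "Ustring", text
    raise ValueError(f"not a valid value: {raw!r}")
```

Here `classify_value("3")` and `classify_value("3 ")` both yield an Integer,
while `classify_value("3 a")` yields a Ustring, because the blank is judged
only after the full extent of the value is known.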

What I would like to write is:

Ustring : ~Name | ~Integer;

but this is not possible.

BR,
Paul

Sidharth Kuruvila wrote:

Try moving the rule for Name below Ipaddress.

Regards,
Sidharth

On Wed, Feb 18, 2009 at 1:23 AM, "Paul Bouché (NSN)" <paul.bouche at nsn.com>
wrote:

Hi,

I have a lexer which already recognizes valid tokens of different types:
an integer generates an Integer token, a quoted string a String token,
an IP address an Ipaddress token, etc.
E.g.:

property : key '=' value;
key : Name;
value : Integer | String | Ipaddress;
Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+ ;
Integer : ('+'|'-')? ('0'..'9')+;
Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ ;
// simplified; the actual grammar correctly allows at most three digits per part
String :  ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
        | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"'
        );
WHITESPACE
      :
      ( ' ' | '\t' | '\n' )+
      { skip(); }
      ;

All of this works fine. Now I need to include unquoted strings with blanks.
The problem is '0 ' (zero followed by a blank, without the quotes of course).
I cannot get the lexer to match this as an Integer as before. Basically I
want a rule which says: if the input is not one of the previous tokens, try
whether it is an unquoted string. Of course, an unquoted string may not
contain newlines.
Any hints on how to achieve this?
I tried everything and several times ran into "code too large" exceptions,
because the actual grammar is much more complex (there are more unquoted
values, which are recognized by certain prefix characters such as < 0x
:: etc.).

Thanks a bunch!
Paul



