[antlr-interest] Lexer ambiguities

Indhu Bharathi indhu.b at s7software.com
Wed Feb 18 06:22:24 PST 2009


> I actually refer to the way ANTLR decides which token to generate next. The simplest case is a grammar with a NUMBER rule, a DOT rule and a FLOATING_POINT rule. With the input "1." ANTLR could theoretically create a NUMBER token followed by a DOT token, but it only tries to match FLOATING_POINT, which fails. 

Curious: why not backtrack in such cases and return NUMBER? 
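
To make the question concrete, here is a minimal sketch of that fallback, written in Python rather than as anything ANTLR generates. The rule patterns are assumptions (the thread only names the rules), and `RULES` and `tokenize` are hypothetical names. Every rule is tried from the same start position and the longest successful match wins, so a failed FLOATING_POINT attempt simply falls back to NUMBER:

```python
import re

# Patterns are assumed for illustration; the thread only names the rules.
RULES = [
    ("FLOATING_POINT", re.compile(r"[0-9]+\.[0-9]+")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("DOT", re.compile(r"\.")),
]


def tokenize(text):
    """Try every rule at the current offset and keep the longest match.

    A rule that fails part-way (FLOATING_POINT on "1.") consumes nothing,
    so the lexer effectively rewinds to the token start and uses another rule.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (rule name, end offset) of the longest match so far
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no token matches at offset {pos}")
        name, end = best
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens
```

With these assumed rules, `tokenize("1.")` returns `[('NUMBER', '1'), ('DOT', '.')]` and `tokenize("1.5")` returns `[('FLOATING_POINT', '1.5')]`. The cost is that every rule is attempted at every token start.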

----- Original Message ----- 
From: Johannes Luber <JALuber at gmx.de> 
To: paul bouche <paul.bouche at apertio.com> 
Cc: antlr-interest at antlr.org 
Sent: Wednesday, February 18, 2009 7:33:20 PM GMT+0530 Asia/Calcutta 
Subject: Re: [antlr-interest] Lexer ambiguities 

> Johannes Luber wrote: 
> > The deeper problem is that ANTLR uses an algorithm that is insufficient 
> > to resolve, correctly in all cases, input that is unambiguous to humans. 
> From the book I gather that LL(*) covers all context-free languages. 
> Are the cases that are unambiguous to humans but ambiguous to computers 
> only unambiguous to humans because they have context? A blank, or any 
> other character for that matter, may be interpreted as white space in 
> one case and differently in another. The difference between those cases 
> is context, i.e. what came before and what the next k lookahead symbols 
> are. 
> 
> Or could you be more concrete about what you mean by "uses an 
> insufficient algorithm"? Ah, I just realized that the parser is LL(*) 
> but the lexer uses a cyclic DFA for prediction, which may not cover all 
> context-free languages and certainly not context-sensitive ones. 

I actually refer to the way ANTLR decides which token to generate next. The simplest case is a grammar with a NUMBER rule, a DOT rule and a FLOATING_POINT rule. With the input "1." ANTLR could theoretically create a NUMBER token followed by a DOT token, but it only tries to match FLOATING_POINT, which fails. 
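
The behaviour described above can be imitated with a toy model. This is a Python sketch under stated assumptions, not ANTLR's generated code: `scan_digits`, `predict` and `next_token` are hypothetical names, and FLOATING_POINT is assumed to be digits, a dot, then at least one digit. The point is only that the rule is chosen from lookahead and then matched with no fallback:

```python
def scan_digits(text, pos):
    """Return the index just past a run of ASCII digits starting at pos."""
    while pos < len(text) and text[pos] in "0123456789":
        pos += 1
    return pos


def predict(text, pos):
    """Choose a rule from lookahead only; do not verify the whole rule."""
    if text.startswith(".", pos):
        return "DOT"
    after_int = scan_digits(text, pos)
    if after_int == pos:
        raise ValueError(f"no rule predicted at offset {pos}")
    # Digits followed by '.' look like the start of FLOATING_POINT.
    return "FLOATING_POINT" if text.startswith(".", after_int) else "NUMBER"


def next_token(text, pos):
    """Match the predicted rule; a mismatch is an error, not a retry."""
    rule = predict(text, pos)
    if rule == "DOT":
        return rule, pos + 1
    end = scan_digits(text, pos)
    if rule == "NUMBER":
        return rule, end
    # FLOATING_POINT: the fraction digits after the dot are mandatory.
    frac_end = scan_digits(text, end + 1)
    if frac_end == end + 1:
        raise ValueError(f"FLOATING_POINT failed at offset {frac_end}: expected a digit")
    return rule, frac_end
```

In this model `next_token("1.5", 0)` yields a FLOATING_POINT token ending at offset 3, while `next_token("1.", 0)` raises an error instead of producing NUMBER followed by DOT.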

Johannes 
> 
> Best regards, 
> Paul 
> 
> > Not sure if changing the algorithm would help here, too, but it would 
> > at least simplify the common cases. Unfortunately, it isn't clear when 
> > Ter will finally do a rewrite here. 
> > 
> > Johannes 
> > 
> > 
> >> Johannes Luber wrote: 
> >> 
> >>> Paul Bouché (NSN) wrote: 
> >>> 
> >>>> Hi, 
> >>>> 
> >>>> I have a lexer which already recognizes valid tokens of different types, 
> >>>> e.g. an integer generates an integer token, a quoted string a string 
> >>>> token, an IP address an ipaddress token, etc. 
> >>>> E.g.: 
> >>>> 
> >>>> property : key '=' value; 
> >>>> key : Name; 
> >>>> value : Integer | String | Ipaddress; 
> >>>> Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+; 
> >>>> Integer : ('+'|'-')? ('0'..'9')+; 
> >>>> Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+; 
> >>>> // simplified; the actual grammar correctly allows at most three digits per group 
> >>>> String : ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\'' 
> >>>> | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"' 
> >>>> ); 
> >>>> WHITESPACE 
> >>>> : 
> >>>> ( ' ' | '\t' | '\n' )+ 
> >>>> { skip(); } 
> >>>> ; 
> >>>> 
> >>>> All works fine. Now I need to include unquoted strings with blanks. The 
> >>>> problem is '0 ' (zero blank, without the quotes of course). I cannot get 
> >>>> the lexer to match this as an Integer as before. Basically I want a rule 
> >>>> which says: if the input is none of the previous tokens, try whether it 
> >>>> is an unquoted string. Of course an unquoted string may not contain 
> >>>> newlines. 
> >>>> Any hints on how to achieve this? 
> >>>> I tried everything and several times ran into "code too large" exceptions 
> >>>> because the actual grammar is much more complex (there are more unquoted 
> >>>> values which are recognized by certain prefix characters such as <, 0x, 
> >>>> ::, etc.). 
> >>>> 
> >>>> Thanks a bunch! 
> >>>> Paul 
> >>>> 
> >>>> 
> >>>> 
> >>> Try to set the appropriate type later, as is done here: 
> >>> 
> >>> <http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs> 
> >>> 
> >>> Johannes 
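
The idea of deciding the token type after the text has been matched can be sketched outside ANTLR. The Python below is an illustration only, not the wiki grammar: `classify_value` and the type name `UnquotedString` are hypothetical, the Integer and Ipaddress patterns are copied from the simplified grammar quoted above, and the quoted-string check is deliberately cruder than the String rule. It assumes the raw text of one value on a single line has already been isolated:

```python
import re

# Patterns follow the simplified Integer and Ipaddress rules from the thread.
INTEGER = re.compile(r"[+-]?[0-9]+")
IPADDRESS = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")


def classify_value(raw):
    """Match broadly first, then pick the most specific type that fits."""
    if "\n" in raw:
        raise ValueError("a value may not contain a newline")
    text = raw.strip(" \t")  # surrounding blanks are not part of the value
    if INTEGER.fullmatch(text):
        return "Integer", text
    if IPADDRESS.fullmatch(text):
        return "Ipaddress", text
    # Crude stand-in for the String rule: same quote character at both ends.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return "String", text
    # Nothing more specific fits, so fall back to an unquoted string.
    return "UnquotedString", text
```

Under these assumptions `classify_value("0 ")` gives `('Integer', '0')`, while `classify_value("hello world")` falls through to `('UnquotedString', 'hello world')`. Because the specific types are tested in order and the unquoted string comes last, the "none of the previous tokens" requirement becomes an ordinary fallback.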
> >>> 
> >>> 
> > 
> > 
> 
> 

