[antlr-interest] Lexer ambiguities

Johannes Luber JALuber at gmx.de
Wed Feb 18 06:03:20 PST 2009


> Johannes Luber wrote:
> > The deeper problem is that ANTLR uses an insufficient algorithm: it does
> > not correctly resolve, in all cases, input that is unambiguous to a human.
> From the book I gather that LL(*) covers all context-free languages.
> Are the cases that are unambiguous for humans but ambiguous for computers
> only unambiguous to humans because humans have context? A blank, or any
> other character for that matter, may be interpreted as white space in one
> case and differently in another. The difference between those cases is
> context, i.e. what came before and what the next k symbols of lookahead are.
> 
> Or could you be more concrete about what you mean by "uses an insufficient
> algorithm"? Ah, I just realized: the parser is LL(*), but the lexer uses a
> cyclic DFA for prediction, which may not cover all context-free languages
> and certainly not context-sensitive ones.

I am actually referring to how ANTLR decides which token to generate next. The simplest case is a grammar with a NUMBER rule, a DOT rule and a FLOATING_POINT rule. For the input "1." ANTLR could theoretically create a NUMBER token followed by a DOT token, but instead it only tries to match FLOATING_POINT, which fails.
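
To make the difference concrete, here is a toy sketch in Python. It is not ANTLR's actual algorithm or generated code; the function names predict_and_commit and longest_match and the regular expressions are made up for illustration. The first function picks a rule from a short prefix and then commits to it with no fallback, which is roughly the behaviour described above. The second tries every rule and keeps the longest match, so it can fall back to NUMBER followed by DOT.

```python
import re

# Toy token rules, named after the three rules mentioned in this message.
RULES = [
    ("FLOATING_POINT", re.compile(r"[0-9]+\.[0-9]+")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("DOT", re.compile(r"\.")),
]

# Prefixes used for prediction (an assumption made for this sketch):
# digits followed by '.' are taken to announce a FLOATING_POINT token.
PREDICTORS = [
    ("FLOATING_POINT", re.compile(r"[0-9]+\.")),
    ("NUMBER", re.compile(r"[0-9]")),
    ("DOT", re.compile(r"\.")),
]


def predict_and_commit(text):
    """Choose a rule from a prefix, then commit to it without any fallback."""
    patterns = dict(RULES)
    tokens, pos = [], 0
    while pos < len(text):
        predicted = next(
            (name for name, prefix in PREDICTORS if prefix.match(text, pos)),
            None,
        )
        if predicted is None:
            raise ValueError(f"no rule predicted at position {pos}")
        match = patterns[predicted].match(text, pos)
        if match is None:
            # The prediction was wrong, but no other rule is tried.
            raise ValueError(f"{predicted} failed at position {pos}")
        tokens.append((predicted, match.group()))
        pos = match.end()
    return tokens


def longest_match(text):
    """Try every rule at the current position and keep the longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            if match and (best is None or match.end() > best[1].end()):
                best = (name, match)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens
```

With these definitions, longest_match("1.") yields a NUMBER token followed by a DOT token, while predict_and_commit("1.") raises an error because FLOATING_POINT was predicted and then could not be matched.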

Johannes
> 
> BR,
> Paul
> 
> > I am not sure whether changing the algorithm would help here, too, but
> > it would at least simplify the common cases. Unfortunately, it isn't
> > clear when Ter will finally do a rewrite here.
> >
> > Johannes
> >
> >> Johannes Luber wrote:
> >>> Paul Bouché (NSN) wrote:
> >>>> Hi,
> >>>>
> >>>> I have a lexer which already recognizes valid tokens of different types,
> >>>> e.g. an integer will generate an integer token, a quoted string a string
> >>>> token, an IP address an ipaddress token etc.
> >>>> E.g.:
> >>>>
> >>>> property : key '=' value;
> >>>> key : Name;
> >>>> value : Integer | String | Ipaddress;
> >>>> Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+;
> >>>> Integer : ('+'|'-')? ('0'..'9')+;
> >>>> Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+;
> >>>> // simplified; the actual grammar correctly allows at most three digits
> >>>> String :  ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
> >>>>          | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"'
> >>>>          );
> >>>> WHITESPACE
> >>>>        :
> >>>>        ( ' ' | '\t' | '\n' )+
> >>>>        { skip(); }
> >>>>        ;
> >>>>
> >>>> All works fine. Now I need to include unquoted strings with blanks. The
> >>>> problem is '0 ' (zero blank - without quotes of course). I cannot get
> >>>> the lexer to match this as an Integer as before. Basically I want a rule
> >>>> which says: if it is not one of the previous tokens, try whether it is an
> >>>> unquoted string. Of course an unquoted string may not contain newlines.
> >>>> Any hints on how to achieve this?
> >>>> I tried everything and several times ran into "code too large" exceptions
> >>>> because the actual grammar is much more complex (there are more unquoted
> >>>> values which are recognized by certain prefix characters such as < 0x
> >>>> :: etc.).
> >>>>
> >>>> Thanks a bunch!
> >>>> Paul
> >>>>
> >>>>
> >>> Try to set the appropriate type later, as is done here:
> >>>
> >>> <http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs>
> >>>
> >>> Johannes
> 
> 
> -- 
> Paul Bouché
