[antlr-interest] Lexer ambiguities
Johannes Luber
JALuber at gmx.de
Wed Feb 18 06:03:20 PST 2009
> Johannes Luber wrote:
> > The deeper problem is that ANTLR's algorithm is insufficient to correctly
> > resolve, in all cases, input that is unambiguous to humans.
> From the book I glean that LL(*) covers all context-free languages.
> Are the cases that are unambiguous for humans but ambiguous for computers
> only unambiguous to humans because humans have context? A blank, or any
> other character for that matter, may be interpreted as white space in one
> case and differently in another. The difference between those cases is
> context, i.e. what came before and what the next k symbols of lookahead are.
>
> Or could you be more concrete about what you mean by "uses an insufficient
> algorithm"? Ah, it just occurred to me that the parser is LL(*) but the lexer
> uses a cyclic DFA for prediction, which may not cover all context-free
> languages and certainly not context-sensitive ones.
I am actually referring to the way ANTLR decides which token to generate next. The simplest case is a grammar with a NUMBER rule, a DOT rule and a FLOATING_POINT rule. With the input "1." ANTLR could theoretically create a NUMBER token followed by a DOT token, but it only tries to match FLOATING_POINT, which fails.
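
To make the "1." case concrete, here is a minimal Python sketch. It is a toy model and not ANTLR's generated code; the function names (scan_digits, committed_lexer, deferred_type_lexer) are made up for illustration. The first lexer picks FLOATING_POINT as soon as it sees digits followed by a dot and never backs out of that choice, which is roughly the failure described above. The second matches the digits first and only decides the token type afterwards, in the spirit of setting the type later.

```python
# Toy model only: this is NOT ANTLR's generated code. It mimics a lexer that
# picks a token rule from a short prefix and never backs out of that choice.

def scan_digits(text, pos):
    """Return the index just past the run of digits starting at pos."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return end


def committed_lexer(text):
    """Predict the rule from the prefix, then commit to it (hypothetical)."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isdigit():
            end = scan_digits(text, pos)
            if end < len(text) and text[end] == ".":
                # Prefix "digits '.'" predicts FLOATING_POINT; no way back.
                frac_end = scan_digits(text, end + 1)
                if frac_end == end + 1:
                    raise ValueError("FLOATING_POINT failed at index %d" % frac_end)
                tokens.append(("FLOATING_POINT", text[pos:frac_end]))
                pos = frac_end
            else:
                tokens.append(("NUMBER", text[pos:end]))
                pos = end
        elif text[pos] == ".":
            tokens.append(("DOT", "."))
            pos += 1
        else:
            raise ValueError("unexpected character %r" % text[pos])
    return tokens


def deferred_type_lexer(text):
    """Match the digits first and decide the token type afterwards."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isdigit():
            end = scan_digits(text, pos)
            token_type = "NUMBER"
            # Only consume the dot when a digit really follows it.
            if end + 1 < len(text) and text[end] == "." and text[end + 1].isdigit():
                end = scan_digits(text, end + 1)
                token_type = "FLOATING_POINT"
            tokens.append((token_type, text[pos:end]))
            pos = end
        elif text[pos] == ".":
            tokens.append(("DOT", "."))
            pos += 1
        else:
            raise ValueError("unexpected character %r" % text[pos])
    return tokens
```

In this sketch committed_lexer("1.") raises an error, while deferred_type_lexer("1.") yields a NUMBER token followed by a DOT token; both accept "1.5" as a single FLOATING_POINT token.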
Johannes
>
> BR,
> Paul
>
> > Not sure if changing the algorithm would help here, too, but it would
> > at least simplify the common cases. Unfortunately, it isn't clear when Ter
> > will finally do a rewrite here.
> >
> > Johannes
> >
> >> Johannes Luber wrote:
> >>> Paul Bouché (NSN) wrote:
> >>>> Hi,
> >>>>
> >>>> I have a lexer which already recognizes valid tokens of different types,
> >>>> e.g. an integer will generate an integer token, a quoted string a string
> >>>> token, an IP address an ipaddress token, etc.
> >>>> E.g.:
> >>>>
> >>>> property : key '=' value;
> >>>> key : Name;
> >>>> value : Integer | String | Ipaddress;
> >>>> Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+;
> >>>> Integer : ('+'|'-')? ('0'..'9')+;
> >>>> Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+;
> >>>> // simplified; the actual grammar correctly limits each part to at most three digits
> >>>> String : ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
> >>>> | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"'
> >>>> );
> >>>> WHITESPACE
> >>>> :
> >>>> ( ' ' | '\t' | '\n' )+
> >>>> { skip(); }
> >>>> ;
> >>>>
> >>>> All works fine. Now I need to include unquoted strings with blanks. The
> >>>> problem is '0 ' (zero blank - without quotes of course). I cannot get
> >>>> the lexer to match this as an Integer as before. Basically I want a rule
> >>>> which says: if the input is not one of the previous tokens, try whether
> >>>> it is an unquoted string. Of course an unquoted string may not contain
> >>>> newlines. Any hints on how to achieve this?
> >>>> I tried everything and ran several times into "code too large" exceptions
> >>>> because the actual grammar is much more complex (there are more unquoted
> >>>> values which are recognized by certain prefix characters such as < 0x
> >>>> :: etc.).
> >>>>
> >>>> Thanks a bunch!
> >>>> Paul
> >>>>
> >>>>
> >>>>
> >>> Try to set the appropriate type later, as is done here:
> >>>
> >>> <http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs>
> >>>
> >>> Johannes
>
> --
> Paul Bouché