[antlr-interest] Lexer ambiguities
"Paul Bouché (NSN)"
paul.bouche at nsn.com
Wed Feb 18 05:27:12 PST 2009
Johannes Luber wrote:
> The deeper problem is that ANTLR uses an insufficient algorithm to correctly resolve, in all cases, input that is unambiguous to humans.
From the book I gathered that LL(*) covers all context-free languages.
Aren't the cases that are unambiguous for humans but ambiguous for
computers unambiguous to humans only because humans have context? A blank
(or any other character, for that matter) may have to be interpreted as
whitespace in one case and differently in another. What distinguishes
those cases is context, i.e. what came before and what the next k
lookahead symbols are.
Or could you be more concrete about what you mean by "uses an insufficient
algorithm"? Ah, I just realized: the parser is LL(*), but the lexer uses a
cyclic DFA for prediction, which may not cover all context-free languages
and certainly not context-sensitive ones.
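To make the point about context concrete, here is a minimal hand-written lexer in Python. It is a sketch, not ANTLR-generated code, and the function `lex` and the token kinds `NAME`, `EQUALS` and `VALUE` are hypothetical. The same blank is skipped before '=' but kept as part of the value after it; the only context the lexer tracks is a flag recording whether '=' has just been seen on the current line.

```python
from typing import List, Tuple

# Hypothetical token representation for this sketch: (kind, text).
Token = Tuple[str, str]


def lex(text: str) -> List[Token]:
    """Lex 'key = value' lines, treating blanks differently depending on context.

    Before '=' a blank separates tokens and is skipped (like the WHITESPACE
    rule). After '=' the lexer is in a "value" context where blanks belong to
    the value, up to the end of the line.
    """
    tokens: List[Token] = []
    in_value = False  # the context: have we just seen '=' on this line?
    i = 0
    while i < len(text):
        c = text[i]
        if in_value:
            # Value context: take everything up to (not including) the newline.
            end = text.find("\n", i)
            if end == -1:
                end = len(text)
            # Blanks right after '=' and at the end of the line are trimmed.
            tokens.append(("VALUE", text[i:end].strip(" \t")))
            in_value = False
            i = end
        elif c in " \t\n":
            i += 1  # default context: blanks are skipped
        elif c == "=":
            tokens.append(("EQUALS", "="))
            in_value = True
            i += 1
        else:
            # A name runs until a blank, a newline or '='.
            start = i
            while i < len(text) and text[i] not in " \t\n=":
                i += 1
            tokens.append(("NAME", text[start:i]))
    return tokens
```

For example, `lex("count = 0 \n")` yields a `VALUE` token with text `0`, while `lex("greeting = hello world\n")` keeps the inner blank in `hello world`.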
BR,
Paul
> Not sure whether changing the algorithm would help here, too, but it would at least simplify the common cases. Unfortunately, it isn't clear when Ter will finally do a rewrite here.
>
> Johannes
>
>> Johannes Luber wrote:
>>
>>> Paul Bouché (NSN) wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I have a lexer which already recognizes valid tokens of different types,
>>>> e.g. an integer generates an Integer token, a quoted string a String
>>>> token, an IP address an Ipaddress token, etc.
>>>> E.g.:
>>>>
>>>> property : key '=' value;
>>>> key : Name;
>>>> value : Integer | String | Ipaddress;
>>>> Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+;
>>>> Integer : ('+'|'-')? ('0'..'9')+;
>>>> Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+;
>>>> // simplified; the actual grammar correctly limits each part to at most three digits
>>>> String : ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
>>>>          | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"'
>>>>          );
>>>> WHITESPACE
>>>> :
>>>> ( ' ' | '\t' | '\n' )+
>>>> { skip(); }
>>>> ;
>>>>
>>>> All works fine. Now I need to support unquoted strings containing blanks.
>>>> The problem is '0 ' (zero followed by a blank, without the quotes of
>>>> course): I cannot get the lexer to match this as an Integer as it did
>>>> before. Basically I want a rule which says: if the input is not one of
>>>> the previous tokens, try whether it is an unquoted string. Of course, an
>>>> unquoted string may not contain newlines.
>>>> Any hints on how to achieve this?
>>>> I tried everything and repeatedly ran into "code too large" errors,
>>>> because the actual grammar is much more complex (there are more unquoted
>>>> values, recognized by certain prefix characters such as <, 0x, ::, etc.).
>>>>
>>>> Thanks a bunch!
>>>> Paul
>>>>
>>>>
>>>>
>>> Try setting the appropriate token type later, as is done here:
>>>
>>> <http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs>
>>>
>>> Johannes
>>>
>>>
>
>
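Johannes's suggestion amounts to matching a broader span first and deciding the token type afterwards; in an ANTLR 3 lexer rule that decision would typically be an action assigning `$type`. The Python sketch below shows only the decision step, assuming the lexer has already collected the raw text of a value up to the end of the line. The patterns, the `classify` function and the `UnquotedString` type are hypothetical, and the quoted-string patterns only approximate the `String` rule because `STRING_` is not shown.

```python
import re

# Hypothetical patterns approximating the token rules in the grammar above.
INTEGER = re.compile(r"[+-]?[0-9]+")
IPADDRESS = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")  # at most three digits per part
SINGLE_QUOTED = re.compile(r"'(?:[^'\\]|\\.)*'")  # backslash escapes allowed
DOUBLE_QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')


def classify(raw: str) -> str:
    """Decide the token type of a value *after* its full text has been matched.

    The specific types are tried first; anything else on the line becomes a
    (hypothetical) UnquotedString token.
    """
    if "\n" in raw:
        raise ValueError("a value may not span lines in this sketch")
    value = raw.strip(" \t")  # surrounding blanks are not part of the value
    if INTEGER.fullmatch(value):
        return "Integer"
    if IPADDRESS.fullmatch(value):
        return "Ipaddress"
    if SINGLE_QUOTED.fullmatch(value) or DOUBLE_QUOTED.fullmatch(value):
        return "String"
    return "UnquotedString"
```

Because the trailing blank is trimmed before the checks, '0 ' is still classified as an Integer, while `0 apples` falls through to the unquoted-string case.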
--
Paul Bouché
Voice: +49 30 590080-1284
Nokia Siemens Networks GmbH & Co. KG, An den Treptowers 1, 12435 Berlin, Germany
Sitz der Gesellschaft: München / Registered office: Munich
Registergericht: München / Commercial registry: Munich, HRA 88537
WEEE-Reg.-Nr.: DE 52984304
Persönlich haftende Gesellschafterin / General Partner: Nokia Siemens Networks Management GmbH
Geschäftsleitung / Board of Directors: Lydia Sommer, Olaf Horsthemke
Vorsitzender des Aufsichtsrats / Chairman of supervisory board: Lauri Kivinen
Sitz der Gesellschaft: München / Registered office: Munich
Registergericht: München / Commercial registry: Munich, HRB 163416