[antlr-interest] Why is this nondeterministic?

Terence Parr parrt at jguru.com
Sun May 26 17:20:47 PDT 2002


On Sunday, May 26, 2002, at 05:04  PM, Brian Smith wrote:

> Terence,
>
> I'm a newbie and I don't understand your answer (and I am not the
> original poster). There is a "simplified" version of the problem below
> (fewer subrules) that has a similar (the same?) nondeterminism. There is
> no way that REGULAR_BIT could match 'a' 'b'. So, whenever I see an 'a'
> 'b' it must be a SPECIAL, right? In fact, that is what the generated
> code does.

In this particular case, it's a weakness of the linear approximate 
lookahead (i.e., it's weaker than your intuitive "this can't match ab" 
abilities).  See FAQ entitled:

"Can you explain more about ANTLR's tricky lexical lookahead issues 
related to seeing past the end of a token definition into the start of 
another?"

http://www.jguru.com/faq/view.jsp?EID=64316

> Also, if you take the closure (...)+ off of regular_bit then it works.

Correct, you've removed the loop which prevents 'b' from being in the 
2nd lookahead position at k=2, which keeps antlr from thinking 'b' can 
follow 'a' in that rule :)

Ter

> So, it must be a problem caused by strings longer than three characters,
> or a problem with the end-of-input case, AFAIK. Could somebody please
> demonstrate a string that is ambiguous with this lexer?
>
> Thanks,
> Brian
>
> class AbcLexer extends Lexer;
>
> options
> {
>      k = 2;
>      charVocabulary = 'a'..'c';
> }
>
> REGULAR:
>      (REGULAR_BIT)+;
>
> SPECIAL:
>      'a' 'b'
> ;
>
> protected
> REGULAR_BIT:
>      ('b')
> |   ('a' 'c')
> ;
>
> Terence Parr wrote:
>> On Sunday, May 26, 2002, at 03:53  PM, danfuzz wrote:
>>
>>
>>> The following is a very simplified version of a grammar I'm working
>>> on (attached below). When I try to compile it, I get the following
>>> warning:
>>>
>>> warning: lexical nondeterminism between rules REGULAR and SPECIAL upon
>>> /home/danfuzz/cvs/local/stuplate/com/milk/stuplate/abc.g:0: k==1:'a'
>>> /home/danfuzz/cvs/local/stuplate/com/milk/stuplate/abc.g:0: k==2:'b'
>>>
>>> I don't understand why it's nondeterministic, and so I'm not sure how
>>> to change it. Help would be appreciated. Thanks!
>>
>>
>> Hi Dan,
>>
>> The problem is that 'a' 'b' can start both REGULAR (via REGULAR_BIT) 
>> and
>> SPECIAL.  The lexer, given "ab" input would not know which token to
>> make.  This is an ambiguous grammar, which implies that it is also
>> nondeterministic.
>>
>> Regards,
>> Terence
>>
>>
>>> class AbcLexer extends Lexer;
>>>
>>> options
>>> {
>>>    k = 2;
>>>    charVocabulary = 'a'..'c';
>>> }
>>>
>>> REGULAR:
>>>    (REGULAR_BIT)+;
>>>
>>> SPECIAL:
>>>    'a' 'b'
>>> ;
>>>
>>> protected
>>> REGULAR_BIT:
>>>    ('b' | 'c')
>>> |   ('a' 'a')
>>> |   ('a' 'c')
>>> ;
>>>
>>>
>>>
>>>
>>> Your use of Yahoo! Groups is subject to
>>> http://docs.yahoo.com/info/terms/
>>>
>>>
>>
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
>>
>>
>>
>>
>> Your use of Yahoo! Groups is subject to 
>> http://docs.yahoo.com/info/terms/
>>
>
>
>
>
>
> Your use of Yahoo! Groups is subject to 
> http://docs.yahoo.com/info/terms/
>


 

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 



More information about the antlr-interest mailing list