[antlr-interest] Re: Syntactic predicates question

Mon Jan 30 14:36:50 PST 2006

Hmm, I'm really confused by the behavior then. "A12345" definitely doesn't 
match rule 'A' so (1) should fail and not consume the first character of 
the string. Shouldn't ANTLR examine at least k characters (in my case 
k=2, so it should be looking at 'A' and '1') from input stream before 
making a decision about which token matched? The generated code for 
matching 'A' in the lexer is as follows:

if ((LA(1)=='A') && (true)) {
   match('A');
}

Shouldn't it be something similar to the following?

if ((LA(1)=='A') && (LA(2)==END_OF_TOKEN)) {
   match('A');
}
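
To make the difference between the two tests concrete, here is a minimal 
self-contained sketch. It is not ANTLR's generated code: la, EOF_CHAR and 
isIdChar are hypothetical helpers, and "end of token" is assumed to mean 
"the next character cannot continue an identifier".

```java
public class LookaheadSketch {
    // Hypothetical end-of-input marker (not an ANTLR constant).
    static final char EOF_CHAR = '\uFFFF';

    // Character i positions ahead of pos (1-based), or EOF_CHAR past the end.
    static char la(String input, int pos, int i) {
        int idx = pos + i - 1;
        return idx < input.length() ? input.charAt(idx) : EOF_CHAR;
    }

    // Assumption: identifiers continue with letters, digits, '/' or '*'.
    static boolean isIdChar(char c) {
        return Character.isLetterOrDigit(c) || c == '/' || c == '*';
    }

    // Mirrors the generated test: commits as soon as LA(1) is 'A'.
    static boolean oneCharTest(String input, int pos) {
        return la(input, pos, 1) == 'A';
    }

    // The stricter test: additionally require that the token ends after 'A'.
    static boolean twoCharTest(String input, int pos) {
        return la(input, pos, 1) == 'A' && !isIdChar(la(input, pos, 2));
    }

    public static void main(String[] args) {
        System.out.println(oneCharTest("A12345", 0));
        System.out.println(twoCharTest("A12345", 0));
    }
}
```

With the input "A12345", the one-character test accepts and commits to 'A', 
while the two-character test rejects it because '1' can still continue an 
identifier.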

I'm trying to use syntactic predicates for parsing a language with 
keywords that may be part of identifiers (e.g. keyword "Action", 
identifier "Action/*/123"). Is there a better approach than syntactic 
predicates to attack this scenario?
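
One commonly used alternative, sketched here under stated assumptions 
rather than as ANTLR's own mechanism, is to scan the longest possible 
identifier first and only then check whether the whole text is a keyword. 
The keyword table below contains only "Action" from the example above, and 
the identifier alphabet (letters, digits, '/' and '*') is guessed from 
"Action/*/123". As far as I know, ANTLR 2's literals table together with 
the testLiterals lexer option works along similar lines.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class KeywordSketch {
    // Hypothetical keyword table; only "Action" comes from the example.
    static final Set<String> KEYWORDS =
            new HashSet<String>(Arrays.asList("Action"));

    // Assumed identifier alphabet: letters, digits, '/' and '*'.
    static boolean isIdChar(char c) {
        return Character.isLetterOrDigit(c) || c == '/' || c == '*';
    }

    // Consumes the longest run of identifier characters starting at pos,
    // then classifies the whole text as a keyword or an identifier.
    static String nextToken(String input, int pos) {
        int end = pos;
        while (end < input.length() && isIdChar(input.charAt(end))) {
            end++;
        }
        String text = input.substring(pos, end);
        return (KEYWORDS.contains(text) ? "KEYWORD:" : "ID:") + text;
    }

    public static void main(String[] args) {
        System.out.println(nextToken("Action", 0));        // KEYWORD:Action
        System.out.println(nextToken("Action/*/123", 0));  // ID:Action/*/123
        System.out.println(nextToken("A12345", 0));        // ID:A12345
    }
}
```

Because the keyword check happens only after the full text has been 
consumed, a keyword prefix such as "Action" never splits a longer 
identifier.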

Thank you again for your help.

Sincerely,
Artem Dmytrenko

On Mon, 30 Jan 2006, Xue Yong Zhi wrote:

>
>
> Artem Dmytrenko wrote:
>
>> 
>> line 1:94: expecting ID, found 'A'
>> 
>> It appears that the match is stuck in the middle - e.g. ActionToken rule 
>> rejected the string but ID did not match it. Is that the expected behavior 
>> for syntactic predicates? Are there any workarounds for this problem?
>> 
>
> Your parser is thinking this way when parsing "A12345":
>
> 1. Try ActionToken, and match the first 'A'.
> 2. Try ActionToken again with the rest of the input "12345", do not match.
> 3. Then try ID, still no match.
> 4. Give you the warning.
>
> Most of the time ANTLR does not follow the "longest one that matches wins" 
> rule.
>
> -- 
> Xue Yong Zhi
> http://seclib.blogspot.com
>
>