[antlr-interest] Trouble using backtracking (I think)

Ron Hunter-Duvar ron.hunter-duvar at oracle.com
Tue Apr 22 11:34:42 PDT 2008


Just a guess, but perhaps because the LL(*) look-ahead sees that it can 
resolve this, it never even tries backtracking. It may only backtrack 
when it encounters a situations where LL(*) fails. Hopefully someone 
more knowledgeable can jump in and settle this, though. I'm curious.

Ron


John Woods wrote:
> Thanks Ron,
>
> I see your point about preferring to place longer alternatives first 
> for performance reasons. And making the rearrangements you suggest 
> does make my parser work.
>
> But I'm curious if anyone could explain why rearranging the order of 
> alternatives would affect what input can be parsed when backtracking 
> is enabled. I understand that the order of alternatives will affect 
> which of several possible alternatives will be matched first. But in 
> the case where the first alternative doesn't match, and the second 
> does, I'm wondering why backtracking doesn't come back and try the 
> second alternative. Perhaps, as you suggest, there's a limit to how 
> much backtracking will back track.
>
>
> -----Original Message-----
> From: Ron Hunter-Duvar
> Sent: 04/22/2008 09:20 AM
>> I haven't worked with backtracking yet, but in general, your longer 
>> alternatives should come first. I think if you swap the order of the 
>> alternatives in the expression and lower_term rules it will work.
>>
>> I'm not sure how far back backtracking will go in order to try 
>> matching the input (backtracking all the way to the beginning and 
>> trying every possibility can lead to exponential parse times). In 
>> this case, expression successfully matches the first alternative, 
>> lower_term. Given this match, it's looking for the closing bracket 
>> back up in the input rule. I don't know at this point if it will then 
>> throw away the successful expression match and try the second 
>> alternative.
>>
>> Is there a good reason why you want to try the shorter alternative 
>> first, and backtrack if that ends up causing a downstream failure?
>>
>> Ron
>>
>>
>> John Woods wrote:
>>>
>>> I'm having trouble figuring out why my grammar isn't working as 
>>> expected, and would greatly appreciate any pointers on what I'm 
>>> misunderstanding.
>>>
>>> My generated parser behaves as if backtracking isn't working as I 
>>> would expect. And further, when I modify the grammar in a way that I 
>>> wouldn't expect to have an effect, backtracking appears to start 
>>> working.
>>>
>>> I've simplified the grammar as much as I could and still reproduce 
>>> the problem. And the simplified input I'm trying to parse is "(a = 
>>> b)" (without the quotes).
>>>
>>> Here's the test grammar:
>>>
>>>     grammar Test;
>>>
>>>     options {
>>>         language = Java;
>>>         output = AST;
>>>         backtrack = true;
>>>     }
>>>
>>>     input
>>>         : expression
>>>         | '(' input ')'
>>>         ;
>>>
>>>     expression
>>>         // Swapping the following two lines makes it
>>>         // work for input "(a = b)". But why won't it
>>>         // work with backtracking the way it is?
>>>         : lower_term
>>>         | term '=' term
>>>         ;
>>>
>>>     term
>>>         : lower_term
>>>         | UPPER
>>>         ;
>>>
>>>     lower_term
>>>         : LOWER
>>>         // Removing "term*" from the following line, or
>>>         // removing the line altogether makes it work.
>>>         // Why is that? Seems unrelated to given input.
>>>         | LOWER '(' term* ')'
>>>         ;
>>>
>>>     UPPER: 'A'..'Z';
>>>     LOWER: 'a'..'z';
>>>
>>>     WHITESPACE: (' '|'\t'|'\n'|'\r'|'\f') { skip(); };
>>>
>>> When I try to parse the input "(a = b)", I get the following error:
>>>
>>>     BR.recoverFromMismatchedToken
>>>     line 1:3 mismatched input '=' expecting ')'
>>>
>>> It appears that the parser attempts to parse the "expression" rule 
>>> using the "lower_term" alternative only, and when that fails, it 
>>> seems like backtracking isn't coming back and attempting the other 
>>> alternative of "term '=' term".
>>>
>>> I'm using ANTLR version 3.0.1 (August 13, 2007).
>>>
>>> Finally, here's the test driver:
>>>
>>>     import java.io.*;
>>>     import org.antlr.runtime.*;
>>>     import org.antlr.runtime.tree.*;
>>>
>>>     public class Test {
>>>         public static void main(String[] args) throws Exception {
>>>             String string = "(a = b)";
>>>             ByteArrayInputStream byteArrayInputStream =
>>>                 new ByteArrayInputStream(string.getBytes());
>>>             ANTLRInputStream antlrInputStream =
>>>                 new ANTLRInputStream(byteArrayInputStream);
>>>             TestLexer testLexer =
>>>                 new TestLexer(antlrInputStream);
>>>             CommonTokenStream commonTokenStream =
>>>                 new CommonTokenStream(testLexer);
>>>             TestParser testParser =
>>>                 new TestParser(commonTokenStream);
>>>             CommonTree commonTree =
>>>                 (CommonTree)testParser.input().getTree();
>>>             System.out.println(commonTree.toStringTree());
>>>         }
>>>     }
>>>
>>
>
>

-- 
Ron Hunter-Duvar | Software Developer V | 403-272-6580
Oracle Service Engineering
Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5

All opinions expressed here are mine, and do not necessarily represent
those of my employer.



More information about the antlr-interest mailing list