[antlr-interest] Honey Badger Theory

Sun Jan 22 11:58:09 PST 2012

Hi Jan, honey badger's parsing strategy is and adaptive or incremental version of LL(*). The reason that v3 ANTLR needed to backtrack was that LL(*) grammar analysis is undecidable statically.  When it failed at static analysis, it failed over to backtracking at runtime. However, at runtime, I have an actual input stream that I can work with. This renders the algorithm deterministic and so I don't need to backtrack. In a nutshell, like GLR I pursue all possible paths from the decision point in a breadth first manner, almost as if I had forked multiple threads to pursue the possibilities. Because we pursue all possibilities at once, there is no backtracking. We move one token at a time seeing where it takes us in all possible alternatives. When only a single alternative is left, we know to predict that Alternative. We rewind the input and then take the appropriate path.

LL(*) is O(n) for a given decision because in the worst case it might look scan until the end of the input. If we must make a decision at every token, that is an O(n^2) parsing strategy for n tokens.  That actually hides another complexity that generally does not appear. We are doing what amounts to a more complicated NFA to DFA conversion, which we know is exponential in complexity (in theory but not in practice). That means that a particular decision could hit a landmine at some point. I have seen one example of this. I have some interesting ideas for altering the algorithm so this does not occur.  I'll get to it.

To learn more about the static analysis, you can go here:

http://www.antlr.org/papers/LL-star-PLDI11.pdf

I hope to do a paper on this adaptive LL(*) at some point.

"It's pretty bad ass. It just doesn't give a shit." --honey badger

Ter
On Jan 22, 2012, at 2:34 AM, Jan Finis wrote:

> Hi Terence,
> 
> I am into parser generator theory, so I am wondering which concepts you 
> use to let Honey Badger "eat everything" (even left recursion) and never 
> backtrack. Could you tell me which concepts you use? I know I could just 
> check the code but I think it will be 1000 times faster if you explain 
> it to me and I think it will also be interesting for many others here.
> 
> And does never backtrack mean that the parser will always stay linear 
> like a packrat parser?
> 
> Best regards,
> Jan Finis
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address