[antlr-interest] Honey Badger Theory
Jan Finis
finis at in.tum.de
Sun Jan 22 23:16:15 PST 2012
Thanks Terence,
that gave me a little clue. I have already read your papers but there is
none about honey badger, yet. A paper would be really interesting.
Another thing I noticed concerning speed:
(Maybe there already is an option to bypass this. If so, just tell me
and ignore my suggestion :)
You create a field in the rule context class for each token which is
explicitly named using the a=b syntax. For large rules, this might result
in a very large context object. Since we can do left recursion now,
there might be VERY large rules like the expr rule of the R grammar you
posted. Now imagine I name each sub-expression in the rule differently:
expr : '{' l=exprlist '}'
| '(' subExpr=expr ')'
| e=expr '[[' params=sublist ']' ']'
| scope=expr ('::'|':::') member=expr
| lexpr=expr ('$'|'@') rexpr=expr
...
For the full expr rule, this would result in a context with around 50
fields to hold the named elements, which means a very large memory
footprint and an unnecessary loss of performance due to increased
allocation overhead. I see that it is partly my fault, because I named
the exprs differently. However, people often want meaningful names, and
if a rule's labeled elements have different types, then you even NEED
different fields for them.
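To make the footprint concrete, here is a rough, hypothetical sketch of the kind of rule context such labels would generate (field names are taken from the grammar fragment above; real ANTLR v4 output differs in detail):

```java
// Hypothetical sketch of a generated rule context (names taken from the
// grammar fragment above; actual ANTLR v4 output differs). Every a=b
// label becomes a field, so each ExprContext instance carries one
// reference slot per label, regardless of which alternative was matched.
public class ExprContext {
    public Object l;        // l=exprlist
    public Object subExpr;  // subExpr=expr
    public Object e;        // e=expr
    public Object params;   // params=sublist
    public Object scope;    // scope=expr
    public Object member;   // member=expr
    public Object lexpr;    // lexpr=expr
    public Object rexpr;    // rexpr=expr
    // ... and so on: ~50 labels means ~50 references, i.e. roughly
    // 200-400 extra bytes per context object before any are assigned.

    public static void main(String[] args) {
        // count the instance label fields declared in this fragment
        long labels = java.util.Arrays
                .stream(ExprContext.class.getDeclaredFields())
                .filter(f -> !java.lang.reflect.Modifier
                        .isStatic(f.getModifiers()))
                .count();
        System.out.println(labels); // 8 label fields in this fragment
    }
}
```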
Suggestion:
Add an option that makes the named tokens plain local variables in the
parsing function of the rule (each scoped locally in a block for its
alternative). You couldn't use them in listeners then, but for people who
write their code directly into the parser this would be a very welcome
improvement. People who use listeners simply wouldn't enable the option.
By the way, there should be an option to disable listeners altogether. Is
there one?
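A self-contained sketch of what the suggested codegen might look like (this is not an existing ANTLR option, and the names and shapes here are made up for illustration):

```java
// Hypothetical sketch of the suggested option (NOT an existing ANTLR
// feature): each a=b label becomes a local variable scoped to its
// alternative's block inside the generated parse method, instead of a
// field on the rule context, so contexts allocate no label storage.
public class LocalLabelSketch {
    static final class ExprContext {
        // no label fields: nothing allocated per label
        String text; // only what the rule actually needs to expose
    }

    // stands in for matching the exprlist subrule
    static String exprlist() { return "a,b,c"; }

    static ExprContext expr(int predictedAlt) {
        ExprContext ctx = new ExprContext();
        switch (predictedAlt) {
            case 1: { // '{' l=exprlist '}'
                String l = exprlist();    // label as a block-scoped local
                ctx.text = "{" + l + "}"; // inline actions can still use it
                break;
            }
            // ... other alternatives, each with its own local labels
        }
        return ctx; // listeners never see l: the stated trade-off
    }

    public static void main(String[] args) {
        System.out.println(expr(1).text); // prints {a,b,c}
    }
}
```

The design point is the trade-off named above: locals die with the alternative's block, so inline actions can read them but a listener walking the tree afterwards cannot, which is why this would have to be opt-in.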
I want to use ANTLR to parse really large inputs, and the left-recursive
rules are really awesome, but I think the large context objects might
ruin my performance.
Regards,
Jan
Am 22.01.2012 20:58, schrieb Terence Parr:
> Hi Jan, honey badger's parsing strategy is an adaptive, incremental version of LL(*). The reason that v3 ANTLR needed to backtrack was that LL(*) grammar analysis is statically undecidable. When static analysis failed, it fell back to backtracking at runtime. At runtime, however, I have an actual input stream that I can work with. This renders the algorithm deterministic, so I don't need to backtrack. In a nutshell, like GLR I pursue all possible paths from the decision point in a breadth-first manner, almost as if I had forked multiple threads to pursue the possibilities. Because we pursue all possibilities at once, there is no backtracking. We move one token at a time, seeing where it takes us in all viable alternatives. When only a single alternative is left, we know to predict that alternative. We rewind the input and then take the appropriate path.
>
> LL(*) is O(n) for a given decision because in the worst case it might scan until the end of the input. If we must make a decision at every token, that is an O(n^2) parsing strategy for n tokens. That actually hides another complexity that generally does not appear: we are doing what amounts to a more complicated NFA-to-DFA conversion, which we know is exponential in complexity (in theory, but not in practice). That means a particular decision could hit a landmine at some point. I have seen one example of this. I have some interesting ideas for altering the algorithm so this does not occur. I'll get to it.
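The breadth-first prediction described above can be sketched as a toy model (this is not the real adaptive LL(*)/ATN machinery; alternatives are modeled as fixed token sequences purely for illustration):

```java
import java.util.*;

public class AllStarSketch {
    // Toy model of breadth-first prediction (NOT the real adaptive LL(*)
    // algorithm): each alternative is a fixed token sequence. We advance
    // one token at a time across all surviving alternatives, dropping
    // mismatches, until a single alternative remains; returning only its
    // index models "rewind the input and take the appropriate path".
    static int predict(List<String[]> alternatives, String[] input) {
        Set<Integer> viable = new HashSet<>();
        for (int i = 0; i < alternatives.size(); i++) viable.add(i);
        for (int t = 0; viable.size() > 1; t++) {
            Iterator<Integer> it = viable.iterator();
            while (it.hasNext()) {
                String[] seq = alternatives.get(it.next());
                // drop alternatives whose t-th token mismatches the input
                if (t >= seq.length || t >= input.length
                        || !seq[t].equals(input[t]))
                    it.remove();
            }
            if (viable.isEmpty()) return -1; // no viable alt: syntax error
        }
        return viable.iterator().next();
    }

    public static void main(String[] args) {
        List<String[]> alts = Arrays.asList(
            new String[]{"(", "expr", ")"},              // alt 0
            new String[]{"(", "expr", ",", "expr", ")"}, // alt 1
            new String[]{"{", "exprlist", "}"});         // alt 2
        // Both '(' alternatives survive token 0; token 2 decides.
        System.out.println(
            predict(alts, new String[]{"(", "expr", ",", "expr", ")"})); // 1
        System.out.println(
            predict(alts, new String[]{"{", "exprlist", "}"}));          // 2
    }
}
```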
>
> To learn more about the static analysis, you can go here:
>
> http://www.antlr.org/papers/LL-star-PLDI11.pdf
>
> I hope to do a paper on this adaptive LL(*) at some point.
>
> "It's pretty bad ass. It just doesn't give a shit." --honey badger
>
> Ter
> On Jan 22, 2012, at 2:34 AM, Jan Finis wrote:
>
>> Hi Terence,
>>
>> I am into parser generator theory, so I am wondering which concepts you
>> use to let Honey Badger "eat everything" (even left recursion) and never
>> backtrack. Could you tell me which concepts you use? I know I could just
>> check the code but I think it will be 1000 times faster if you explain
>> it to me and I think it will also be interesting for many others here.
>>
>> And does never backtrack mean that the parser will always stay linear
>> like a packrat parser?
>>
>> Best regards,
>> Jan Finis
>>