[antlr-interest] Reg LR-style expression parsing in wiki

Terence Parr parrt at cs.usfca.edu
Mon Feb 22 14:13:01 PST 2010


cool. good info. i had not considered bytecode approach; would make retargeting trivial.
Ter
On Feb 22, 2010, at 2:00 PM, Ron Burk wrote:

>> That is interesting...can the interpreter fetch/exec loop fit in L1 cache?  maybe...
> 
> Things always get more complicated by the time you have a working product,
> but I had these wild-ass guesses in my notes:
> 
>    Back of envelope calculation: 2KB for basic parser framework,
>    suppose average of 3 rules per non-terminal, average of 5
>    non-terminal lookups per non-terminal (each with an associated
>    2-byte "address" to "call"), average of 5 symbols
>    per rule, summing to 3 * (10*2 + 5*2) = 90 bytes per non-terminal.
>    Let's round that up to 100 bytes per non-terminal to allow for
>    action/reduce opcodes, tail recursion optimization opcodes, etc.
> 
>    Say about 60 non-terminals for ANSI C parser. That would come
>    to only 6KB+2KB=8KB! Leaves a fair amount of room for user actions,
>    parser stack, value stack, etc. especially if you're just constructing
>    a parse tree. Even double that to be more pessimistic and 16KB to
>    parse ANSI C sounds pretty small, leaving 3/4 of L1 cache on many
>    modern Intel processors (and of course, if the entire parser fits in
>    16KB, something less than 100% of it will be "hot" enough to be
>    regularly demanded in cache; e.g., "struct" tends to be needed
>    during the first portion of a C source file, then later maybe not
>    at all).
> 
> I do intend to make each opcode do as much as possible, and the fact
> that I can boil a large expression grammar down to just a few rules helps
> as well. For C code generation, for example, one opcode would scan
> the list of eligible next terminals (simple list scans instead of sparse table
> lookups or compressed table schemes). For Java code generation,
> it appears likely just as well to replace the list-scanning opcode with
> one switch statement per non-terminal, given the lack of static array
> initialization and the fact that two of the highest-level Java VM opcodes
> are devoted to switch statement support.
> 
>> i was thinking SLR(0) also instead of the prec climbing
>> thing for expressions. I'll think about this hard.
> 
> So far, I have not found it objectionable to require no null rules
> in the expression part of the grammar; you can always wrap
> that sub-grammar with another nonterminal that is nullable.
> I also have not found any popular language expression construct
> whose ambiguous SLR(0) automaton can't correctly be
> disambiguated by the classic operator precedence table algorithm,
> at least on my whiteboard. I guess the comma operator in C
> requires a little massaging because it's ambiguous with function
> parameter separation, but that's true in an LALR grammar for
> ANSI C as well.



More information about the antlr-interest mailing list