[antlr-interest] Another parsing question

Tue Aug 5 22:17:25 PDT 2008

RE: a lazy token evaluator, Ter wrote:

> the infinite lookahead makes comm between the two impossible.

> It will simply consume less memory and allow interactive parsers.

Backtracking would certainly be slower, but isn't there *some* method that
would allow both infinite lookahead *and* lexer states, in theory?

What do you think about keeping multiple lookahead buffers, one per state?

1.  The position in the input stream is saved before every rule, in case the
parser sets a state (using a provided API, so that the lexer will be able to
handle the housekeeping)

2.  The parser asks the lexer for the next token (on demand -- if it needs
more while looking at that rule, it asks for more, up to the rest of the
tokens in the file, for an LL(*) parse)

3.  These tokens are buffered, just as they are now, so they can be reused
for backtracking, but each lexer state has its own buffer of lookahead
tokens.

4.  If a parser rule alters the lexer state, housekeeping is performed to
switch to that state's buffer and reset the input stream's current position
appropriately

5.  Whenever a rule is chosen, lookahead buffers for all other lexer states
are invalidated.

(This is required because tokens in different states could easily overlap.
For example, in one state, an entire line, or the contents of an entire
block could be returned as a single token.)

6.  Whenever a rule that changed state is exited, the previous state is
restored, if int Lexer.pushState(int) was called, instead of
Lexer.setState(int), which would leave the state unchanged, until called
again.

Brent

> -----Original Message-----

> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-

> bounces at antlr.org] On Behalf Of Terence Parr

> Sent: Tuesday, August 05, 2008 2:53 PM

> To: Randall R Schulz

> Cc: antlr-interest at antlr.org

> Subject: Re: [antlr-interest] Another parsing question

> 

> 

> On Aug 5, 2008, at 2:35 PM, Randall R Schulz wrote:

> 

> > On Tuesday 05 August 2008 14:22, Terence Parr wrote:

> >> On Aug 5, 2008, at 12:44 PM, Foust wrote:

> >>>> Lexing is all done up front with no input from the parser at all.

> >>>

> >>> Why was that done? Is it the price you pay for infinite lookahead?

> >>

> >> Speed of access for ith char / token.  Will build a lazy version. :)

> >

> > That's probably a good idea, but will you preserve the current

> > semantics, making the lazy version simply a performance variant, or

> > will you allow the parser to influence the behavior of the lexer, as

> > so

> > many people initially believe to be possible when the come to ANTLR?

> 

> Nope...the infinite lookahead makes comm between the two impossible.

> It will simply consume less memory and allow interactive parsers.

> 

> Ter

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20080805/257afe4a/attachment-0001.html