[antlr-interest] Can I restart lexing from definite position in document?

Fri Apr 17 11:36:18 PDT 2009

On Sat, Apr 18, 2009 at 12:04 AM, Sam Harwell
<sharwell at pixelminegames.com> wrote:
> You should tokenize on a per-line basis. Never allow a token to span
> multiple lines, and never allow lookahead/back to cross a newline
> boundary. I've documented this process in my blog followed by an email on
> this list earlier this week with an improvement. Here's the original
> article:
> http://blog.280z28.org/archives/2008/10/21/
>
> Sam
>
Unless your editor framework makes lines significant or your language
has no multi-line tokens and no cross-line lookahead\lookback, I
don't see any reason to do incremental lexing on a per-line basis. For
a general treatment of incremental lexical analysis see
http://www.cs.berkeley.edu/Research/Projects/harmonia/papers/twagner-lexing.pdf
and other papers from the Harmonia project
(http://harmonia.cs.berkeley.edu/harmonia/index.html). The basic idea
is to track the lookahead used by each token and relex from the first
token that looked ahead into the changed region until the new token
stream resyncs with the previous token stream and has no lookahead
into changed characters. The implementation is quite simple and
handles arbitrary lookahead\lookback without requiring any
restrictions on your grammar.
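
To make the idea above concrete, here is a rough Python sketch. It is not
ANTLR code and not the Harmonia implementation; the toy lexer and the names
Token, lex_one, lex_all, relex and examined_end are all made up for
illustration. It assumes a stateless lexer (no modes) that never looks back,
and it counts a peek at end-of-input as examining one extra position so that
appends are detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str
    start: int
    end: int           # exclusive
    examined_end: int  # one past the last index the lexer looked at;
                       # peeking at end-of-input counts as index len(text)


def _is_word(ch):
    return ch.isalnum() or ch == "_"


def lex_one(text, pos):
    """Toy lexer: words, whitespace, /* ... */ comments, single punctuation."""
    n = len(text)
    ch = text[pos]
    if _is_word(ch) or ch.isspace():
        same = _is_word if _is_word(ch) else str.isspace
        end = pos + 1
        while end < n and same(text[end]):
            end += 1
        # The lexer peeked at text[end] (or end-of-input) to stop.
        return Token("word" if _is_word(ch) else "space", pos, end, end + 1)
    if text.startswith("/*", pos):
        close = text.find("*/", pos + 2)
        if close == -1:
            return Token("comment", pos, n, n + 1)  # ran into end-of-input
        return Token("comment", pos, close + 2, close + 2)
    if ch == "/":
        return Token("punct", pos, pos + 1, pos + 2)  # peeked for '*'
    return Token("punct", pos, pos + 1, pos + 1)


def lex_all(text):
    tokens, pos = [], 0
    while pos < len(text):
        tokens.append(lex_one(text, pos))
        pos = tokens[-1].end
    return tokens


def relex(old, new_text, offset, removed, inserted):
    """Update `old` after `removed` chars at `offset` were replaced by
    `inserted` chars. Returns (tokens, number of tokens actually relexed)."""
    delta = inserted - removed
    # First old token whose examined range reaches the edit.
    first = next((i for i, t in enumerate(old) if t.examined_end > offset),
                 len(old))
    pos = old[first].start if first < len(old) else offset
    # Old tokens starting after the removed text, keyed by new start position.
    resync = {t.start + delta: i for i, t in enumerate(old)
              if t.start >= offset + removed}
    out, relexed = list(old[:first]), 0
    while pos < len(new_text):
        i = resync.get(pos)
        if i is not None:
            # Streams line up again: reuse the old tail, shifted by delta.
            out.extend(Token(t.kind, t.start + delta, t.end + delta,
                             t.examined_end + delta) for t in old[i:])
            break
        tok = lex_one(new_text, pos)
        out.append(tok)
        relexed += 1
        pos = tok.end
    return out, relexed
```

For example, calling relex(lex_all("ab;cd"), "axb;cd", 1, 0, 1) should give
the same tokens as lex_all("axb;cd") while running the lexer for only the one
token that touches the edit. A lexer with lookback or modes would need extra
checks before reusing the old tail.
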
If your language has no multi-line tokens, or your framework requires
it (as Visual Studio apparently does), then lexing on a line-by-line
basis is a reasonable choice; otherwise it is probably not the best
method.

Tom.