[antlr-interest] Saving Lexer State?

Thu Sep 27 11:57:00 PDT 2007

On 9/27/07, Tom Davies <tom at atlassian.com> wrote:
> I'm trying to use an ANTLR 2.x lexer for syntax highlighting in
> IntelliJ IDEA.
>
> The lexer interface it assumes can save the state of the lexer (which
> is just a single int) and later restore it. This allows incremental
> re-lexing as you type.
>
> Is there any way to do that with an ANTLR lexer? Obviously I can
> record the input string leading up to the state and simply re-lex it,
> but I'm wondering if there's a cheaper approach?

I have implemented incremental lexing on top of ANTLR3, and the method
would apply to ANTLR 2.x as well. The algorithm I used is from Tim
Wagner of the Harmonia project
(http://www.cs.berkeley.edu/~harmonia/harmonia/index.html) and is
described in "General Incremental Lexical Analysis", available from the
Harmonia site. It is not especially complicated to implement once you
get your head around it.

It involves tracking the lookahead used by each token, so that when the
document is altered, the set of tokens that depended on the changed
characters can be found and re-lexed. Any lexer state (such as variables
used in predicates) must also be recorded for each token.
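A minimal sketch of that bookkeeping, assuming each token records its start offset, its length, how many characters the lexer examined to recognise it (its lookahead extent), and the lexer state in effect when it started. The `Token` record and `damaged_tokens` function are hypothetical names for illustration, not an ANTLR API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    kind: str
    start: int      # offset of the token's first character
    length: int     # number of characters in the token's text
    examined: int   # characters read from `start` to recognise it (>= length; reading EOF counts as one)
    state: int = 0  # lexer state (e.g. predicate variables) when the token started


def damaged_tokens(tokens: List[Token], edit_start: int, edit_end: int) -> List[int]:
    """Return indices of tokens affected by replacing old text [edit_start, edit_end).

    A token depends on every character in [start, start + examined), so it is
    damaged if that range overlaps the edit. For a pure insertion
    (edit_start == edit_end) a token that read the character at the insertion
    point is damaged; a token starting exactly at edit_end is included
    conservatively.
    """
    return [
        i
        for i, t in enumerate(tokens)
        if t.start <= edit_end and edit_start < t.start + t.examined
    ]


# Tokens for "foo + bar" as a maximal-munch lexer would produce them: each
# identifier or whitespace run reads one extra character to find its end.
tokens = [
    Token("ident", 0, 3, 4),  # "foo", also read the space at offset 3
    Token("ws", 3, 1, 2),     # " ", also read "+"
    Token("op", 4, 1, 1),     # "+"
    Token("ws", 5, 1, 2),     # " ", also read "b"
    Token("ident", 6, 3, 4),  # "bar", also hit EOF at offset 9
]

print(damaged_tokens(tokens, 3, 3))  # insert at offset 3 → [0, 1]
```

Inserting at offset 3 damages `foo` as well as the following whitespace, even though the edit lies after the text of `foo`, because recognising `foo` required reading the character at offset 3.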

I'm not familiar with the IDEA APIs, so I'm not sure how easy it would
be to plug this into IDEA; it was easy enough to hook into Eclipse. The
IDEA APIs you mention sound higher level, and I don't know how they
handle invalidating and re-syncing the token stream, so they may or may
not be appropriate. That will also depend on how complex your lexer is.
If there are lower-level APIs, you may be able to use those instead: you
just need to be able to get the extents of the document change and
report the token region that was invalidated.
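The re-lex and re-sync step can be sketched as follows, assuming tokens carry the same bookkeeping (start, length, characters examined, starting lexer state). The toy lexer `lex_one` and the `Token`, `relex` and `splice` names are hypothetical, not part of ANTLR, IDEA or Eclipse, and this is a simplified illustration rather than Wagner's algorithm as published. Re-lexing starts at the first damaged token and stops once a new token begins where an undamaged old token began (shifted by the edit's size change) with the same lexer state; from there the old tokens can be reused. The returned `(first, last)` pair is the invalidated token region an editor would be told about.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    kind: str
    start: int      # offset of the token's first character
    length: int     # number of characters in the token's text
    examined: int   # characters read from `start` (>= length; reading EOF counts as one)
    state: int = 0  # lexer state when the token started


def lex_one(text: str, pos: int, state: int) -> Tuple[Token, int]:
    """Hypothetical toy maximal-munch lexer: letter, digit and whitespace runs,
    plus single-character operators. It has no modes, so `state` passes through;
    a real lexer would update it (e.g. when entering a comment)."""
    for kind, pred in (("ident", str.isalpha), ("number", str.isdigit), ("ws", str.isspace)):
        if pred(text[pos]):
            end = pos
            while end < len(text) and pred(text[end]):
                end += 1
            # Finding the end of a run means reading one more character (or EOF).
            return Token(kind, pos, end - pos, end - pos + 1, state), state
    return Token("op", pos, 1, 1, state), state


def lex_from(text: str, pos: int = 0, state: int = 0) -> List[Token]:
    """Batch-lex `text` from `pos` to the end."""
    tokens = []
    while pos < len(text):
        tok, state = lex_one(text, pos, state)
        tokens.append(tok)
        pos += tok.length
    return tokens


def relex(old: List[Token], new_text: str, edit_start: int, edit_end: int,
          new_len: int) -> Tuple[int, int, List[Token]]:
    """Old text [edit_start, edit_end) was replaced by `new_len` characters.
    Returns (first, last, replacement): old[first:last] is the invalidated
    token region and `replacement` holds the freshly lexed tokens for it."""
    delta = new_len - (edit_end - edit_start)
    damaged = [i for i, t in enumerate(old)
               if t.start <= edit_end and edit_start < t.start + t.examined]
    if not damaged:  # e.g. there were no old tokens at all
        return 0, len(old), lex_from(new_text)
    first, last_damaged = damaged[0], damaged[-1]
    pos, state = old[first].start, old[first].state  # text before the edit is unchanged
    by_start = {t.start: i for i, t in enumerate(old)}
    replacement: List[Token] = []
    while pos < len(new_text):
        if pos >= edit_start + new_len:  # past the edit: new offset = old offset + delta
            j = by_start.get(pos - delta)
            if j is not None and j > last_damaged and old[j].state == state:
                return first, j, replacement  # re-synchronised with the old stream
        tok, state = lex_one(new_text, pos, state)
        replacement.append(tok)
        pos += tok.length
    return first, len(old), replacement


def splice(old: List[Token], first: int, last: int, replacement: List[Token],
           delta: int) -> List[Token]:
    """Build the new token list, shifting the reused old tokens by `delta`."""
    return old[:first] + replacement + [replace(t, start=t.start + delta) for t in old[last:]]


old_text = "foo + bar"
old_tokens = lex_from(old_text)
new_text = old_text[:3] + "x" + old_text[3:]  # insert "x" after "foo"
first, last, repl = relex(old_tokens, new_text, 3, 3, 1)
print(first, last, [new_text[t.start:t.start + t.length] for t in repl])  # → 0 2 ['foox', ' ']
```

Comparing the starting state as well as the offset is what tracking lexer state per token buys: two positions only lex identically if both the remaining text and the state agree.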

Tom.
>
> Thanks,
>    Tom
>