[antlr-interest] [antlr-dev] Syntax highlighting and performance possibilities

Fri May 22 16:39:44 PDT 2009

Hi George,

Thanks for this feedback. The method I was describing is a form of
incremental lexing, but is quite different from the one you referenced.
I'll be looking to see if I can combine the strengths of both as I
work. :)

Common features:

* Both methods are incremental. Visual Studio's (VS) incremental
  lexing restarts at the beginning of the line, and General Incremental
  Lexing (GIL) restarts at the first affected token.

* Both methods stop/suspend/defer the incremental updating process
  when the last on-screen token is processed.

Strengths of GIL over VS:

* Allows lookahead past newlines.

* Doesn't have to restart at the beginning of a line.

* Allows true multiline tokens.

Strengths of VS over GIL:

* Able to incrementally parse recursive constructs, such as languages
  that allow nested /* */ block comments.

* Smaller lower bound on processing requirements.

* Much smaller memory overhead.

If this is correct, then from what I can tell it would be beneficial to
use the method I described as long as you don't have very long lines of
text. Also, the SlimToken is actually lighter than a flyweight token,
but again it can only be used as long as you don't need more information
than it's able to store.
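
To make the line-based (VS-style) restart concrete, here is a rough
sketch in Python. It assumes the only state carried from one line to
the next is the nesting depth of /* */ comments, and that an edit
replaces the text of a single line. The names lex_line and
LineHighlighter are made up for this example and are not from Visual
Studio or ANTLR.

```python
from typing import List, Tuple

Token = Tuple[int, int, str]  # (start column, length, kind)


def lex_line(text: str, depth: int) -> Tuple[List[Token], int]:
    """Lex one line that starts inside `depth` nested /* */ comments.

    Returns the tokens and the nesting depth at the end of the line.
    """
    tokens: List[Token] = []
    i, n = 0, len(text)
    while i < n:
        start = i
        if depth > 0 or text.startswith("/*", i):
            # Consume comment text until the nesting closes or the line ends.
            while i < n:
                if text.startswith("/*", i):
                    depth += 1
                    i += 2
                elif text.startswith("*/", i):
                    depth -= 1
                    i += 2
                    if depth == 0:
                        break
                else:
                    i += 1
            tokens.append((start, i - start, "comment"))
        elif text[i].isspace():
            while i < n and text[i].isspace():
                i += 1
            tokens.append((start, i - start, "ws"))
        else:
            while i < n and not text[i].isspace() and not text.startswith("/*", i):
                i += 1
            tokens.append((start, i - start, "text"))
    return tokens, depth


class LineHighlighter:
    def __init__(self, lines: List[str]) -> None:
        self.lines = list(lines)
        self.tokens: List[List[Token]] = [[] for _ in self.lines]
        self.end_depth = [0] * len(self.lines)
        self.valid = 0  # lines [0, valid) have up-to-date tokens and end_depth

    def update(self, first: int, last_visible: int) -> int:
        """Re-lex after line `first` changed; return how many lines were lexed."""
        start = min(first, self.valid)
        depth = self.end_depth[start - 1] if start > 0 else 0
        last = min(last_visible, len(self.lines) - 1)
        lexed = 0
        for i in range(start, last + 1):
            was_valid = i < self.valid
            old = self.end_depth[i]
            self.tokens[i], depth = lex_line(self.lines[i], depth)
            self.end_depth[i] = depth
            lexed += 1
            if was_valid and old == depth:
                return lexed  # end state unchanged, so later cached lines still hold
        # Stopped at the last visible line without converging: defer the rest.
        self.valid = max(start, last + 1)
        return lexed
```

The only per-line memory beyond the tokens themselves is one integer,
and an edit that leaves the end-of-line depth unchanged costs a single
line of lexing. The cost of a long line is also visible here: the whole
line is re-lexed no matter where in it the edit happened.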

Sam

From: George Scott [mailto:george.scott at gmail.com] 
Sent: Friday, May 22, 2009 4:09 PM
To: Sam Harwell
Cc: antlr-interest at antlr.org; antlr-dev at antlr.org
Subject: Re: [antlr-dev] Syntax highlighting and performance
possibilities

Sam,

Have you looked at incremental lexing?  I think it provides very good
performance and is used by a number of IDEs.  A great reference on
incremental lexing is this paper:

http://harmonia.cs.berkeley.edu/papers/twagner-lexing.pdf

To reduce memory you can use flyweight tokens (one token instance shared
by all token streams) for token types whose length does not vary.  You
can use this for keywords, common white-space patterns such as a
single space, etc.  The trade-off is that you have to compute the
start/stop indexes for tokens based on a nearby non-flyweight token and
the known length of the flyweight.  This is generally not a problem,
since syntax highlighting finds a start token given a line number and
walks forward in token order, so you can keep a running count.
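
As an illustration of the running count, here is a minimal sketch in
Python. The class names Flyweight and FullToken, the spans helper, and
the inclusive stop index are all assumptions made for this example;
none of them come from the ANTLR runtime.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class Flyweight:
    """One shared instance per fixed-text token type; stores no position."""
    kind: str
    text: str


@dataclass
class FullToken:
    """Ordinary token that records its own start offset."""
    kind: str
    start: int
    text: str


# Shared instances, reused by every token stream.
IF = Flyweight("keyword", "if")
SPACE = Flyweight("ws", " ")


def spans(tokens: Iterable[Union[Flyweight, FullToken]],
          start_offset: int = 0) -> Iterator[Tuple[str, int, int]]:
    """Yield (kind, start, stop) for each token, with stop inclusive."""
    offset = start_offset
    for tok in tokens:
        if isinstance(tok, FullToken):
            offset = tok.start  # resynchronise on a token that knows its position
        length = len(tok.text)
        yield tok.kind, offset, offset + length - 1
        offset += length  # running count carries across flyweights
```

For example, walking the stream `[IF, SPACE, FullToken("ident", 3,
"count"), SPACE, IF]` recovers a position for every token even though
only one of the five objects stores an offset, and the first and last
entries are the same object.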

When using incremental lexing with syntax highlighting, you generally
only have to re-lex from the point of the edit to the token containing
the last visible character on screen, so there is not a large cost even
if editing at the beginning of the file.  As the user scrolls the
document, you continue lexing from the last token.
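
Roughly, this could look like the Python sketch below. It is a
simplification under several assumptions: the token grammar is a toy
regular expression, the class LazyLexer and its methods are made up for
illustration, and every token after the edit is thrown away and re-lexed
on demand rather than reused as a full incremental lexer would.

```python
import re
from bisect import bisect_right
from typing import List, Tuple

# Toy grammar for illustration: whitespace runs, word runs, or any single char.
TOKEN_RE = re.compile(r"\s+|\w+|.", re.DOTALL)


class LazyLexer:
    def __init__(self, text: str) -> None:
        self.text = text
        self.tokens: List[Tuple[int, int]] = []  # (start, end), end exclusive

    def lex_through(self, last_visible: int) -> None:
        """Continue from the last token until `last_visible` is covered."""
        pos = self.tokens[-1][1] if self.tokens else 0
        while pos <= last_visible and pos < len(self.text):
            m = TOKEN_RE.match(self.text, pos)
            self.tokens.append((pos, m.end()))
            pos = m.end()

    def replace(self, start: int, end: int, new_text: str,
                last_visible: int) -> None:
        """Replace text[start:end], then re-lex from the edit to the screen end."""
        self.text = self.text[:start] + new_text + self.text[end:]
        starts = [t[0] for t in self.tokens]
        keep = bisect_right(starts, start) - 1  # token containing the edit start
        keep = max(keep - 1, 0)  # back up one token in case the edit merges with it
        del self.tokens[keep:]
        self.lex_through(last_visible)
```

Scrolling is then just another call to `lex_through` with the new last
visible offset, which picks up where the previous call stopped.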

It is pretty straightforward to modify the ANTLR runtime to use these
techniques.

George
