[antlr-interest] Visual Studio syntax highlighting for an Antlr grammar

Tue Dec 6 10:12:51 PST 2005

> Pete Gonzalez wrote:
> Currently I'm just focusing on syntax highlighting (which uses the lexer
> but not the parser). The IDE text editor is optimized to prevent the
> entire file from being rescanned whenever something changes. The required
> C# interface looks like this:
> 
>    void IScanner.SetSource(string source, int offset);
>    bool IScanner.ScanTokenAndProvideInfoAboutIt(TokenInfo tokenInfo,
>      ref int state);
> 
> The idea is that the editor passes a single line of text to SetSource(),
> and then calls ScanTokenAndProvideInfoAboutIt() repeatedly to obtain the
> colored tokens for that line. In this situation, the only context
> available to the lexer is a single "state" integer (which for Babel stores
> flex's "yy_start" global variable). Unfortunately, since Antlr is a
> recursive descent design, there isn't an obvious way to restart the lexer
> e.g. in the middle of a multiline comment. Has anyone else dealt with this
> problem before?
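To make the constraint concrete, here is a minimal sketch of the flex-style approach: a hand-written, line-at-a-time scanner whose only memory between lines is one integer "start condition". It is written in Java, which is ANTLR v2's main target and maps directly onto the C# interface quoted above. LineScanner, Token, setSource, scanToken and the STATE_* constants are hypothetical stand-ins for IScanner/TokenInfo, not the Visual Studio API. The integer is what lets a line that begins inside a /* ... */ comment be colored without rescanning earlier lines. A recursive-descent lexer keeps part of that context in its call stack instead, which is why it is harder to resume mid-comment.

```java
// Sketch of a flex-style, line-at-a-time scanner. LineScanner, Token and the
// STATE_* constants are hypothetical stand-ins for IScanner/TokenInfo.
class LineScanner {
    // Start conditions, playing the role of flex's yy_start.
    static final int STATE_NORMAL = 0;
    static final int STATE_IN_COMMENT = 1;

    static final class Token {
        final int start;   // inclusive offset within the line
        final int end;     // exclusive offset within the line
        final String kind; // "COMMENT", "WORD" or "OTHER"

        Token(int start, int end, String kind) {
            this.start = start;
            this.end = end;
            this.kind = kind;
        }
    }

    private String line = "";
    private int pos = 0;

    // Analogue of SetSource(): the editor hands over one line at a time.
    void setSource(String source, int offset) {
        line = source;
        pos = offset;
    }

    // Analogue of ScanTokenAndProvideInfoAboutIt(): returns null at end of
    // line. state[0] is the only context carried from one line to the next.
    Token scanToken(int[] state) {
        if (pos >= line.length()) {
            return null;
        }
        int start = pos;
        boolean inComment = state[0] == STATE_IN_COMMENT;
        if (inComment || line.startsWith("/*", pos)) {
            if (!inComment) {
                pos += 2; // consume the "/*" opener
            }
            int close = line.indexOf("*/", pos);
            if (close < 0) {
                pos = line.length();         // comment runs past end of line
                state[0] = STATE_IN_COMMENT; // remembered for the next line
            } else {
                pos = close + 2;
                state[0] = STATE_NORMAL;
            }
            return new Token(start, pos, "COMMENT");
        }
        if (Character.isLetter(line.charAt(pos))) {
            while (pos < line.length() && Character.isLetterOrDigit(line.charAt(pos))) {
                pos++;
            }
            return new Token(start, pos, "WORD");
        }
        pos++; // any other single character
        return new Token(start, pos, "OTHER");
    }

    public static void main(String[] args) {
        LineScanner scanner = new LineScanner();
        int[] state = {STATE_NORMAL};
        String[] lines = {"x = 1; /* start", "still comment", "end */ y"};
        for (String l : lines) {
            scanner.setSource(l, 0);
            StringBuilder kinds = new StringBuilder();
            for (Token t = scanner.scanToken(state); t != null; t = scanner.scanToken(state)) {
                kinds.append(t.kind).append(' ');
            }
            System.out.println(kinds.toString().trim() + " | state=" + state[0]);
        }
    }
}
```

Running main scans the three lines independently. The state value after each line is 1, 1 and 0, so the middle line is colored as a comment purely from the integer handed back by the first line.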

An ANTLR v2 lexer's state is encapsulated in the LexerSharedInputState (plus the
InputBuffer). You also need to track the stream's state, of course. Assuming
you can seek forwards and backwards at will in your stream, wouldn't copying
the state object(s) and storing them in an array give you a single state
integer (the array index)?

One can imagine methods such as int GetLexerState(), void RestoreLexerState(int)
and void DeleteLexerState(int)...
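The state-table idea above can be sketched as follows. Full lexer snapshots live in a table, and the editor only ever sees the integer index. This is an illustration under stated assumptions, not ANTLR API. LexerStateTable and Snapshot are hypothetical names, and the snapshot fields (stream offset, line, column) only approximate what would really have to be copied out of LexerSharedInputState and the InputBuffer. To keep the example self-contained, the caller passes the snapshot into getLexerState explicitly instead of the table reading it from a live lexer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the GetLexerState/RestoreLexerState/DeleteLexerState idea:
// lexer snapshots live in a table and the editor only sees the index.
class LexerStateTable {
    // Hypothetical snapshot. For ANTLR v2 this would hold copies of whatever
    // LexerSharedInputState and the InputBuffer need to resume, plus the
    // position to seek the underlying stream back to.
    static final class Snapshot {
        final long streamOffset;
        final int line;
        final int column;

        Snapshot(long streamOffset, int line, int column) {
            this.streamOffset = streamOffset;
            this.line = line;
            this.column = column;
        }
    }

    private final List<Snapshot> slots = new ArrayList<>();
    private final Deque<Integer> freeSlots = new ArrayDeque<>();

    // Stores a snapshot and returns its index, reusing deleted slots first.
    int getLexerState(Snapshot snapshot) {
        Integer reused = freeSlots.poll();
        if (reused == null) {
            slots.add(snapshot);
            return slots.size() - 1;
        }
        slots.set(reused, snapshot);
        return reused;
    }

    // Looks up the snapshot the lexer should be restored from.
    Snapshot restoreLexerState(int id) {
        Snapshot snapshot = slots.get(id);
        if (snapshot == null) {
            throw new IllegalArgumentException("state " + id + " was deleted");
        }
        return snapshot;
    }

    // Frees a slot once the editor no longer references that state.
    void deleteLexerState(int id) {
        if (slots.get(id) != null) {
            slots.set(id, null);
            freeSlots.push(id);
        }
    }

    public static void main(String[] args) {
        LexerStateTable table = new LexerStateTable();
        int a = table.getLexerState(new Snapshot(0, 1, 0));
        int b = table.getLexerState(new Snapshot(42, 3, 7));
        table.deleteLexerState(a);
        int c = table.getLexerState(new Snapshot(80, 5, 2)); // reuses a's slot
        System.out.println(a + " " + b + " " + c);           // prints "0 1 0"
        System.out.println(table.restoreLexerState(b).line); // prints "3"
    }
}
```

One design consequence worth weighing is that these indices are opaque. If the editor decides when to stop recoloring by comparing a line's new end state with the one it cached, two equivalent lexer states stored under different indices will never compare equal. Equal snapshots would then need to share an index. Stale entries also have to be deleted explicitly, or the table grows with every edit.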

> Later, Pete Gonzalez also wrote:
> The impression I'm getting is that recursive descent lexers are actually
> inferior with the kind of optimizations required for responsive syntax
> highlighting.

Perhaps. Or it may just be that VSIP's design favours flex/bison-like tools.

> I looked at the code for another high-quality text editor, and they use
> hand-coded lexers for each language, with a global integer state just
> like Flex.

Is this an open-source editor? I'd like to see the code you're referring to.

> Maybe Microsoft's interface is intentionally encouraging this approach?
> My kludge was to substitute a flex-style lexer in C#; it works great and
> was very easy to integrate.
>
> It is aesthetically pleasing that Antlr's lexer and parser share a common
> algorithm. However, this experience is building a case that Antlr's
> approach is less versatile (and possibly slower?) when it comes to the
> lexer.

ANTLR v2 lexers are currently generally slower than their Flex equivalents.
But they are more versatile (that's why they are slower), and they can be
developed (and debugged) much more quickly, especially for more complex
lexing scenarios (at least in my experience).

Cheers,

Micheal