[antlr-interest] Visual Studio syntax highlighting for an Antlr grammar

Mon Dec 5 10:47:10 PST 2005

Don Caton wrote:
> I'm in the process of doing just that.  You have to take a slightly
> different approach to your lexer.  Normally, you lex comments as a single
> token which is ultimately discarded (e.g. $setType( Token::Skip )).
> 
> In a syntax highlighting parser, you want to parse the comment begin and end
> markers separately and don't discard them.  Once you've seen a begin comment
> token you need to remember that, which you can do by using the 'state'
> parameter to ScanTokenAndProvideInfoAboutIt().  Once you're in a comment
> "state", force the color for each successive token to be the comment color
> until you see an ending comment token.

It sounds like your approach is to manually handle each problem with custom 
modifications.  I think this might be difficult in the general case of 
multiline strings, XML comments, or nested languages (e.g. we have islands 
of SQL expressions in one of our grammars).  The end result might be a less 
readable grammar, and a fair amount of work compared to the fully-automatic 
scenario with Flex.

The impression I'm getting is that recursive descent lexers are actually 
less suited to the kind of optimizations required for responsive syntax 
highlighting.  I looked at the code for another high-quality text editor, 
and they use hand-coded lexers for each language, with a global integer 
state just like Flex.  Maybe Microsoft's interface is intentionally 
encouraging this approach?  My kludge was to substitute a flex-style lexer 
in C#; it works great and was very easy to integrate.
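
To make the "global integer state" idea concrete, here is a minimal sketch in 
Python (not the C# lexer I mentioned, and not the Visual Studio interface 
itself).  The names scan_line, NORMAL, IN_COMMENT and the color labels are all 
made up for illustration.  The editor hands the lexer one line plus the state 
left over from the previous line, and gets back colored tokens plus the state 
to pass along with the next line:

```python
import re

# Hypothetical lexer states; a Flex start condition is likewise just an integer.
NORMAL, IN_COMMENT = 0, 1

# Whitespace, a word, or any single character (tried in that order).
_PLAIN = re.compile(r"\s+|\w+|.", re.DOTALL)


def scan_line(line, state):
    """Lex one line, starting in `state`.

    Returns (tokens, new_state), where tokens is a list of
    (text, color) pairs and new_state is carried to the next line.
    """
    tokens = []
    pos = 0
    while pos < len(line):
        if state == IN_COMMENT:
            # Everything up to and including the closing marker is comment.
            end = line.find("*/", pos)
            if end == -1:
                tokens.append((line[pos:], "comment"))
                pos = len(line)
            else:
                tokens.append((line[pos:end + 2], "comment"))
                pos = end + 2
                state = NORMAL
        elif line.startswith("/*", pos):
            # The opening marker is its own token and flips the state.
            tokens.append(("/*", "comment"))
            pos += 2
            state = IN_COMMENT
        else:
            m = _PLAIN.match(line, pos)
            text = m.group(0)
            if text.isspace():
                color = "plain"
            elif text[0].isalnum() or text[0] == "_":
                color = "identifier"
            else:
                color = "operator"
            tokens.append((text, color))
            pos = m.end()
    return tokens, state


if __name__ == "__main__":
    state = NORMAL
    for line in ["a = 1 /* start", "still comment", "end */ b"]:
        toks, state = scan_line(line, state)
        print(toks, state)
```

Because the only thing remembered between lines is one integer, the editor 
can re-lex a single changed line without rescanning the file from the top, 
which is what makes this style cheap for highlighting.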

It is aesthetically pleasing that Antlr's lexer and parser share a common 
algorithm.  However, this experience is building a case that Antlr's 
approach is less versatile (and possibly slower?) when it comes to the lexer.

> I'm still working on this, but it seems to work ok.  I briefly considered
> using the Babel interface but it's not well documented, the quality of the
> sample code leaves something to be desired, and it seems to have fewer
> capabilities than the managed language service interfaces.  And I really
> didn't want to spend the time learning flex/bison when I already have an
> Antlr grammar for my language.

Also, the managed interface is an actual supported API, whereas Babel is 
an MFC-style "framework" of cut+paste code fragments.

Cheers,
-Pete