[antlr-interest] Visual Studio syntax highlighting for an Antlr grammar
Pete Gonzalez
pgonzalez at bluel.com
Mon Dec 5 10:47:10 PST 2005
Don Caton wrote:
> I'm in the process of doing just that. You have to take a slightly
> different approach to your lexer. Normally, you lex comments as a single
> token which is ultimately discarded (e.g. $setType( Token::Skip )).
>
> In a syntax highlighting parser, you want to parse the comment begin and end
> markers separately and don't discard them. Once you've seen a begin comment
> token you need to remember that, which you can do by using the 'state'
> parameter to ScanTokenAndProvideInfoAboutIt(). Once you're in a comment
> "state", force the color for each successive token to be the comment color
> until you see an ending comment token.
It sounds like your approach is to manually handle each problem with custom
modifications. I think this might be difficult in the general case of
multiline strings, XML comments, or nested languages (e.g. we have islands
of SQL expressions in one of our grammars). The end result might be a less
readable grammar, and a fair amount of work compared to the fully-automatic
scenario with Flex.
The impression I'm getting is that recursive descent lexers are actually
a poor fit for the kind of optimizations required for responsive syntax
highlighting. I looked at the code for another high-quality text editor,
and they use hand-coded lexers for each language, with a global integer
state just like Flex. Maybe Microsoft's interface is intentionally
encouraging this approach? My kludge was to substitute a flex-style lexer
in C#; it works great and was very easy to integrate.
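
To make the "global integer state" idea concrete, here is a minimal sketch
in Python (not the C# lexer mentioned above, and not Microsoft's API). The
state constants, token classes, and the scan_line() function are all
hypothetical names chosen for illustration. It assumes C-style /* */
comments and double-quoted strings that may span lines, and it ignores
escape sequences. The point is only that the editor can re-lex any single
line if it remembers the integer state at the end of the previous line.

```python
import re

# Hypothetical lexer states, analogous to Flex start conditions.
NORMAL, IN_COMMENT, IN_STRING = 0, 1, 2

# Hypothetical token classes that a colorizer would map to colors.
TEXT, COMMENT, STRING = "text", "comment", "string"

# In the NORMAL state, look for the next comment or string opener.
OPENER = re.compile(r'/\*|"')


def scan_line(line, state):
    """Tokenize one line, resuming from the integer `state`.

    Returns (spans, end_state). Each span is (start, end, token_class);
    end_state is what the caller passes in when scanning the next line.
    """
    spans = []
    pos, n = 0, len(line)
    while pos < n:
        if state in (IN_COMMENT, IN_STRING):
            closer = "*/" if state == IN_COMMENT else '"'
            cls = COMMENT if state == IN_COMMENT else STRING
            end = line.find(closer, pos)
            if end < 0:
                # No closer on this line: color the rest, keep the state.
                spans.append((pos, n, cls))
                pos = n
            else:
                spans.append((pos, end + len(closer), cls))
                pos = end + len(closer)
                state = NORMAL
        else:
            m = OPENER.search(line, pos)
            if m is None:
                spans.append((pos, n, TEXT))
                pos = n
            else:
                if m.start() > pos:
                    spans.append((pos, m.start(), TEXT))
                # Emit the opener as its own span and switch state.
                if m.group() == "/*":
                    spans.append((m.start(), m.end(), COMMENT))
                    state = IN_COMMENT
                else:
                    spans.append((m.start(), m.end(), STRING))
                    state = IN_STRING
                pos = m.end()
    return spans, state


if __name__ == "__main__":
    state = NORMAL
    for text in ['x = 1 /* start', 'still comment', 'end */ y = "a', 'b" z']:
        spans, state = scan_line(text, state)
        print(spans, state)
```

An editor would cache the end state per line and, after an edit, rescan
forward only until a line's end state matches the cached value again.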
It is aesthetically pleasing that Antlr's lexer and parser share a common
algorithm. However, this experience is building a case that Antlr's
approach is less versatile (and possibly slower?) when it comes to the lexer.
> I'm still working on this, but it seems to work ok. I briefly considered
> using the Babel interface but it's not well documented, the quality of the
> sample code leaves something to be desired, and it seems to have fewer
> capabilities than the managed language service interfaces. And I really
> didn't want to spend the time learning flex/bison when I already have an
> Antlr grammar for my language.
Also, the managed interface is an actual supported API, whereas Babel is
an MFC-style "framework" of cut+paste code fragments.
Cheers,
-Pete