[antlr-interest] Has anybody ever tried to integrate with VS?

Wed Apr 5 06:29:20 PDT 2006

Patrick:

I've built a language package for VS that uses Antlr.  It is a commercial
product so I can't share the code, but I can give you a few tips.

You'll need a lexer for syntax highlighting.  There's no need for a parser,
the language service just needs a sequence of tokens with certain
information such as the starting column and length, the text type (i.e.
text, comment, etc.), and some other things.

The lexer is called once for each line of source code, so it should be as
simple and efficient as possible (e.g. don't use semantic predicates), and
you should reuse the same lexer instance for the entire time your package is
loaded.   You don't want to create a new instance of the lexer each time it
is needed, the initialization costs are very high (especially the init of
the tokens table) and it will slow down typing and scrolling in the editor
if it isn't very efficient.

I use a static stringstream object to feed the lexer and just reinitialize
it each time the lexer is called.  In addition, you need to call
lex->setColumn(0) and lex->getInputBuffer().reset() to "reset" the lexer
each time you need to parse a new line.  You don't need to worry about the
line counter, since you will never be parsing more than one line at a time.

You also do not want to ignore comments in this lexer (since they must be
colorized too) and you must maintain state for multiline comments and you
cannot assume that when you encounter the beginning of a ML-comment you will
find the end.  The end might be on a different line and you only get to lex
one line at a time, so you must return a state code to the language service,
which it will then return to you when it asks you to tokenize the next line.
You must structure your lexer to be able to be entered in a state where you
are within a multiline comment.

If your language has any other constructs that can span multiple lines (like
literal strings in C/C++) then you must maintain state for that as well.
You cannot assume that you will be lexing each line in sequence, or if you
will be lexing any particular line at all.  The language service maintains
your returned state code for each line that has been colorized, and you will
always be given the state code for the preceding line when you are asked to
lex another line.

To get tokens without using a parser, do something like this:

   RefToken tok = lex->nextToken();
   while ( tok->getType() != Token::EOF_TYPE )
   {
      // ...
      tok = lex->nextToken();
   }

Next, you will need another lexer and a parser to support code collapsing,
intellisense and various other things that the VS editor supports.  This is
more like a traditional parser, where you want to ignore comments and
whitespace.  You will be asked to parse all, or part of the source file.
Rules in your parser will have to return the proper information to the
language service, depending on the reason for the parse.  There are numerous
reasons for invoking a full or partial parse (code collapsing, intellisense,
etc.).  

If your parser is efficient enough, it is easier to parse the entire source
file each time, but the language service will also give you the coordinates
of only the text that's changed.  You can maintain state information and
only parse the changed source or parse the entire source file each time.
Although colorizing and intellisense operations are generally called on
background threads, you want them to be as efficient as possible so there is
no noticeable delay to the end user.

This lexer and parser generally does not need to be as thorough as those
used in a compiler.  You only need to parse things to a point that you can
supply the needed information back to the language service.  So, you don't
really need to be concerned with things like operator precedence and things
of that nature that are only significant if you are actually going to
generate code.  Again, you want to make this as efficient as possible so
there is no noticeable delay to the end user.

Some language services like C# actually attempt to do a background compile
(minus codegen) while you type, so that syntax and semantic errors can be
displayed in real time in the error list pane.  That requires a more through
parser that does semantic analysis, type resolution and so forth, and you
may want to generate and persist an AST if you attempt this.  Some of C#'s
advanced intellisense features also require this level of parsing.  

Writing a Visual Studio language (or project) package is not a trivial thing
to undertake, especially given the quality of the sample code and
documentation that's provided, so be prepared for a bit of work.

Don

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org 
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of P. van 
> der Velde
> Sent: Wednesday, April 05, 2006 1:23 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Has anybody ever tried to integrate with VS?
> 
> Hi All
> 
> I want to build a new language integration tool for VS2005. I 
> want to integrate Latex into VS (I'm lazy, I'm spoiled and I 
> need my code complete thingies ;-). However to do that I need 
> a parser and a lexer. 
> The documents assume you use Flex and Bison, however I was 
> thinking about Antlr. So now my questions are:
> 
> 1) Has anybody ever build a language package for VS with antlr
> 2) Has anybody ever created a LaTex grammar?
> 3) Has anybody ever tried to create those two things.
> 
> Also if anybody has any hints or tips I would love to hear those
> 
> Regards
> 
> Patrick