[antlr-interest] Extracting autocompletion rules from antlr grammar

Sam Harwell sharwell at pixelminegames.com
Fri Nov 14 11:12:26 PST 2008


ANTLR itself is not the correct method for implementing autocompletion
features. Our IntelliSense engine (our support for Visual Studio's
autocompletion & code navigation features) does involve ANTLR, but maybe
not in the way you expect.

Autocomplete is not a syntactical analysis process; it is a semantic
analysis process. For example, public and protected members of base
classes are available in derived classes in C++. There is no way for
ANTLR to understand this type of dependence. I'll start with some
reasons why you shouldn't even attempt to use ANTLR for the full
autocompletion implementation. Then I'll cover the various tasks that
ANTLR is slated for in the process.

1. Performance. In order to gather context information for a file, ANTLR
processes a file-at-a-time. When providing editor features such as
autocomplete, you have 20ms to process and provide the correct
information in a list. If you don't address this, you'll end up with a
product like Eclipse, and no one wants that. Even if you have a fast
grammar (LL(k) for low k, few or no synpreds or sempreds, no
backtracking), large files will take more than that amount of time to
process.
2. Robustness. As users are typing, they constantly introduce syntax
errors. These errors throw wrenches in the tracks of ANTLR's lookahead
decision making process. Incomplete statements are extremely likely to
cause ANTLR to either stop processing in the context of these errors, or
provide misleading context information if it made bad decisions. Since
the users watching this process are unaware of what ANTLR is trying to
do behind the scenes, the only conclusion they can draw from the
misleading information is "this product has no idea what is going on?!"

The largest task ANTLR *is* appropriate for is gathering source
information for an entire project. We take the parse results from all
the files in the project and build an IntelliSense-appropriate cache
that is directly used for later lookups in auto-completion or code
navigation. IntelliSense engines have many unique demands on the cached
information. One example I have is for C++: cached classes cannot point
directly to the class information of a base class, but instead must do
so via a unique ID derived from the name of the base class. That way, if
the programmer changes the base class and that class is reparsed, the
derived class is automatically able to use the new information for
auto-complete.

Another task we use ANTLR for is expression parsing. We have a small
ANTLR rule that builds a tree for a single postfix-expression in source
code, and is specially designed handle malformed expressions (they're
really just incomplete expressions but ANTLR doesn't know that). The
expression parser was easy to prevent ambiguities because we strictly
limited it to postfix operations. The operators in that parse tree are
replaced with functors that access the cache, and the end result is an
enumeration of the desired auto-complete items. If the user requested a
"goto declaration" operation, the functors result in a location in
source code. Here's an example of these operations:

User enters: myVariable.
ANTLR tree: (MEMBER myVariable ?)
Operation autocomplete: (get-members (resolve myVariable (source-context
(cursor-position))))
Operation goto-declaration: (resolve ? (get-members (resolve myVariable
(source-context (cursor-position)))))

User enters: myVariable.f().yourVariable
ANTLR tree: (MEMBER (CALL (MEMBER myVariable f) ()) yourVariable)
Operation goto-declaration: (resolve yourVariable (get-members
(return-value f () (resolve myVariable (source-context
(cursor-position))))))

Sam

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Bo Ba
Sent: Friday, November 14, 2008 11:07 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Extracting autocompletion rules from antlr
grammar

Hello,

Since autocompletion is not yet supported by ANTLR
(http://groups.google.com/group/il-antlr-interest/browse_thread/thread/7
c6d05a35046854d), I am thinking of implementing this simple reverse
engineering approach.

1. Take a programm, which is complete in terms of the grammar.
2. Parse it in antlr parser.
3. During parsing of this "complete program", every current token type
will get a set of token types, which can follow this current token.
4. As user types his program, the latter will be parsed on every key
stroke (incrementally or not), and a current token will be extracted,
which then will be used to get a corresponding set of possible
predictions.
5. Optionally, the predictions will be filtered by their semantic
validity, before presented to the user.

Am I overlooking something obvious feature of ANTLR, which could
simplify this procedure? Specifically, are items 1 to 3 above the best
way to obtain the raw predictions for every token type?

Thanks.

V

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-addr
ess



More information about the antlr-interest mailing list