[antlr-interest] Building syntax highlighters with ANTLR

Wed Apr 15 23:13:32 PDT 2009

In this example, I was trying to get the correct extent for a comment.
The part that makes it challenging in this application is the fact that
each line of source code is sent to the syntax highlighter individually,
so there's no way to look ahead past the end of the current line. The
highlighter expects the scanner state (a user-controlled 32-bit int) to be
saved after parsing a line, where the state must contain all the
information needed to accurately start lexing the next line.
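As a rough illustration of that contract, here is a minimal sketch in which each line is lexed in isolation and only an int is carried forward. The names `LineLexer`, `highlightAll`, `COMMENT_TRACKER` and the bit layout (bit 0 = "still inside a block comment") are hypothetical, not any highlighter's real API, and the toy lexer ignores strings and `//` comments.

```java
import java.util.Arrays;
import java.util.List;

public class LineStateDemo {

    /** Hypothetical callback shape: lex one line, return the state to save for the next. */
    interface LineLexer {
        int lexLine(String line, int stateIn);
    }

    // Illustrative bit layout: bit 0 = previous line ended inside an unterminated /* comment.
    static final int IN_COMMENT = 1;

    /** Toy lexer that only tracks block comments; other bits of the state are preserved. */
    static final LineLexer COMMENT_TRACKER = (line, stateIn) -> {
        boolean inComment = (stateIn & IN_COMMENT) != 0;
        int i = 0;
        while (i < line.length()) {
            String delim = inComment ? "*/" : "/*";
            int at = line.indexOf(delim, i);
            if (at < 0) break;       // no further delimiter on this line
            inComment = !inComment;  // crossed an open or close delimiter
            i = at + 2;
        }
        return inComment ? (stateIn | IN_COMMENT) : (stateIn & ~IN_COMMENT);
    };

    /** Feeds lines one at a time, as the highlighter does, recording each saved state. */
    static int[] highlightAll(List<String> lines, LineLexer lexer) {
        int[] saved = new int[lines.size()];
        int state = 0; // the first line starts in the default state
        for (int i = 0; i < lines.size(); i++) {
            state = lexer.lexLine(lines.get(i), state);
            saved[i] = state;
        }
        return saved;
    }

    public static void main(String[] args) {
        List<String> src = List.of("int x; /* open", "still inside", "done */ int y;");
        System.out.println(Arrays.toString(highlightAll(src, COMMENT_TRACKER))); // [1, 1, 0]
    }
}
```

The point of the design is that `lexLine` never sees any line but its own; everything it needs about earlier lines must fit in `stateIn`.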

From the syntax highlighter's perspective, a multi-line comment is never
a matched-pair extent. Instead, it consists of the following three
independent parts, where each line is processed by its own instance of
the lexer class.

/* Part one, open of comment to the end of the line

 * Part two, zero or more lines that are in the comment but don't have
either a start or close delimiter

 * Part three, beginning of final line through the closing delimiter of
the comment. */
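The three-part breakdown above can be sketched as a per-line classifier that sees only the current line plus one bit of saved state. The `Part` names and `classify` are hypothetical, and the sketch reports only the first comment piece on a line and ignores strings and `//` comments.

```java
public class CommentParts {

    enum Part { NONE, SINGLE_LINE, OPEN_TO_EOL, WHOLE_LINE, BOL_TO_CLOSE }

    /**
     * Classifies the first block-comment piece on a line, given only the line text
     * and whether the previous line ended inside a comment (the saved state).
     */
    static Part classify(String line, boolean startsInComment) {
        if (startsInComment) {
            // Parts two and three: the line starts inside the comment at column 0.
            return line.contains("*/") ? Part.BOL_TO_CLOSE : Part.WHOLE_LINE;
        }
        int open = line.indexOf("/*");
        if (open < 0) return Part.NONE;
        // Look for the close after the opening pair, so "/*/" does not count as closed.
        int close = line.indexOf("*/", open + 2);
        return close >= 0 ? Part.SINGLE_LINE : Part.OPEN_TO_EOL; // part one if unclosed
    }

    public static void main(String[] args) {
        System.out.println(classify("/* Part one, open of comment", false)); // OPEN_TO_EOL
        System.out.println(classify(" * Part two, zero or more lines", true)); // WHOLE_LINE
        System.out.println(classify("the comment. */", true));               // BOL_TO_CLOSE
    }
}
```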

Sam

From: Gerald Rosenberg [mailto:gerald at certiv.net] 
Sent: Thursday, April 16, 2009 12:24 AM
To: Sam Harwell; antlr-interest at antlr.org
Subject: RE: [antlr-interest] Building syntax highlighters with ANTLR

It looked like you were trying to discern the true extent of a comment,
which is just a delimited block. That is what a pair matcher does. My
specific application is different, but the technique is the same.

At 10:06 PM 4/15/2009, Sam Harwell wrote:

It looks like we are working towards two very different goals. I'm not
trying to do any parsing, block structure analysis, pair matching, etc.
I'm just trying to color comments, identifiers, keywords, etc., each
with its own color.

Sam

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gerald Rosenberg
Sent: Wednesday, April 15, 2009 11:29 PM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Building syntax highlighters with ANTLR

A better approach is to use a predicate. That way you don't have to
intercept every lexer token, and it becomes considerably easier to
handle multiple pair sets.

@lexer::members {
  public boolean pairMatch(int limit) {
    return PairMatcherHelper.pairMatch(input, limit);
  }
  public boolean pairMatch(int limit, char beg, char end) {
    return PairMatcherHelper.pairMatch(input, limit, beg, end);
  }
}

BRACE_BLOCK :'{' { pairMatch(200) }? ;
BRACKET_BLOCK :'[' { pairMatch(50, '[', ']') }? ;

PairMatcherHelper#pairMatch then does full nested pair matching, subject
to certain limitations. It does respect ANTLR's backtracking semantics.

Note that the attached version is set up for single-character
delimiters only.
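The attached PairMatcherHelper is not preserved in the archive, so the following is only a hypothetical stand-in showing one way a nested pair check with a lookahead limit could work. It assumes the rule has already matched the opening delimiter (as in `'[' { pairMatch(50, '[', ']') }?`), and it only looks ahead without consuming, which is what keeps a failed predicate harmless under backtracking. Whether the original helper also consumes the block is not stated in the thread. The `IntUnaryOperator` lookahead mimics ANTLR's 1-based `LA(i)`; the names `PairMatchSketch` and `over` are illustrative, and the brace default for the two-argument form is an assumption based on the BRACE_BLOCK rule.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical stand-in for the attached PairMatcherHelper; not the original code. */
public final class PairMatchSketch {
    static final int EOF = -1;

    /** Assumed default: the one-delimiter-free form matches braces. */
    static boolean pairMatch(IntUnaryOperator la, int limit) {
        return pairMatch(la, limit, '{', '}');
    }

    /**
     * True if the already-matched opening delimiter is closed within `limit`
     * characters of lookahead, counting nested beg/end pairs. Nothing is consumed.
     */
    static boolean pairMatch(IntUnaryOperator la, int limit, char beg, char end) {
        int depth = 1; // the rule has already matched one opening delimiter
        for (int i = 1; i <= limit; i++) {
            int c = la.applyAsInt(i);
            if (c == EOF) return false;               // input ended while still open
            if (c == beg) depth++;
            else if (c == end && --depth == 0) return true;
        }
        return false; // no matching close within the lookahead limit
    }

    /** Adapts the text following the opening delimiter to 1-based lookahead. */
    static IntUnaryOperator over(String rest) {
        return i -> i <= rest.length() ? rest.charAt(i - 1) : EOF;
    }

    public static void main(String[] args) {
        System.out.println(pairMatch(over("a[b]c]tail"), 50, '[', ']')); // true
        System.out.println(pairMatch(over("a[b]c"), 50, '[', ']'));      // false: hit EOF
        System.out.println(pairMatch(over("abcdef]"), 3, '[', ']'));     // false: limit
    }
}
```

Inside an ANTLR 3 Java lexer, one might pass `i -> input.LA(i)` as the lookahead, since the lexer's `input` is a `CharStream`; treat that wiring as an untested assumption.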

At 10:56 AM 4/15/2009, Sam Harwell wrote:

The new method uses a very different override of NextToken(). The outer
loop is largely a duplication of the functionality of Lexer.NextToken().
I've highlighted the key section that reliably manages the lexer state
information (yay HTML email).

public override IToken NextToken()
{
    for ( ; ; )
