[antlr-interest] Building syntax highlighters with ANTLR

Wed Apr 15 10:56:43 PDT 2009

Here's a short summary of the process of implementing improved syntax
highlighter support. It doesn't address stacked lexers, which are an
extended version of the new method presented here. The key section that
changed is labeled "Multi-line tokens: block comments" in my blog
article: http://blog.280z28.org/archives/2008/10/21/

Old Method

My blog article is a good reference for this method. The old method
relies on a token to mark the end of a multi-line token, such as the
END_BLOCK_COMMENT='*/'; token used on my blog. The problem with this
method is that it's difficult to keep track of state across the
intermediate tokens. Source code like this will break it unless you
start using "really sneaky methods":

/*
 * " my code */ char x = '"';

New Method

Instead of creating an END_BLOCK_COMMENT token, create the COMMENT rule
as follows. Compared to my blog notes, you'll get rid of the
END_BLOCK_COMMENT token but you'll keep the ANYCHAR token (it's used for
something other than just a helper to the continued comment).

COMMENT
        :       '/*' {InBlockComment=true;}
                CONTINUE_COMMENT
        ;

fragment
CONTINUE_COMMENT
        :       (       ('*' ~'/') => '*'
                |       ~'*'
                )*
                (       '*/' {InBlockComment=false;}
                |       /* allowed to skip in colorizer */
                )
        ;

The new method uses a very different override of NextToken(). The outer
loop is largely a duplication of the functionality of Lexer.NextToken().
The key section, which reliably manages the lexer state information, is
the if ( InBlockComment ) branch (highlighted in the HTML version of
this email).

public override IToken NextToken()
{
    for ( ; ; )
    {
        if ( input.LA( 1 ) == (int)CharStreamConstants.EOF )
        {
            return Token.EOF_TOKEN;
        }

        state.token = null;
        state.channel = Antlr.Runtime.Token.DEFAULT_CHANNEL;
        state.tokenStartCharIndex = input.Index();
        state.tokenStartCharPositionInLine = input.CharPositionInLine;
        state.tokenStartLine = input.Line;
        state.text = null;

        try
        {
            if ( InBlockComment )
            {
                mCONTINUE_COMMENT();
                if ( state.token == null )
                    Emit();

                IToken next = state.token;
                next.Type = COMMENT;
                return next;
            }
            else
            {
                IToken next = base.NextToken();
                return next;
            }
        }
        catch ( NoViableAltException nva )
        {
            ReportError( nva );
            Recover( nva ); // throw out current char and try again
        }
        catch ( RecognitionException re )
        {
            ReportError( re );
            // match() routine has already called recover()
        }
    }
}
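
To show why carrying InBlockComment between calls matters when a
colorizer lexes one line at a time, here is a minimal self-contained
sketch. It does not use ANTLR at all: LineColorizer, ColorizeLine, and
Kind are hypothetical names made up for this illustration, and the
scanner ignores string literals, which a real grammar would handle with
its own rules. The only point is that the flag is passed in from the
previous line and handed back for the next one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for this sketch only.
enum Kind { Comment, Code }

static class LineColorizer
{
    // Tokenizes a single line. 'inBlockComment' carries the lexer state in
    // from the previous line and out to the next one.
    public static List<(Kind kind, string text)> ColorizeLine(
        string line, ref bool inBlockComment)
    {
        var tokens = new List<(Kind kind, string text)>();
        int pos = 0;
        while (pos < line.Length)
        {
            int start = pos;
            if (!inBlockComment)
            {
                int open = line.IndexOf("/*", pos, StringComparison.Ordinal);
                if (open != pos)
                {
                    // Ordinary code up to the next comment opener (or end of line).
                    pos = open < 0 ? line.Length : open;
                    tokens.Add((Kind.Code, line.Substring(start, pos - start)));
                    continue;
                }
                inBlockComment = true;   // saw '/*', as in the COMMENT rule
                pos += 2;
            }

            // Continuation: consume through '*/', or to the end of the line
            // if the comment is still open.
            int close = line.IndexOf("*/", pos, StringComparison.Ordinal);
            if (close < 0)
            {
                pos = line.Length;       // flag stays set for the next line
            }
            else
            {
                pos = close + 2;
                inBlockComment = false;  // matched '*/'
            }
            tokens.Add((Kind.Comment, line.Substring(start, pos - start)));
        }
        return tokens;
    }
}
```

An editor would keep one bool per line (the state at the end of that
line) and re-lex a line only when its text or its incoming state
changes. Feeding the two lines of the problem example above through
ColorizeLine in order, starting with the flag cleared, gives a comment
token for the first line, then a comment token ending at the '*/' and a
code token for the rest of the second line.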

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Sam Harwell
Sent: Thursday, April 09, 2009 3:38 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Building syntax highlighters with ANTLR

I've made a few posts about this in the past, and it looks like another
one is on the way. I designed a new, much easier, robust, and general
way to make a syntax highlighter from a grammar, and it even allows
clean "stacking" of lexers. As a quick example, my primary grammar
recognizes a block comment, but it doesn't recognize Doxygen syntax
within that comment. By stacking a Doxygen-style fragment lexer in the
colorizer, and telling it to post-process any token of primary type
"DOC_COMMENT", I ended up with this.
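
The details of the stacking aren't given in this message, so the
following is only a rough sketch of the general idea, not the actual
implementation: a second pass that re-lexes the text of any token the
primary lexer typed as a doc comment. StackedColorizer, SplitDocComment,
TokenKind, and the rule for what counts as a Doxygen tag ('@' or '\'
followed by letters) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for this sketch only.
enum TokenKind { Code, DocComment, DocTag }

static class StackedColorizer
{
    // Assumed secondary lexer: splits a doc comment's text into plain
    // comment runs and Doxygen-style tags such as "@param" or "\return".
    public static IEnumerable<(TokenKind kind, string text)> SplitDocComment(string text)
    {
        int pos = 0;
        while (pos < text.Length)
        {
            int start = pos;
            bool isTag = (text[pos] == '@' || text[pos] == '\\')
                         && pos + 1 < text.Length
                         && char.IsLetter(text[pos + 1]);
            if (isTag)
            {
                pos++;                                   // the '@' or '\'
                while (pos < text.Length && char.IsLetter(text[pos])) pos++;
                yield return (TokenKind.DocTag, text.Substring(start, pos - start));
            }
            else
            {
                pos++;                                   // always make progress
                while (pos < text.Length && text[pos] != '@' && text[pos] != '\\') pos++;
                yield return (TokenKind.DocComment, text.Substring(start, pos - start));
            }
        }
    }

    // Passes primary tokens through unchanged, except doc comments, which
    // are replaced by the tokens of the secondary lexer.
    public static IEnumerable<(TokenKind kind, string text)> Colorize(
        IEnumerable<(TokenKind kind, string text)> primary)
    {
        foreach (var token in primary)
        {
            if (token.kind != TokenKind.DocComment)
            {
                yield return token;
                continue;
            }
            foreach (var sub in SplitDocComment(token.text))
                yield return sub;
        }
    }
}
```

In this sketch the primary lexer never needs to know about Doxygen; the
colorizer decides which token types get a second pass.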

I'd give the gory details now, but it's 3:30am - just had to share
because figuring this out is what kept me up! I can't wait to see
multi-level lexer stacking for my StringTemplate highlighter. :)
