[antlr-interest] Strategy for mapping output to line numbers from a tree walker

Stanislav Sokorac sokorac at gmail.com
Fri Aug 21 13:47:32 PDT 2009


What is the best way to handle this problem when the children of a node are
coming from different CharStreams (include files, macros, what have you...),
and you could expect to have the first or last token be from another stream?

I.e. your tree is (PROGRAM STATEMENT STATEMENT STATEMENT), and your first
statement came from an include file, while the others are in the main file.
Retrieving the start token of the PROGRAM node will yield the first token of
the first STATEMENT, whose index will be the location in the include file,
and not the first token of the PROGRAM (which is most likely "#include").

It seems to me that the tree walker has no way of determining the location
of the first character in PROGRAM without us tracking the locations of char
stream switches during lexing, but that creates a special case to be checked
for every one of the nice and simple methods below. Is there a more elegant
solution available? Off the top of my head, I'm thinking that everything
would work out very nicely if the lexer could insert a "blank" token at the
beginning and the end of every included char stream, except that the extra
token would confuse the parser.
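
To make the "track the stream switches" option concrete, here is a minimal sketch of what I have in mind. It assumes the lexer can call something like recordInclude() each time it pushes a new char stream, and that every stream has a unique name. SrcToken, IncludeSpan and SpanMapper are made-up names for illustration, not ANTLR classes, and the sketch deliberately does not touch the ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a token: remembers which stream it came from. */
final class SrcToken {
    final String source; // name of the char stream the token was lexed from
    final int start;     // offset of the first character within that stream
    final int stop;      // offset of the last character within that stream

    SrcToken(String source, int start, int stop) {
        this.source = source;
        this.start = start;
        this.stop = stop;
    }
}

/** One stream switch: where the include directive sits in the parent stream. */
final class IncludeSpan {
    final String parent; // stream containing the directive
    final int start;     // offset of the directive's first character
    final int stop;      // offset of the directive's last character
    final String child;  // stream that was switched to

    IncludeSpan(String parent, int start, int stop, String child) {
        this.parent = parent;
        this.start = start;
        this.stop = stop;
        this.child = child;
    }
}

/** Maps tokens from included streams back to a range in an enclosing stream. */
final class SpanMapper {
    private final List<IncludeSpan> includes = new ArrayList<>();

    /** Assumed to be called by the lexer whenever it switches streams. */
    void recordInclude(String parent, int start, int stop, String child) {
        includes.add(new IncludeSpan(parent, start, stop, child));
    }

    /**
     * Returns {start, stop} of the token as seen from the target stream.
     * A token lexed from an included stream maps to the span of the
     * directive that pulled it in. Returns null if no include chain
     * leads from the token's stream to the target.
     */
    int[] rangeIn(String target, SrcToken t) {
        String src = t.source;
        int start = t.start;
        int stop = t.stop;
        // Walk up the include chain; the bound guards against cycles.
        for (int depth = 0; depth <= includes.size(); depth++) {
            if (src.equals(target)) {
                return new int[] { start, stop };
            }
            IncludeSpan found = null;
            for (IncludeSpan s : includes) {
                if (s.child.equals(src)) {
                    found = s;
                    break;
                }
            }
            if (found == null) {
                return null;
            }
            src = found.parent;
            start = found.start;
            stop = found.stop;
        }
        return null;
    }
}
```

With this, a helper that wants the start of PROGRAM would pass the node's first token through rangeIn("main file name", token) instead of reading its start index directly, which is exactly the special case I would like to avoid repeating in every method. A stream that is included more than once would need a unique id per inclusion rather than a plain name.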

Stan

On Tue, Aug 18, 2009 at 3:57 PM, Jim Idle <jimi at temporal-wave.com> wrote:

> Joe Schmoe wrote:
> > I have a grammar that produces an AST and a tree walker, and I am
> > coming up blank trying to figure out a reasonable way to track line
> > number information in the tree walker so that I can map my output to
> > source file lines.
> >
> > The only solution I have come up with is incredibly wordy, which is to
> > make sure that every time I see a token name in the walker I track the
> > line number.
> >
> > rule : ^(TOKENNAME .... ) { TrackLine($TOKENNAME.line) }
> >       ;
> >
> > I am sure there are better solutions than this but I haven't been able
> > to figure out what they are.
> Search the list for code to do this.
>
> Basically, create yourself some helper methods that given a tree node or
> a token, work this out for you. From a tree node, you can ask for the
> starting token and ending token and use those tokens to compute the span
> of the tree by asking for their start and end indexes (for error
> underlining and other similar purposes). The best thing is to create a
> little framework for handling errors, picking up information and so on -
> you can reuse it for many grammars.
>
> Here are a few methods you might find useful for stealing ideas from
> (they are just excerpts but show you the methods to use)
>
> Jim
>
>
>    protected int pos() {
>
>        return pos(input.LT(1));
>    }
>
>    /**
>     * Calculates the character position of the first character of the text
>     * in the input stream that the supplied token represents.
>     * @param tok The token to locate in the input stream
>     * @return The offset of the token's first character in the input stream
>     */
>    protected int pos(Token tok) {
>
>        return ((CommonToken)tok).getStartIndex();
>    }
>
>    /**
>     * Calculates the character position of the last character of the text
>     * in the input stream that the supplied token represents.
>     * @param tok The token to locate in the input stream
>     * @return The offset of the token's last character in the input stream
>     */
>    protected int endPos(Token tok) {
>
>        return ((CommonToken)tok).getStopIndex();
>    }
>
>    /**
>     * Log a particular message into the message log (typically from
>     * syntax errors and so on)
>     *
>     * @param msgDesc   The message we want to log
>     * @param line      The line that this message appertains to
>     * @param colPos    The column position in the line
>     * @param startPos  The start offset of the error
>     * @param endPos    The end offset of the error
>     * @param args      The arguments for formatting the message
>     */
>    public void logMsg(MessageDescriptor msgDesc, int line, int colPos,
>                       int startPos, int endPos, Object... args)
>    {
>        // Instantiate the message
>        //
>        Message m = new Message(msgDesc, fileName, line, colPos,
>                                startPos, endPos, args);
>
>        // Store in the vector
>        //
>        messages.add(m);
>
>        // Count the severity
>        //
>        switch (msgDesc.getSeverity()) {
>
>            case ERROR:
>
>                errorCount++;
>                break;
>
>            case WARNING:
>
>                warningCount++;
>                break;
>
>            case FATAL:
>
>                errorCount++;
>                fatalCount++;
>                break;
>
>            // No default as Enum is type safe
>        }
>    }
>
>    /**
>     * Create a message regarding a single token, taking the start and
>     * end positions from the token.
>     *
>     * @param m    The type of message you want to create
>     * @param ct   The CommonToken that we are reporting against
>     * @param args The parameters for the message we are raising
>     */
>    public void logMsg(MessageDescriptor m, Token ct, Object... args) {
>
>        // Call the standard logger, using the information in the token
>        //
>        CommonToken t = (CommonToken)ct;
>        logMsg(m, t.getLine(), t.getCharPositionInLine(),
>               t.getStartIndex(), t.getStopIndex(), args);
>
>    }
>
>    /**
>     * Create a message regarding a root node such as an expression,
>     * taking the start and end positions from the tree node.
>     *
>     * @param m    The type of message you want to create
>     * @param ct   The CommonTree node that we are reporting against
>     * @param args The parameters for the message we are raising
>     */
>    public void logMsg(MessageDescriptor m, CommonTree ct, Object... args) {
>
>        CommonToken st;
>        CommonToken et;
>
>        st = (CommonToken)(tokens.get(ct.getTokenStartIndex()));
>        et = (CommonToken)(tokens.get(ct.getTokenStopIndex()));
>
>        // Call the standard logger, using the information in the tokens
>        //
>        logMsg(m, st.getLine(), st.getCharPositionInLine(),
>               st.getStartIndex(), et.getStopIndex(), args);
>
>    }
>    /**
>     * Create a message regarding a single token, taking the start position
>     * from the token but the end position as specified
>     *
>     * @param m      The type of message you want to create
>     * @param ct     The CommonToken that we are reporting against
>     * @param endPos The absolute offset where the span of the error
>     *               message should end
>     * @param args   The parameters for the message we are raising
>     */
>    public void logMsg(MessageDescriptor m, Token ct, int endPos,
>                       Object... args) {
>
>        // Call the standard logger, using the information in the token
>        //
>        CommonToken t = (CommonToken)ct;
>        logMsg(m, t.getLine(), t.getCharPositionInLine(),
>               t.getStartIndex(), endPos, args);
>
>    }
>
>    /**
>     * Create a message that appertains to a span of tokens, such as
>     * when the result of an expression is incorrect and so on.
>     *
>     * @param m         The type of message you want to create
>     * @param startCt   The CommonToken that starts the message
>     * @param stopCt    The CommonToken that ends the message
>     * @param args      The parameters for the message we are raising
>     *
>     */
>    public void logMsg(MessageDescriptor m, Token startCt, Token stopCt,
>                       Object... args) {
>
>        // Call the standard logger, using the information in the token
>        //
>        CommonToken st = (CommonToken)startCt;
>        CommonToken et = (CommonToken)stopCt;
>        logMsg(m, st.getLine(), st.getCharPositionInLine(),
>               st.getStartIndex(), et.getStopIndex(), args);
>    }
>
>