[antlr-interest] Strategy for mapping output to line numbers from a tree walker

Fri Aug 21 18:06:12 PDT 2009

Thanks for the replies, guys... I think I wasn't clear enough when I
described the situation, though. Certainly, you want to track where
individual tokens came from, and ANTLR handles that well as long as you
set the name for each of your char streams. That part works great. My
question is: what is a good way to handle the situation where tokens *within
the same node* come from different char streams?

Here's a more concrete example. Say you run into this line in your C-like
language:

if (VALUE + a > 0) { echo "hi"; }

where 'VALUE' is a macro that's defined in an include file. Your lexer
substituted VALUE with the defined value (say '1.0'), and marked the char
stream appropriately. Now, your tree walker comes upon 1.0+a, and say your
language doesn't allow addition of reals and integers, so you want to
mark/underline the expression "VALUE + a" and say "No adding of reals and
integers".

So, your tree node looks like this: (+ 1.0 a). If you ask ANTLR to give you
the starting token for this tree node, it will correctly give you the "1.0"
token. If you look inside it, you'll find that it came from an include file,
at starting character 10, for example. Then you get the end token, which is
'a', and it came from 'main.c' ending at character 540.

Now what? How do you underline 'VALUE + a'? I.e., how do you figure out the
starting and ending character of your expression in 'main.c'? The user
doesn't want to see the VALUE definition in another file underlined, as
there's nothing wrong with that line of code.

A similar problem occurs if you have a list of statements, and the first (or
last) few came from an include file. If you wanted to show the proper range
in the original file, you can't determine the location of the 'include'
statement by only examining your "list of statements" tree node and the
tokens in it.
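
To make the problem concrete, here is a rough sketch of the kind of extra
per-token bookkeeping it seems to require. None of this is ANTLR API:
TokenOrigin, SpanMapper, the file name 'values.h', and the offsets of VALUE
in 'main.c' are all made up for illustration. The assumption is that the
lexer records, for every token, both where its text was really read from and
which range of the top-level file produced it (the macro use or the
'include' statement).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical record (not part of ANTLR) of where a token came from. */
final class TokenOrigin {
    final String file;   // file the token's text was actually read from
    final int start;     // first character offset in that file
    final int stop;      // last character offset in that file (inclusive)
    final int siteStart; // first offset, in the top-level file, of the text that produced it
    final int siteStop;  // last offset, in the top-level file, of that text (inclusive)

    TokenOrigin(String file, int start, int stop, int siteStart, int siteStop) {
        this.file = file;
        this.start = start;
        this.stop = stop;
        this.siteStart = siteStart;
        this.siteStop = siteStop;
    }
}

final class SpanMapper {
    /** Returns {start, stop} in the top-level file covering every token of a node. */
    static int[] spanInTopLevelFile(List<TokenOrigin> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("node has no tokens");
        }
        int start = Integer.MAX_VALUE;
        int stop = Integer.MIN_VALUE;
        for (TokenOrigin t : tokens) {
            // Use the site in the top-level file, never the offsets in the include file.
            start = Math.min(start, t.siteStart);
            stop = Math.max(stop, t.siteStop);
        }
        return new int[] {start, stop};
    }

    /** Example: the (+ 1.0 a) node, assuming VALUE sits at 532..536 in main.c. */
    static int[] example() {
        List<TokenOrigin> node = Arrays.asList(
            new TokenOrigin("values.h", 10, 12, 532, 536),  // "1.0", expanded from VALUE
            new TokenOrigin("main.c", 538, 538, 538, 538),  // "+"
            new TokenOrigin("main.c", 540, 540, 540, 540)); // "a"
        return spanInTopLevelFile(node);
    }
}
```

For a token that was read directly from the top-level file, the site is just
the token's own range. The catch is that every token needs this extra
information attached during lexing, which is exactly the special-casing I was
hoping to avoid.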

I hope what I'm describing makes sense.

Thanks,
Stan

On Fri, Aug 21, 2009 at 5:19 PM, Gavin Lambert <antlr at mirality.co.nz> wrote:

> At 08:47 22/08/2009, Stanislav Sokorac wrote:
>
>> What is the best way to handle this problem when the children of a node
>> are coming from different CharStreams (include files, macros, what have
>> you...), and you could expect to have the first or last token be from
>> another stream?
>>
> [...]
>
>> It seems to me that the tree walker has no way of determining the location
>> of the first character in PROGRAM without us tracking the locations of char
>> stream switches during lexing, but that creates a special case to be checked
>> for every one of the nice and simple methods below. Is there a more elegant
>> solution available?
>>
>
> You're going to want to track that information anyway, in order to provide
> good error reporting (people will want to know that an error is on line 16
> of include file B, not on line 152 of the input after preprocessing).
>
> ANTLR already includes a line number with each token, and the filename is
> in the char stream.  You just need to link the token with the char stream it
> came from (which might even already be done; I haven't checked) and you
> should be good to go.
>
>
> If you don't care about the error reporting, then the simplest thing to do
> is to run a complete preprocessing pass to merge all the include files
> before starting the "real" lex/parse/treewalk.
>
>