[antlr-interest] Strategy for mapping output to line numbers from a tree walker

Mon Aug 24 08:04:18 PDT 2009

Thanks for the tips, Gavin. I ended up doing something along the lines of
your "each token keeps a list of includes" idea, with a slight modification.
I did the following:

- each token in the "main" file stays as is, just a plain CommonToken
- each token included from another stream has two additional fields which
contain the first and the last token of the statement that caused the stream
switch (i.e. an include statement, a macro being used, etc.). This is used
for any nested includes, as well.

This way, I can follow a chain of includes up to any level I want without
having to keep a full list for each token. In most cases, I'll want to go up
the chain until the node root, the start token, and the end token are all at
the same level (i.e. same char stream), and then I can underline the node in
one file. Of course, I can also choose to go up until
I reach a CommonToken for all 3 and locate the originating include/macro
statement, if needed. ("No adding of reals and integers on line 23 of
foo.inc, included from main.c, line 543").
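
For anyone who wants to try the same approach, here is a minimal sketch of
the idea in plain Java. It is only an illustration under some assumptions:
Tok is a hypothetical stand-in for ANTLR's CommonToken (so the sketch runs
without the ANTLR jar), IncludedTok, commonRange and describe are made-up
names, and it only lifts the start and end tokens, not the node root.

```java
// Sketch only: Tok stands in for ANTLR's CommonToken; all names are hypothetical.
final class IncludeChain {
    /** A plain token from the "main" file. */
    static class Tok {
        final String file; final int line; final String text;
        Tok(String file, int line, String text) {
            this.file = file; this.line = line; this.text = text;
        }
    }

    /** A token from a switched stream; remembers the statement that caused the switch. */
    static class IncludedTok extends Tok {
        final Tok causeStart, causeStop; // first/last token of the include or macro use
        IncludedTok(String file, int line, String text, Tok causeStart, Tok causeStop) {
            super(file, line, text);
            this.causeStart = causeStart; this.causeStop = causeStop;
        }
    }

    /** Number of stream switches between this token and the main file. */
    static int depth(Tok t) {
        int d = 0;
        while (t instanceof IncludedTok) { t = ((IncludedTok) t).causeStart; d++; }
        return d;
    }

    /** True if both tokens are in the main file or were pulled in by the same statement. */
    static boolean sameStream(Tok a, Tok b) {
        boolean ai = a instanceof IncludedTok, bi = b instanceof IncludedTok;
        if (!ai && !bi) return true;
        if (ai != bi) return false;
        return ((IncludedTok) a).causeStart == ((IncludedTok) b).causeStart;
    }

    /** Lift start and stop up the chain until both sit in the same char stream. */
    static Tok[] commonRange(Tok start, Tok stop) {
        int ds = depth(start), de = depth(stop);
        while (ds > de) { start = ((IncludedTok) start).causeStart; ds--; }
        while (de > ds) { stop = ((IncludedTok) stop).causeStop; de--; }
        while (!sameStream(start, stop)) {
            start = ((IncludedTok) start).causeStart;
            stop = ((IncludedTok) stop).causeStop;
        }
        return new Tok[] { start, stop };
    }

    /** Build a message that names every include level on the way up. */
    static String describe(String msg, Tok t) {
        StringBuilder sb = new StringBuilder(msg)
            .append(" on line ").append(t.line).append(" of ").append(t.file);
        while (t instanceof IncludedTok) {
            t = ((IncludedTok) t).causeStart;
            sb.append(", included from ").append(t.file).append(", line ").append(t.line);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "VALUE + a" in main.c, where VALUE expands to 1.0 (defined in defs.h).
        Tok value = new Tok("main.c", 10, "VALUE");
        Tok a = new Tok("main.c", 10, "a");
        Tok one = new IncludedTok("defs.h", 3, "1.0", value, value);
        Tok[] range = commonRange(one, a);
        System.out.println(range[0].text + " .. " + range[1].text); // VALUE .. a

        // A '+' inside foo.inc, which main.c includes on line 543.
        Tok incStart = new Tok("main.c", 543, "#include");
        Tok incStop = new Tok("main.c", 543, "\"foo.inc\"");
        Tok plus = new IncludedTok("foo.inc", 23, "+", incStart, incStop);
        System.out.println(describe("No adding of reals and integers", plus));
    }
}
```

Two tokens count as being in the same stream here when they share the same
causing statement (compared by reference), so a file that is included twice
is treated as two separate streams.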

I think it's a flexible solution, and it's not too bad on memory usage.

Stan

On Sat, Aug 22, 2009 at 12:04 AM, Gavin Lambert <antlr at mirality.co.nz> wrote:

> At 13:06 22/08/2009, Stanislav Sokorac wrote:
>
>> if (VALUE + a > 0) { echo "hi"; }
>>
>> where 'VALUE' is a macro that's defined in an include file. Your lexer
>> substituted VALUE with the defined value (say '1.0'), and marked the char
>> stream appropriately. Now, your tree walker comes upon 1.0+a, and say your
>> language doesn't allow addition of reals and integers, so you want to
>> mark/underline the expression "VALUE + a" and say "No adding of reals and
>> integers".
>>
>
> Well, the error itself is anchored on the + (since, after all, each operand
> is ok, it's only when you try to add them that there's a problem).  You
> could probably get by with just flagging the + itself as the error and not
> worrying about where the operands came from.  Failing that, you can use the
> location of the + to decide whether you've found the "right" location for
> the operands or not.
>
>> Now what, how do you underline 'VALUE + a'? I.e. how do you figure out the
>> starting and ending character of your expression in 'main.c'? The user
>> doesn't want to see the VALUE definition in another file underlined as
>> there's nothing wrong with the line of code.
>>
>
> Either don't have your lexer do the substitution (which is impractical if
> this is a C-style preprocessor that can have complex replacements), or
> expand the token definition so that the resulting "1.0" still remembers
> where the call site (the VALUE) was, so that you can use that for error
> reporting.
>
>> A similar problem occurs if you have a list of statements, and the first
>> (or last) few came from an include file... if you wanted to show the proper
>> range in the original file, you can't determine the location of the
>> 'include' statement by only examining your "list of statements" tree node
>> and the tokens in it.
>>
>
> You could add an extra value to the token definition and lexer members (an
> "include list").  This starts out empty.  When any token is generated, the
> current include list is attached to it.  When that happens to be an include
> statement, the current include list is cloned and a reference to the include
> statement itself is added to the copy, then the copy is passed into the
> sublexer as its include list.
>
> That way, every token would carry with it the full chain of include files
> (and line numbers of the include statements) that it took to get there,
> which would make for very useful error messages.
>
> Obviously this will increase memory usage a bit (depending on how many
> levels of nested includes you have), but it's probably fairly minimal.  Just
> make sure you don't clone the list for each individual token ;)
>
>