[antlr-interest] Keeping track of lengths

Thiago Arrais thiago.arrais at gmail.com
Tue Jan 31 11:29:37 PST 2006


Has anyone tried to keep track of the lengths of the language constructs?

For instance, suppose we are parsing a Java method

public void aMethod() {
    return;
}

I would like to record on the resulting AST node that the source code
for this method declaration is 38 characters long or, at least, that
it ends at line/column 3:2. How could we do that?

I have thought that the second (relaxed) version -- recording the end
token's position -- could be achieved by capturing the last token and
inspecting its source location. But there are some complications:

1. A rule calls other rules, so I don't ultimately know which token
ends a construct. I don't even necessarily know which child rule was
applied at the end of the parent rule. This prevents me from capturing
the last token by using a label. Maybe there is something like a
getLastConsumedToken() method somewhere? (I couldn't find it.) If not,
what would be an elegant way to implement it? Overriding the consume
method to store the last token?

2. In my context, not all the tokens come directly from the underlying
input. Some of them are virtual tokens, created by a decorator token
stream that inserts them to help the parser. So the method I want
would really be better called getLastConsumedRealToken().

3. My parser is supposed to record the lengths of most constructs.
Inserting a call to getLastConsumedToken() (should it really exist
somewhere) as an action at the end of every rule would be very
error-prone, and I would like to avoid it if possible. Is there a way
to insert a common action to be executed after every rule?

4. It would be very useful for me to have character length instead of
last token location info, if possible.
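
To make points 1, 2 and 4 concrete, here is a rough sketch of the kind
of thing I have in mind. It is plain Java and does not use ANTLR's own
classes: Tok, TrackingStream, LengthSketch and all their members are
names made up for illustration, and it assumes that every real token
knows its character offset in the source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token type (not ANTLR's Token). */
final class Tok {
    final String text;
    final int startOffset;   // 0-based character offset; -1 for virtual tokens
    final boolean virtual;   // true if inserted by a decorator stream

    Tok(String text, int startOffset, boolean virtual) {
        this.text = text;
        this.startOffset = startOffset;
        this.virtual = virtual;
    }

    /** Offset just past the last character of this token (exclusive). */
    int endOffset() {
        return startOffset + text.length();
    }
}

/** Hypothetical stream wrapper that remembers the last real token consumed. */
final class TrackingStream {
    private final List<Tok> tokens;
    private int pos = 0;
    private Tok lastReal = null;

    TrackingStream(List<Tok> tokens) {
        this.tokens = tokens;
    }

    Tok peek() {
        return tokens.get(pos);
    }

    /** Advances one token; virtual tokens do not update the record. */
    void consume() {
        Tok t = tokens.get(pos++);
        if (!t.virtual) {
            lastReal = t;
        }
    }

    Tok getLastConsumedRealToken() {
        return lastReal;
    }
}

final class LengthSketch {
    /** Tokens for the source text "x = 1;" followed by one virtual token. */
    static List<Tok> demoTokens() {
        List<Tok> toks = new ArrayList<>();
        toks.add(new Tok("x", 0, false));
        toks.add(new Tok("=", 2, false));
        toks.add(new Tok("1", 4, false));
        toks.add(new Tok(";", 5, false));
        toks.add(new Tok("<END>", -1, true));  // virtual, carries no position
        return toks;
    }

    /**
     * Stands in for a rule: consumes `count` tokens and returns the
     * character length from the first token to the last real one.
     * Assumes the first token of the construct is a real token.
     */
    static int lengthOfConstruct(TrackingStream in, int count) {
        Tok first = in.peek();
        for (int i = 0; i < count; i++) {
            in.consume();
        }
        return in.getLastConsumedRealToken().endOffset() - first.startOffset;
    }
}
```

With something like this, the character length falls out of the start
offset of the first token and the end offset of the last real one, but
it still leaves point 3 open: each rule has to remember its own first
token and ask for the last one when it finishes.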

Any ideas?

Cheers,

Thiago Arrais

