[antlr-interest] C target character position

Fri Nov 19 09:59:17 PST 2010

The very first token gives you a -1 for the char position in line, I am
afraid; I need to work around that, I think. The indexes, though, are
pointers into memory (your input) and not 0, 1, 2, etc. Note that the token
also remembers the start of the line that it is located on.

If the start of the first token is not the start of your data, then perhaps
there are comments and newline tokens that are skipped before the first
token that the parser sees? If this did not work, there would be a lot of
broken parsers out there.

So, use the pointer to get the start, subtract it from the end pointer to
get the length and print out that many characters, which will show you what
the token matched. The line start is updated when a '\n' is seen by the
parser, but you can change which character counts as the newline. The line
start is useful for error messages when you want to print the text line
that an error occurs in.
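
As a rough sketch of the pointer arithmetic, and not the runtime's own code:
token_span below is a made-up stand-in for the start and stop values you
already read from the token. I am assuming 8-bit input and that stop is the
address of the last character matched (inclusive), hence the + 1; check
that against your runtime version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the token's start/stop values, which are
 * addresses inside the input buffer rather than 0-based indexes. */
typedef struct {
    intptr_t start; /* address of the first character of the token */
    intptr_t stop;  /* address of the last character (assumed inclusive) */
} token_span;

/* Number of characters the token matched. */
static size_t token_length(token_span t)
{
    assert(t.stop + 1 >= t.start);
    return (size_t)(t.stop - t.start + 1);
}

/* Copy the matched text into out, NUL-terminated and truncated to fit.
 * Returns the number of characters copied. */
static size_t token_copy(token_span t, char *out, size_t out_size)
{
    size_t n = token_length(t);

    if (out_size == 0)
        return 0;
    if (n > out_size - 1)
        n = out_size - 1;
    memcpy(out, (const char *)t.start, n);
    out[n] = '\0';
    return n;
}
```

If you only want to print the text, printf("%.*s", (int)token_length(t),
(const char *)t.start) does the same job without the copy.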

The offset of the token is its start pointer minus the input start (use the
address you pass in (dataBuffer) and not input->data); however, the start
pointer is already pointing directly at the token text in your buffer
anyway. I think that you are forgetting that the token stream does not
return off-channel tokens or SKIP()ed tokens.
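
A minimal sketch of that subtraction, assuming 8-bit input so that one byte
is one character. token_offset is a name I made up for illustration; the
first argument is the token's start value and the second is the address of
the buffer you handed to ANTLR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of a token from the beginning of the caller's buffer.
 * token_start: the token's start value (an address inside the buffer).
 * buffer_base: the address of the buffer that was passed to ANTLR. */
static ptrdiff_t token_offset(intptr_t token_start, intptr_t buffer_base)
{
    assert(token_start >= buffer_base);
    return (ptrdiff_t)(token_start - buffer_base);
}
```

With the numbers in your output, 23213648 - 23213624 = 24, so the first
token the parser sees begins 24 characters into the buffer, which is what
you would get if 24 characters of skipped or off-channel text come first.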

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of A Z
> Sent: Friday, November 19, 2010 4:44 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] C target character position
> 
> Hello,
> 
>   I'm trying to record the offset of the start of a token, relative to
> the beginning of the input buffer. My program passes a (char *) buffer
> to ANTLR and then runs a simple grammar that builds a data structure
> containing the element types and pointer to their position in the text
> buffer. The problem is I can't find a way to get the true character
> offset from ANTLR in order to set the pointer. Below it prints out the
> results of most of the values for the ANTLR3_COMMON_TOKEN for the very
> first token. The two subsequent values are the data member and the
> address of the character buffer. I would expect start, getStartIndex
> and input->data to be the same but they are different. How can I find
> the offset of a token, in terms of the number of characters from the
> start of the stream?
> 
> Thanks
> 
> charPosition          : -1
> getCharPositionInLine : -1
> getLine               : 1
> getStartIndex         : 23213648
> getStopIndex          : 23213653
> getTokenIndex         : 0
> index                 : 0
> line                  : 1
> lineStart             : 23213648
> start                 : 23213648
> stop                  : 23213653
> 
> (pANTLR3_INPUT_STREAM)input->data 23217928
> (uint8_t*)dataBuffer              23213624