[antlr-interest] C target character position

A Z asicaddress at gmail.com
Sat Nov 20 18:29:15 PST 2010


Thanks for the quick response.  There was a bug in my printf
statements that caused the pointer addresses to print incorrectly. I was
fairly certain they worked as you described but I wanted to be sure.
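
For the archive, here is a minimal sketch of the arithmetic described in
the quoted reply below: the token offset is the start pointer minus the
address of the buffer passed in, and the token text is the span between
the start and stop pointers. The token_span struct and the three helper
functions are hypothetical stand-ins written for illustration, not ANTLR
types or API calls. The sketch also assumes that stop is the address of
the token's last character (inclusive), hence the + 1 in the length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two token values used here (the start
 * and stop indexes, which in the C target are addresses into the input
 * buffer stored as integers). This is NOT the real ANTLR3_COMMON_TOKEN. */
typedef struct {
    intptr_t start;  /* address of the token's first character */
    intptr_t stop;   /* address of the token's last character (assumed inclusive) */
} token_span;

/* Character offset of the token from the start of the caller's buffer
 * (the address originally handed to ANTLR, not input->data). */
static ptrdiff_t token_offset(const token_span *tok, const char *buffer)
{
    assert(tok != NULL && buffer != NULL);
    return (const char *)tok->start - buffer;
}

/* Number of characters the token matched, assuming an inclusive stop. */
static size_t token_length(const token_span *tok)
{
    assert(tok != NULL && tok->stop >= tok->start);
    return (size_t)(tok->stop - tok->start) + 1;
}

/* Copy the matched text into out (truncating to cap - 1 characters) and
 * NUL-terminate it. Returns the number of characters copied. */
static size_t token_text(const token_span *tok, char *out, size_t cap)
{
    size_t n;
    assert(out != NULL && cap > 0);
    n = token_length(tok);
    if (n > cap - 1)
        n = cap - 1;
    memcpy(out, (const char *)tok->start, n);
    out[n] = '\0';
    return n;
}
```

With the numbers from my original post, start (23213648) minus dataBuffer
(23213624) gives an offset of 24, and stop (23213653) minus start plus one
gives a length of 6 under the inclusive-stop assumption.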



On 11/19/10, Jim Idle <jimi at temporal-wave.com> wrote:
> The very first token gives you a -1 for the char position in line, I am
> afraid; I think I need to work around that. The indexes, though, are
> pointers into memory (your input) and not 0, 1, 2 etc. Note that the token
> also remembers the start of the line that it is located on.
>
> If the start of the first token is not the start of your data, then perhaps
> there are comments and newline tokens that are skipped before the first
> token that the parser sees? If this did not work, there would be a lot of
> broken parsers out there.
>
> So, use the pointer to get the start, subtract it from the end pointer to
> get the length and print out that many characters, which will show you what
> the token matched. The line start is updated when a '\n' is seen by the
> parser, but you can change which character is used. This is useful for
> error messages when you want to print the text line that an error occurs in.
>
> The offset of the token is the start pointer minus the input start (use the
> address you pass in (dataBuffer) and not input->data). The start pointer,
> however, already points directly at the token's text anyway. I think that
> you are forgetting that the token stream does not return off-channel tokens
> or SKIP()ed tokens.
>
> Jim
>
>
>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>> bounces at antlr.org] On Behalf Of A Z
>> Sent: Friday, November 19, 2010 4:44 AM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] C target character position
>>
>> Hello,
>>
>>   I'm trying to record the offset of the start of a token, relative to
>> the beginning of the input buffer. My program passes a (char *) buffer
>> to ANTLR and then runs a simple grammar that builds a data structure
>> containing the element types and pointer to their position in the text
>> buffer. The problem is I can't find a way to get the true character
>> offset from ANTLR in order to set the pointer. Below are the printed
>> values of most of the ANTLR3_COMMON_TOKEN fields for the very first
>> token. The two subsequent values are the data member and the
>> address of the character buffer. I would expect start, getStartIndex
>> and input->data to be the same but they are different. How can I find
>> the offset of a token, in terms of the number of characters from the
>> start of the stream?
>>
>> Thanks
>>
>> charPosition          : -1
>> getCharPositionInLine : -1
>> getLine               : 1
>> getStartIndex         : 23213648
>> getStopIndex          : 23213653
>> getTokenIndex         : 0
>> index                 : 0
>> line                  : 1
>> lineStart             : 23213648
>> start                 : 23213648
>> stop                  : 23213653
>>
>> (pANTLR3_INPUT_STREAM)input->data 23217928
>> (uint8_t*)dataBuffer              23213624
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
>> email-address

