[antlr-interest] C target recognition problem

Pierre Attar pat at tireme.fr
Mon Mar 31 01:44:10 PDT 2008


Jim,

Thanks a lot for your answer. I'm fairly new to the architecture of 
ANTLR and really don't know where to find the "bugletted" substring() 
in either the 3.0.1 or the 3.1 source.

Any idea for a workaround? Something like code that converts wchar_t to 
one of the ANTLR typedefs?
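
A minimal sketch of the kind of conversion asked about here, assuming the goal is a buffer of 16-bit code units regardless of whether wchar_t is 16 or 32 bits on the platform. Char16 is only a stand-in for the runtime's 16-bit typedef (ANTLR3_UINT16 in this thread), and wideToUtf16 is a hypothetical helper, not part of ANTLR:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the runtime's 16-bit character typedef (assumption:
// declared here so the sketch compiles without the ANTLR headers).
typedef std::uint16_t Char16;

// Hypothetical helper: copy a wchar_t buffer into 16-bit code units.
// When wchar_t is 16 bits wide, each unit is copied as-is. When it is
// 32 bits wide, code points above U+FFFF are split into a UTF-16
// surrogate pair. Input is not validated.
std::vector<Char16> wideToUtf16(const wchar_t* src, std::size_t len) {
    std::vector<Char16> out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(src[i]);
        if (sizeof(wchar_t) == 2 || cp < 0x10000u) {
            out.push_back(static_cast<Char16>(cp));
        } else {
            cp -= 0x10000u;
            out.push_back(static_cast<Char16>(0xD800u + (cp >> 10)));    // high surrogate
            out.push_back(static_cast<Char16>(0xDC00u + (cp & 0x3FFu))); // low surrogate
        }
    }
    return out;
}
```

The resulting data() pointer and size() could then be passed to the input stream in place of the direct cast. Since the stream in the original message is created "in place", the vector presumably has to stay alive for as long as the stream uses it.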

Pierre

Jim Idle wrote:
> This is fixed in ANTLR 3.1 and the C++ integration is done correctly in this version too. You don't need extern "C" any more, you just compile the generated code as C++. Remember to keep as little code as possible in your grammar rules and use helper classes though.
>
> Also, when using operating-system-specific implementations of wide characters, remember that the representation can change between 16 and 32 bits (for instance wchar_t), which can completely break certain code. If you use the ANTLR typedefs, they will ensure that the characters are always 16 bits. ANTLR 3.1 also has conversion routines (from the Unicode.org standard issue) to convert between encoding types if that helps.
>
> Jim
>
> PS: The fix is to do with substring() in the UCS2 input stream, which has a small buglette. You can probably fix it by looking at the 3.1 code if you must stay with 3.0.1 for now. This week I hope to fix any outstanding 3.1 bugs and test remote debugging; then 3.1 can be released as soon as Ter thinks that the time is right.
>
>
>   
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>> bounces at antlr.org] On Behalf Of Pierre Attar
>> Sent: Sunday, March 30, 2008 2:51 AM
>> To: ANTLR
>> Subject: [antlr-interest] C target recognition problem
>>
>> Hi,
>>
>> I'm running the following rule in ANTLRWorks with an input such as
>> f"blabla"f :
>>
>> FString : '\u0022'  ~('\u0022')+ '\u0022';
>> and "blabla" is recognized as a string; it works perfectly.
>>
>>
>> But in fact, I'm using the C generator in a C++ environment, so all
>> the code is included as extern "C".
>> Also, in my real setup, the string to analyze is created in memory by
>> another ANTLR recognizer, which creates an XMLString (wchar).
>>
>> So my lexer recognizer is defined as
>>         input = antlr3NewUCS2StringInPlaceStream ((pANTLR3_UINT16)
>> str,(ANTLR3_UINT64) XMLString::stringLen(str), NULL);
>>
>> With that, the lexer seems able to recognize str, but when I try to
>> get the text of the recognized string with
>> FString2->getText(FString2)->chars
>>
>> I get an empty ("") string.
>>
>> Any ideas on where the problem may be? I'm quite sure it is a
>> character encoding problem, but I can't find where the mismatch is.
>>
>> Thanks a lot for your help,
>>
>> Pierre
>>