[antlr-interest] C target recognition problem

Mon Mar 31 10:08:33 PDT 2008

If you are using Windows, then wchar_t will be 16 bits, but it is likely to be 32 bits on Unix. If you are receiving 32-bit characters as input, then you can either implement a 32-bit input stream (though I will do that myself before too long), or convert the 32-bit chars to 16-bit chars (of course, if this is real UTF-32 and not just UCS2 values stored in 32 bits, then you will need to be careful, because code points above U+FFFF do not fit in a single 16-bit unit).
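
As a rough illustration of the second option (narrowing 32-bit characters to 16 bits), here is a minimal sketch in plain C. The function name narrow_to_ucs2 is hypothetical and not part of the ANTLR runtime. It uses uint32_t/uint16_t from <stdint.h> in place of the runtime's own typedefs, and it assumes that refusing anything outside UCS2 is acceptable, rather than encoding surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an ANTLR API): copy `len` 32-bit characters
 * from `src` into the 16-bit buffer `dst`, which must hold at least
 * `len` elements.
 *
 * Returns `len` on success, or (size_t)-1 as soon as a value is found
 * that cannot be stored as one UCS2 unit: anything above U+FFFF, or a
 * value in the surrogate range U+D800..U+DFFF.
 */
size_t narrow_to_ucs2(const uint32_t *src, size_t len, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < len; i++) {
        uint32_t c = src[i];

        /* Assumption: the input should be UCS2-only, so reject the rest. */
        if (c > 0xFFFFu || (c >= 0xD800u && c <= 0xDFFFu)) {
            return (size_t)-1;
        }
        dst[i] = (uint16_t)c;
    }
    return len;
}
```

The 16-bit buffer filled this way is what you would then hand to the UCS2 input stream instead of the raw wchar_t pointer. On a platform where wchar_t is already 16 bits, no conversion should be needed.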

Just grep/search for the string 'Substr' in *.c and you will find antlr3UCS2Substr in antlr3ucs2inputstream.c

Jim

> -----Original Message-----
> From: Pierre Attar [mailto:pat at tireme.fr]
> Sent: Monday, March 31, 2008 1:44 AM
> To: Jim Idle
> Cc: ANTLR
> Subject: Re: [antlr-interest] C target recognition problem
> 
> Jim,
> 
> Thanks a lot for your answer. I'm fairly new to the design and
> architecture of ANTLR and really don't know where to find the
> "bugletted" substring() in either the 3.0.1 or the 3.1 source.
> 
> Any idea for a workaround? Something like code converting wchar_t to
> one of the ANTLR typedefs?
> 
> Pierre
> 
> Jim Idle wrote:
> > This is fixed in ANTLR 3.1 and the C++ integration is done correctly
> > in this version too. You don't need extern "C" any more, you just
> > compile the generated code as C++. Remember to keep as little code as
> > possible in your grammar rules and use helper classes though.
> >
> > Also, when using operating-system-specific implementations of wide
> > characters, remember that the representation can change between 16 and
> > 32 bits (for instance wchar_t), which can completely break some
> > code. If you use the ANTLR typedefs, they will ensure that the
> > characters are always 16 bits. ANTLR 3.1 also has conversion routines
> > (from the Unicode.org standard issue) to convert between encoding types
> > if that helps.
> >
> > Jim
> >
> > PS: The fix has to do with substring() in the UCS2 input stream, which
> > has a small buglette. You can probably fix it by looking at the 3.1
> > code if you must stay with 3.0.1 for now. This week I hope to fix any
> > outstanding 3.1 bugs and test remote debugging, and then 3.1 can be
> > released as soon as Ter thinks that the time is right.
> >
> >
> >
> >> -----Original Message-----
> >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> >> bounces at antlr.org] On Behalf Of Pierre Attar
> >> Sent: Sunday, March 30, 2008 2:51 AM
> >> To: ANTLR
> >> Subject: [antlr-interest] C target recognition problem
> >>
> >> Hi,
> >>
> >> I'm running the following rule in ANTLRWorks with an input such as
> >> f"blabla"f :
> >>
> >> FString : '\u0022'  ~('\u0022')+ '\u0022';
> >> and "blabla" is recognized as a string; it works perfectly.
> >>
> >>
> >> But in fact, I'm using the C generator in a C++ environment, so all
> >> the code is included as extern "C".
> >> Also, in my real setup, the string to analyze is created in memory by
> >> another ANTLR recognizer, which creates an XMLString (wchar).
> >>
> >> So my lexer recognizer is defined as
> >>         input = antlr3NewUCS2StringInPlaceStream((pANTLR3_UINT16) str,
> >>                     (ANTLR3_UINT64) XMLString::stringLen(str), NULL);
> >>
> >> Doing that, it seems that the lexer is able to recognize the str, but
> >> when I try to get the text from the recognized string with
> >> FString2->getText(FString2)->chars
> >>
> >> I get an empty ("") string.
> >>
> >> Any ideas on where the problem may be? I'm quite sure it is a
> >> character coding problem, but I'm not able to find where the
> >> contradictions are ...
> >>
> >> Thanks a lot for the help,
> >>
> >> Pierre