[antlr-interest] Re: Unicode-16 xml parser

Livio dalloro at gmail.com
Fri Feb 17 03:10:47 PST 2006


Sorry for the duplicate post,
no hints for my grammar?

thanks anyway.

On 2/15/06, Livio <dalloro at gmail.com> wrote:
>
> Hi,
>
> I'm not sure this is the right list to ask this question...
> I'm using ANTLR 2.7.6 to generate C++ code, and I'm trying to parse
> simplified XML files encoded as Unicode (UTF-16, not UTF-8).
> I've read a couple of threads about Unicode parsing in ANTLR, and I've seen
> that ANTLR doesn't support wchar. That shouldn't be a problem, though, since
> what I need is an "old style" parser/scanner that can handle Unicode
> input, not a full Unicode parser/scanner.
> I've also had a look at the unicode example in the ANTLR examples
> directory, but it seemed too tailored to UTF-8 input.
>
> My first step was to write a grammar that works with ANSI input, and I
> succeeded on the first attempt.
> The attached grammar is the result of porting that first grammar to
> Unicode-16 (I also plan a port to Unicode-32), and it doesn't work.
> The ANSI scanner had k=2; the Unicode version has k=4 because of the
> internal "char" representation that ANTLR uses.
>
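The jump from k=2 to k=4 described above follows from feeding 16-bit text to a scanner that consumes one byte at a time: each character becomes two bytes, so two characters of lookahead become four bytes. The following is only a minimal sketch of that arithmetic. The helper names `to_utf16le_bytes` and `byte_lookahead` are hypothetical, little-endian byte order is an assumption (the post does not state the byte order), and surrogate pairs are ignored.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Serialize UTF-16 code units to bytes, low byte first (UTF-16LE).
// Little-endian order is an assumption; the original post does not say
// which byte order the input files use.
std::vector<std::uint8_t> to_utf16le_bytes(const std::u16string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));         // low byte
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF));  // high byte
    }
    return bytes;
}

// Bytes of lookahead a byte-oriented scanner needs in order to see `chars`
// characters, assuming every character is a single 16-bit code unit
// (no surrogate pairs).
constexpr int byte_lookahead(int chars) { return chars * 2; }
```

Under these assumptions, the two characters `<?` that open an XML declaration reach the scanner as the four bytes 0x3C 0x00 0x3F 0x00, so a rule that needed two characters of lookahead on ANSI input needs four bytes here.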
> The problem shows up immediately at the beginning of the parse phase.
> Could you give me some hints on how to find and fix the error(s)?
> I know I'm a newbie, so I hope my problem doesn't bore you too much.
>
> I thank you in advance.
>
> P.S.
> Note that the first 2 bytes that usually "mark" the beginning of a Unicode
> file have been removed to simplify the grammar.
>
>
>
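The two leading bytes mentioned in the P.S. are the byte order mark (BOM): FF FE for little-endian UTF-16 and FE FF for big-endian. Instead of editing the files by hand, the bytes could be skipped in code before the input reaches the scanner. The following is a rough sketch only; `Utf16Order`, `BomInfo`, and `detect_utf16_bom` are hypothetical names and not part of ANTLR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Utf16Order { Unknown, LittleEndian, BigEndian };

struct BomInfo {
    Utf16Order order;   // byte order announced by the BOM, if any
    std::size_t skip;   // number of leading bytes to skip before scanning
};

// Inspect the first two bytes of a buffer for a UTF-16 byte order mark.
// Note: this does not tell a UTF-16LE BOM apart from a UTF-32LE BOM
// (FF FE 00 00); a UTF-32 port would need an extra check.
BomInfo detect_utf16_bom(const std::vector<std::uint8_t>& data) {
    if (data.size() >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) {
            return {Utf16Order::LittleEndian, 2};
        }
        if (data[0] == 0xFE && data[1] == 0xFF) {
            return {Utf16Order::BigEndian, 2};
        }
    }
    // No BOM found: leave the buffer untouched and the order undetermined.
    return {Utf16Order::Unknown, 0};
}
```

The scanner would then be handed the buffer starting at offset `skip`, which keeps the grammar itself free of any BOM rule.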
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20060217/4663efc1/attachment.html

