[antlr-interest] Solved (my fault!) Test cases that work under Java target fail under CSharp2 target

Thu Jul 30 23:13:40 PDT 2009

This turned out to be my fault, not a discrepancy between the CSharp2 and Java targets. And it's probably a funny story (from the outside, at least :-) ), so I'll record it here.

While getting started with antlr a few weeks back, I tried a few approaches to setting up my input stream, lexer and parser.

I remember having trouble ensuring that the parser correctly ignored whitespace tokens, and experimenting with a few calls before getting the desired result. I guess it was about then that I inserted this line:

   var cts = new CommonTokenStream(); // fine...

   cts.DiscardTokenType(Token.HIDDEN_CHANNEL); // not fine.

While I guessed that DiscardTokenType would discard a class of tokens based on the assigned channel, it was actually just discarding all occurrences of whatever token happened be arbitrarily assigned internal reference number 99 (the value of Token.HIDDEN_CHANNEL) in this particular revision of my lexer definition.

Sigh.

That explains about two weeks' worth of weird behaviour where seemingly trivial modifications to the grammar introduced or resolved parse errors that I just could not reconcile with my grammar definition.

It also explains why things worked just fine when running in AntlrWorks under the Java target - unlike _my_ setup code, the AntlrWorks test harness probably doesn't randomly throw away tokens from the input stream :-).

Regards,

Andrew

---- original message below -----

Hi,

   Are there any differences or known issues with the CSharp2 target that cause it to behave differently to the "equivalent" Java target generated lexer and parser?

   I'm asking because my grammar, which works on 105 of 105 test cases when I use the Java target, fails on 6 of 105 test cases when I use the CSharp2 target.

   I'm not using actions or anything target language-specific.

   Using AntlrWorks' integrated and remote debuggers and my limited knowledge, I've got one lead right now: I've noticed that the Java version logs LT(1) events for each token, so that as the parser pulls tokens from an input stream:

      A B C

   I see LT(1) 'A' then LT(1) 'B' then LT(1) 'C'. That seems right to me.

   But when the problematic test cases are parsed in the CSharp2 target, I see LT(1) 'A' and then for some reason the next LT(1) sees 'C', not 'B'. This seems wrong.

   I'm new to Antlr - does this point to an issue with the generated lexer or the parser? What's next troubleshooting step would you suggest?

   Happy to respond with specifics of the grammar and the error messages if you think they'd be useful.

Thanks in advance,
Andrew

_________________________________________________________________
What goes online, stays online Check the daily blob for the latest on what's happening around the web
http://windowslive.ninemsn.com.au/blog.aspx
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20090731/e75a1287/attachment.html