[antlr-interest] Have I found an Antlr CSharp3 lexer bug if...

chris king kingces95 at gmail.com
Thu Jul 28 17:04:51 PDT 2011


Sam, thanks so much for taking the time to look at that. If I could, let me
try to explain what I'm trying to do, and you can tell me if you think it's
possible. For my own edification, I'm trying to implement a C# grammar, and
at the moment I'd like to implement the pre-processor. The implementations
I've seen generally use only a lexer plus some type of trick to maintain a
stack (e.g. for nested ifdefs and simple if/elif expressions). I figure why
not use a parser to maintain the stack -- isn't that the reason parsers exist
anyway? So that's what I'm trying to do -- use a lexer and parser to
implement the pre-processor.

The big difficulty is changing the lexer rules depending on whether or not
I'm in an #ifdef block that is active. I figured with ANTLR I'd be able to
compute whether the #ifdef block is active and then throw a switch to either
tokenize the input and hand those tokens off to the C# parser, or consume
and ignore all input up to the next pre-processor instruction, thereby
disabling that chunk of code. If I can do this then I could put the
pre-processor and parser in the same file and construct the AST in one pass!
Wouldn't that be cool? And clean? And maybe worth making a goal for ANTLR to
be able to do?
:)
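
The "switch" described above can be sketched outside ANTLR as a small line
filter. This is only an illustration in Python, not ANTLR or C# code and not
the grammar from this thread: `filter_active_lines` and its parameters are
hypothetical names, conditions are assumed to be bare symbols (no `&&`, `||`,
or `!`), and `#elif` is left out to keep it short.

```python
def filter_active_lines(source, defined):
    """Return the lines that survive conditional compilation.

    Hypothetical helper: understands only `#if SYMBOL`, `#else`, `#endif`.
    """
    # One entry per open #if: (was the enclosing region active?,
    #                          did the #if condition hold?)
    stack = []
    active = True  # True while lines are being kept, False while skipping
    kept = []
    for line in source.splitlines():
        words = line.split()
        directive = words[0] if words else ""
        if directive == "#if":
            condition = len(words) > 1 and words[1] in defined
            stack.append((active, condition))
            active = active and condition
        elif directive == "#else":
            if not stack:
                raise ValueError("#else without matching #if")
            parent_active, condition = stack[-1]
            active = parent_active and not condition
        elif directive == "#endif":
            if not stack:
                raise ValueError("#endif without matching #if")
            parent_active, _ = stack.pop()
            active = parent_active
        elif active:
            kept.append(line)  # ordinary line inside an active region
    return kept
```

The explicit stack here is exactly the procedural bookkeeping that a
parser-driven approach would hope to replace with nested productions.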

To be a bit more concrete: here is the production for matching the newline at
the end of a pre-processor instruction. The idea is to enable
PP_SKIPPED_CHARACTERS only inside a disabled #ifdef block, where it would
consume all characters up to the next pre-processing instruction.

pp_new_line
  : SINGLE_LINE_COMMENT? ((NEW_LINE! PP_SKIPPED_CHARACTERS) | EOF!)
  ;


Here is what I was hoping would work as PP_SKIPPED_CHARACTERS. Unfortunately I
don't seem to understand how to flip lexer rules on and off well enough to
make this work...


PP_SKIPPED_CHARACTERS
  : { IfDefedOut }? ( ~(F_NEW_LINE_CHARACTER | F_POUND_SIGN)
      F_INPUT_CHARACTER* F_NEW_LINE )*
  ;
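
One property of the rule above is worth illustrating: its body is a `( ... )*`
loop, so it can succeed without consuming any characters. The sketch below is
a rough Python analogy, not ANTLR: `tokenize` and `max_steps` are hypothetical
names, and the regular expressions only approximate the rule's shape. It shows
a tokenizer loop stalling on an empty match, and the same loop terminating
when the pattern must consume at least one line.

```python
import re


def tokenize(text, pattern, max_steps=5):
    """Match `pattern` repeatedly from the current position, lexer-style.

    Returns (tokens, finished). `max_steps` is a safety cap standing in for
    what would otherwise be an endless loop.
    """
    regex = re.compile(pattern)
    pos = 0
    tokens = []
    for _ in range(max_steps):
        if pos >= len(text):
            return tokens, True  # all input consumed
        match = regex.match(text, pos)
        if match is None:
            raise ValueError("no token matches at position %d" % pos)
        tokens.append(match.group())
        pos = match.end()  # does not advance if the match was empty
    return tokens, False  # cap reached: the loop was not making progress


# Approximation of the rule with `*`: zero skipped lines is a valid match.
ZERO_OR_MORE = r"(?:[^#\n][^\n]*\n)*"
# Same shape with `+`: at least one line must be consumed.
ONE_OR_MORE = r"(?:[^#\n][^\n]*\n)+"
```

With `ZERO_OR_MORE`, input that starts with `#` yields an empty token at every
step and the position never moves. With `ONE_OR_MORE`, the same input fails to
match at all, so the loop cannot spin.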


I hope that is enough to give you an idea of what I'm trying to do. This
approach just seems so elegant to me (by which I mean almost all declarative
-- no need to sprinkle procedural logic in among my productions to maintain
a stack or whatever) that I'd hope it would be doable in ANTLR. What do you
think? Is it a worthy goal? Does it feel possible to you? If it isn't
possible today, is it a goal worth trying to achieve?

Thanks,
Chris



On Thu, Jul 28, 2011 at 2:37 PM, Sam Harwell <sharwell at pixelminegames.com> wrote:

> Hi Chris,
>
> Lookahead prediction occurs before predicates are evaluated. If fixed
> lookahead uniquely determines the alternative with a semantic predicate,
> the predicate will not be evaluated as part of the decision process. I’m
> guessing (but not 100% sure) that if you use a gated semantic predicate,
> the lexer will not enter the rule:
>
> PP_SKIPPED_CHARACTERS
>   : {false}? => ( ~(F_NEW_LINE_CHARACTER | '#') F_INPUT_CHARACTER*
>       F_NEW_LINE )*
>   ;
>
> Also, a word of warning: this lexer rule can match a zero-length character
> span, which could result in an infinite loop. You should always ensure that
> every path through any lexer rule that’s not marked “fragment” will consume
> at least 1 character. There’s also a bug with certain exceptions in the
> lexer that can cause infinite loops; this has been resolved for release 3.4
> but I haven’t released it yet.
>
> Sam
>
> *From:* chris king [mailto:kingces95 at gmail.com]
> *Sent:* Thursday, July 28, 2011 4:19 PM
> *To:* antlr-interest at antlr.org; Sam Harwell
> *Subject:* Have I found an Antlr CSharp3 lexer bug if...
>
> Have I found an Antlr lexer CSharp3 bug if I can alter program execution
> (exception instead of no exception) by introducing a lexer production with a
> predicate that is always false? For example:
>
> PP_SKIPPED_CHARACTERS
>   : { false }? ( ~(F_NEW_LINE_CHARACTER | '#') F_INPUT_CHARACTER*
>       F_NEW_LINE )*
>   ;
>
> I would think that such a production should always be ignored because its
> predicate is always false, and therefore it would never alter program
> execution. Yet I'm seeing a change in the execution of my program: it enters
> this function and throws a FailedPredicateException. I wouldn't have expected
> this function ever to be executed, because the predicate is always false.
>
>      [GrammarRule("PP_SKIPPED_CHARACTERS")]
>      private void mPP_SKIPPED_CHARACTERS()
>      {
>           EnterRule_PP_SKIPPED_CHARACTERS();
>           EnterRule("PP_SKIPPED_CHARACTERS", 31);
>           TraceIn("PP_SKIPPED_CHARACTERS", 31);
>           try
>           {
>               int _type = PP_SKIPPED_CHARACTERS;
>               int _channel = DefaultTokenChannel;
>               // CSharp\\CSharpPreProcessor.g:197:3: ({...}? (~ ( F_NEW_LINE_CHARACTER | F_POUND_SIGN ) ( F_INPUT_CHARACTER )
>               DebugEnterAlt(1);
>               // CSharp\\CSharpPreProcessor.g:197:5: {...}? (~ ( F_NEW_LINE_CHARACTER | F_POUND_SIGN ) ( F_INPUT_CHARACTER )
>               {
>               DebugLocation(197, 5);
>               if (!(( false )))
>               {
>                    throw new FailedPredicateException(input,
>                        "PP_SKIPPED_CHARACTERS", " False() ");
>               }
>
> Sam, I'm on an all CSharp stack v3.3.1.7705. I'm using your VS plugin
> (which is wonderful) and build integration to generate the lexer/parser
> (also wonderful) and then running on top of your CSharp port of the runtime.
> If you think this is a bug and you'd like to have a look at the repro please
> let me know. The project is open source up on CodePlex.
>
> Thanks,
> Chris
>

