[antlr-interest] Have I found an Antlr CSharp3 lexer bug if...

Sam Harwell sharwell at pixelminegames.com
Thu Aug 4 07:04:08 PDT 2011


Hi Chris,

I'm using the released version 3.4.0 of the ANTLR CSharp3 target. I copied and
pasted the grammar below (aside from renaming it to Preprocessor), and it passed
the following unit test:

[TestMethod]
public void TestEmptyComment()
{
    string inputText = "/**/";
    var input = new ANTLRStringStream(inputText);
    var lexer = new PreprocessorLexer(input);
    var tokenStream = new CommonTokenStream(lexer);
    tokenStream.Fill();

    List<IToken> tokens = tokenStream.GetTokens();
    Assert.AreEqual(2, tokens.Count);
    Assert.AreEqual(PreprocessorLexer.DELIMITED_COMMENT, tokens[0].Type);
    Assert.AreEqual(inputText, tokens[0].Text);
    Assert.AreEqual(PreprocessorLexer.EOF, tokens[1].Type);
}

Sam

From: chris king [mailto:kingces95 at gmail.com] 
Sent: Thursday, August 04, 2011 3:48 AM
To: Sam Harwell; antlr-interest at antlr.org
Subject: Re: Have I found an Antlr CSharp3 lexer bug if...

Sam, while trying to build my preprocessor with a combined parser/lexer grammar,
I ran across what I think might be a bug. I reduced it to the repro below. I
expected the program below to accept "/**/ ", but instead it fails because the
lexer prediction enters PP_SKIPPED_CHARACTERS. That rule has a gated semantic
predicate that is always false, and I expected a lexer rule with an always-false
gated semantic predicate never to be matched. If I comment out the
PP_SKIPPED_CHARACTERS rule, it does match "/**/ ", so the inclusion of that rule
is causing the problem. Let me know if you think this is a bug and whether you
can reproduce it.

Thanks,
Chris

grammar Bug;

options {
   language=CSharp3;
   output=AST;
}

public start
  : DELIMITED_COMMENT !EOF
  ;

PP_SKIPPED_CHARACTERS
  : { false }? => ~(F_NEW_LINE_CHARACTER | F_PP_POUND_SIGN) F_INPUT_CHARACTER*
  ;

DELIMITED_COMMENT
  : { true }? => '/*' .* '*/'
  ;

WHITESPACE
  : F_WHITESPACE { Skip(); }
  ;

fragment F_WHITESPACE
  : (' ' | '\t' | '\v' | '\f')+
  ;

fragment F_NEW_LINE_CHARACTER
  : '\r'
  | '\n'
  ;

fragment F_PP_POUND_SIGN
  : '#'
  ;

fragment F_INPUT_CHARACTER
  : ~F_NEW_LINE_CHARACTER
  ;
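
The behavior Chris expected can be modelled with a small C# sketch. This is a hypothetical toy model, not ANTLR-generated code: `GatedLexerModel`, `Rule` and `Lex` are invented names, and picking the longest match (ties to the rule listed first) is a simplification of ANTLR 3's DFA-based prediction. The point it illustrates is that a rule whose gate `{ p }? =>` evaluates to false is removed from prediction before any input is examined, so it can never be chosen. The `ignoreGates` flag (also hypothetical) shows what happens if the gate is not honored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of the Bug grammar's lexer rules; NOT ANTLR-generated code.
public static class GatedLexerModel
{
    // One lexer rule: name, gate ({ p }? =>), a matcher returning how many chars
    // it consumes at position pos (0 = no match), and whether the token is skipped.
    private sealed class Rule
    {
        public string Name = "";
        public Func<bool> Gate = () => true;
        public Func<string, int, int> Match = (s, pos) => 0;
        public bool Skip;
    }

    private static bool IsNewLine(char c) => c == '\r' || c == '\n';

    private static readonly Rule[] Rules =
    {
        // PP_SKIPPED_CHARACTERS : { false }? => ~(NEWLINE | '#') F_INPUT_CHARACTER* ;
        new Rule
        {
            Name = "PP_SKIPPED_CHARACTERS",
            Gate = () => false,
            Match = (s, pos) =>
            {
                if (IsNewLine(s[pos]) || s[pos] == '#') return 0;
                int end = pos + 1;
                while (end < s.Length && !IsNewLine(s[end])) end++;
                return end - pos;
            }
        },
        // DELIMITED_COMMENT : { true }? => '/*' .* '*/' ;  (stops at the first "*/")
        new Rule
        {
            Name = "DELIMITED_COMMENT",
            Gate = () => true,
            Match = (s, pos) =>
            {
                if (pos + 1 >= s.Length || s[pos] != '/' || s[pos + 1] != '*') return 0;
                int close = s.IndexOf("*/", pos + 2, StringComparison.Ordinal);
                return close < 0 ? 0 : close + 2 - pos;
            }
        },
        // WHITESPACE : (' ' | '\t' | '\v' | '\f')+ { Skip(); } ;
        new Rule
        {
            Name = "WHITESPACE",
            Skip = true,
            Match = (s, pos) =>
            {
                int end = pos;
                while (end < s.Length && " \t\v\f".IndexOf(s[end]) >= 0) end++;
                return end - pos;
            }
        },
    };

    // Tokenizes input and returns token names followed by "EOF". Rules whose gate is
    // false are excluded (unless ignoreGates is set); among the remaining rules the
    // longest match wins, and ties go to the rule listed first.
    public static List<string> Lex(string input, bool ignoreGates = false)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < input.Length)
        {
            Rule? best = null;
            int bestLen = 0;
            foreach (var rule in Rules)
            {
                if (!ignoreGates && !rule.Gate()) continue; // gated out: never predicted
                int len = rule.Match(input, pos);
                if (len > bestLen) { best = rule; bestLen = len; }
            }
            if (best == null)
                throw new InvalidOperationException($"no viable alternative at index {pos}");
            if (!best.Skip) tokens.Add(best.Name);
            pos += bestLen;
        }
        tokens.Add("EOF");
        return tokens;
    }
}
```

With the gate honored, `Lex("/**/ ")` yields DELIMITED_COMMENT followed by EOF (the trailing space is matched by WHITESPACE and skipped). With `ignoreGates: true`, PP_SKIPPED_CHARACTERS consumes the whole line, which is the kind of misprediction described in the report.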


