[antlr-interest] Losing a character when switching parsers

Tue Nov 8 08:05:08 PST 2005

I've been trying to figure this out for a few days, so I hope someone can
help me.
I have a situation similar to the JavaDoc example in the ANTLR distribution:
text embedded within other text, where each has different syntax rules.
My problem is that whenever I switch from one parser/lexer to the other, I
lose a character. For the input "some text [!some other text]", the
secondary parser sees only "[!", "ome other text", and "]" as tokens.

I won't post the whole parser/lexer files, but here are the important parts.
In the main ANTLR file, in the parser:

------------------
protected pageOption returns [WikiPageOptionToken defaultToken = null]
	:
		StartWikiPageOption
		{
			WikiPageOptionParser parser = new WikiPageOptionParser(getInputState());
			defaultToken = parser.pageOption();
		}
	;
------------------

And in the lexer:

------------------
StartWikiPageOption : "[!" { TokenParser.selector.push("WikiPageOptionLexer"); };
------------------

In the secondary ANTLR file, in the parser:

------------------
pageOption returns [WikiPageOptionToken defaultToken = null]
	{TextToken pageOptionValue = null;} : 
		w1:Word { TextToken pageOptionName = new TextToken(w1.getText()); }
		(PageOptionSeperator w2:Word { pageOptionValue = new TextToken(w2.getText()); })?
		EndWikiPageOption
		{ 
			defaultToken = new WikiPageOptionToken(); 
			defaultToken.AddChildToken(pageOptionName);	

			if ((pageOptionValue as TextToken)!=null)
			{
				defaultToken.AddChildToken(pageOptionValue);
			}
		}
	;
------------------

And in the lexer:

------------------
StartPageOption : "[!";
EndPageOption: "]" { TokenParser.selector.pop(); };
PageOptionSeperator: ":";

Word: ('a'..'z' | 'A'..'Z')+;
------------------

For input "[!PreserveWhiteSpace]", I get the following exception:

------------------
TestCase 'JustWikiUnitTest.HtmlTokens.TestTokenParser.TestTranslateToHtml'
failed: System.Exception : wiki page content: [!PreserveWhiteSpace]
  ----> antlr.TokenStreamRecognitionException : unexpected char: 'P'

c:\projects\justwiki\justwikiunittest\htmltokens\testtokenparser.cs(73,0):
at JustWikiUnitTest.HtmlTokens.TestTokenParser.TestTranslateToHtml()
	--Exception
	c:\projects\justwiki\justwiki_generated\justwikilexer.cs(105,0): at
JustWiki.JustWikiLexer.nextToken()
	at antlr.TokenStreamSelector.nextToken()
	at antlr.TokenBuffer.fill(Int32 amount)
	at antlr.TokenBuffer.LA(Int32 i)
	at antlr.LLkParser.LA(Int32 i)
	at antlr.Parser.consumeUntil(BitSet bset)
	at antlr.Parser.recover(RecognitionException ex, BitSet tokenSet)

c:\projects\justwiki\justwiki_generated\wikipageoptionparser.cs(113,0): at
JustWiki.WikiPageOptionParser.pageOption()
	c:\projects\justwiki\justwiki_generated\justwikiparser.cs(129,0): at
JustWiki.JustWikiParser.pageOption()
	c:\projects\justwiki\justwiki_generated\justwikiparser.cs(80,0): at
JustWiki.JustWikiParser.startRule()
	C:\Projects\JustWiki\Tokens\TokenParser.cs(33,0): at
JustWiki.Tokens.TokenParser.Parse(Controller controller, WikiWord wikiPage)

c:\projects\justwiki\justwikiunittest\htmltokens\testtokenparser.cs(23,0):
at JustWikiUnitTest.HtmlTokens.TestTokenParser.ParseWikiPage(String
mockWikiPageContent)

c:\projects\justwiki\justwikiunittest\htmltokens\testtokenparser.cs(62,0):
at JustWikiUnitTest.HtmlTokens.TestTokenParser.TestTranslateToHtml()
------------------

When I debug it, the generated code that looks for the Word token is this:
------------------
	w1 = LT(1);
	match(Word);
------------------

It fails on the LT(1) call specifically, although the debugger shows LT(1)
as a token of type Word with the text "reserveWhiteSpace", so it looks like
I'm losing a character somewhere. I'm hoping someone can help me.
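
To show what I mean by losing a character, here is a small stand-alone toy
model in plain Java. It does not use ANTLR at all: LookaheadDemo, Source,
InputState and ToyLexer are names I made up for illustration, not ANTLR
classes. It assumes each lexer keeps one character of lookahead over the same
underlying input. I don't know whether this is what actually happens inside
my generated lexers; it is only a guess that reproduces the same symptom.

```java
public class LookaheadDemo {

    /** Stand-in for the underlying character stream (hypothetical, not ANTLR). */
    static final class Source {
        private final String text;
        private int pos = 0;
        Source(String text) { this.text = text; }
        /** Returns the next character, or -1 at end of input. */
        int read() { return pos < text.length() ? text.charAt(pos++) : -1; }
    }

    /** One character of lookahead over a Source (hypothetical, not ANTLR). */
    static final class InputState {
        final Source in;
        int la; // the next unconsumed character, already pulled from the Source
        InputState(Source in) { this.in = in; this.la = in.read(); }
    }

    /** Toy lexer that reads through an InputState. */
    static final class ToyLexer {
        private final InputState state;
        ToyLexer(InputState state) { this.state = state; }

        private void consume() { state.la = state.in.read(); }

        /** Consumes the literal if the input starts with it. */
        boolean match(String literal) {
            for (int i = 0; i < literal.length(); i++) {
                if (state.la != literal.charAt(i)) return false;
                consume();
            }
            return true;
        }

        /** Mirrors the Word rule: ('a'..'z' | 'A'..'Z')+ */
        String word() {
            StringBuilder sb = new StringBuilder();
            while ((state.la >= 'a' && state.la <= 'z')
                    || (state.la >= 'A' && state.la <= 'Z')) {
                sb.append((char) state.la);
                consume();
            }
            return sb.toString();
        }
    }

    /** Each lexer has its OWN lookahead over the same Source. */
    public static String wordWithPrivateLookahead(String input) {
        Source shared = new Source(input);
        ToyLexer main = new ToyLexer(new InputState(shared));
        main.match("[!"); // main's lookahead now holds 'P'
        ToyLexer secondary = new ToyLexer(new InputState(shared)); // starts at 'r'
        return secondary.word();
    }

    /** Both lexers read through ONE shared lookahead state. */
    public static String wordWithSharedLookahead(String input) {
        InputState state = new InputState(new Source(input));
        ToyLexer main = new ToyLexer(state);
        main.match("[!");
        ToyLexer secondary = new ToyLexer(state); // picks up at 'P'
        return secondary.word();
    }

    public static void main(String[] args) {
        System.out.println(wordWithPrivateLookahead("[!PreserveWhiteSpace]")); // reserveWhiteSpace
        System.out.println(wordWithSharedLookahead("[!PreserveWhiteSpace]"));  // PreserveWhiteSpace
    }
}
```

In the private-lookahead case the 'P' stays stuck in the first lexer's
buffer, which would at least be consistent with JustWikiLexer reporting
"unexpected char: 'P'" while the Word token reads "reserveWhiteSpace". Again,
that is only my guess at the mechanism.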

Thanks, Miki Watts