[antlr-interest] Losing a character when switching parsers
Miki Watts
mikiw at orb-software.com
Tue Nov 8 08:05:08 PST 2005
I've been trying to figure this out for a few days, so I hope someone can
help me.
I have a situation similar to the JavaDoc example in the ANTLR distribution:
text embedded in other text that follows different syntax rules.
My problem is that whenever I switch from one parser/lexer to the other, I
lose a character. For the input "some text [!some other text]", the secondary
parser sees only "[!", "ome other text", and "]" as tokens.
I won't post the whole parser/lexer files, but here are the important parts.
In the main ANTLR file, in the parser:
------------------
protected pageOption returns [WikiPageOptionToken defaultToken = null]
    :   StartWikiPageOption
        {
            // The secondary parser shares this parser's input state (its token buffer).
            WikiPageOptionParser parser = new WikiPageOptionParser(getInputState());
            defaultToken = parser.pageOption();
        }
    ;
------------------
And in the lexer:
------------------
StartWikiPageOption : "[!" { TokenParser.selector.push("WikiPageOptionLexer"); };
------------------
In the secondary ANTLR file, in the parser:
------------------
pageOption returns [WikiPageOptionToken defaultToken = null]
{
    TextToken pageOptionName = null;
    TextToken pageOptionValue = null;
}
    :   w1:Word { pageOptionName = new TextToken(w1.getText()); }
        (PageOptionSeperator w2:Word { pageOptionValue = new TextToken(w2.getText()); })?
        EndWikiPageOption
        {
            defaultToken = new WikiPageOptionToken();
            defaultToken.AddChildToken(pageOptionName);
            if (pageOptionValue != null)
            {
                defaultToken.AddChildToken(pageOptionValue);
            }
        }
    ;
------------------
And in the lexer:
------------------
StartPageOption : "[!";
EndPageOption: "]" { TokenParser.selector.pop(); };
PageOptionSeperator: ":";
Word: ('a'..'z' | 'A'..'Z')+;
------------------
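The `push` call in the main lexer and the `pop` call in the secondary lexer rely on the selector keeping a stack of token streams: `push` saves the current stream and switches to the named one, and `pop` switches back to the saved one. Below is a simplified Java model of that behaviour, assuming the tokens are already lexed and held in plain iterators; `ToySelector` and its `demo` method are hypothetical and are not ANTLR's `TokenStreamSelector`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of a stack-based token stream selector. */
class ToySelector {
    private final Map<String, Iterator<String>> streams = new HashMap<>();
    private final Deque<Iterator<String>> saved = new ArrayDeque<>();
    private Iterator<String> current;

    void addInputStream(Iterator<String> stream, String name) { streams.put(name, stream); }

    void select(String name) { current = lookup(name); }

    /** Remember the current stream, then switch to the named one. */
    void push(String name) {
        saved.push(current);
        current = lookup(name);
    }

    /** Return to whichever stream was active before the last push. */
    void pop() { current = saved.pop(); }

    /** Next token from the currently selected stream, or null once it is exhausted. */
    String nextToken() { return current.hasNext() ? current.next() : null; }

    private Iterator<String> lookup(String name) {
        Iterator<String> stream = streams.get(name);
        if (stream == null) throw new IllegalArgumentException("no stream named " + name);
        return stream;
    }

    /** Reads all tokens, switching streams the way the two lexer actions do. */
    static List<String> demo() {
        ToySelector selector = new ToySelector();
        selector.addInputStream(Arrays.asList("some", "text", "[!", "more").iterator(), "main");
        selector.addInputStream(Arrays.asList("PreserveWhiteSpace", "]").iterator(), "option");
        selector.select("main");
        List<String> seen = new ArrayList<>();
        for (String tok = selector.nextToken(); tok != null; tok = selector.nextToken()) {
            seen.add(tok);
            // In the grammar these switches happen inside the lexer actions.
            if (tok.equals("[!")) selector.push("option");
            else if (tok.equals("]")) selector.pop();
        }
        return seen;
    }
}
```

Because a selector of this kind only decides which source answers the next `nextToken()` call, it never handles individual characters; a character that disappears at a switch is more likely to be lost in the lexers' input buffering.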
For input "[!PreserveWhiteSpace]", I get the following exception:
------------------
TestCase 'JustWikiUnitTest.HtmlTokens.TestTokenParser.TestTranslateToHtml' failed: System.Exception : wiki page content: [!PreserveWhiteSpace]
----> antlr.TokenStreamRecognitionException : unexpected char: 'P'
c:\projects\justwiki\justwikiunittest\htmltokens\testtokenparser.cs(73,0): at JustWikiUnitTest.HtmlTokens.TestTokenParser.TestTranslateToHtml()
--Exception
c:\projects\justwiki\justwiki_generated\justwikilexer.cs(105,0): at JustWiki.JustWikiLexer.nextToken()
at antlr.TokenStreamSelector.nextToken()
at antlr.TokenBuffer.fill(Int32 amount)
at antlr.TokenBuffer.LA(Int32 i)
at antlr.LLkParser.LA(Int32 i)
at antlr.Parser.consumeUntil(BitSet bset)
at antlr.Parser.recover(RecognitionException ex, BitSet tokenSet)
c:\projects\justwiki\justwiki_generated\wikipageoptionparser.cs(113,0): at JustWiki.WikiPageOptionParser.pageOption()
c:\projects\justwiki\justwiki_generated\justwikiparser.cs(129,0): at JustWiki.JustWikiParser.pageOption()
c:\projects\justwiki\justwiki_generated\justwikiparser.cs(80,0): at JustWiki.JustWikiParser.startRule()
C:\Projects\JustWiki\Tokens\TokenParser.cs(33,0): at JustWiki.Tokens.TokenParser.Parse(Controller controller, WikiWord wikiPage)
c:\projects\justwiki\justwikiunittest\htmltokens\testtokenparser.cs(23,0): at JustWikiUnitTest.HtmlTokens.TestTokenParser.ParseWikiPage(String mockWikiPageContent)
c:\projects\justwiki\justwikiunittest\htmltokens\testtokenparser.cs(62,0): at JustWikiUnitTest.HtmlTokens.TestTokenParser.TestTranslateToHtml()
------------------
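The lost character can be reproduced in miniature. The sketch below is a self-contained Java toy, not ANTLR code, and its names (`CharSource`, `nextToken`, `switchLexers`) are hypothetical. It assumes, as one possible explanation rather than a confirmed diagnosis, that the second lexer reads the same underlying reader through its own buffer instead of sharing the first lexer's input state: after matching "[!", the first lexer already holds the next character as lookahead, so a fresh buffer over the same reader starts one character later.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Toy model (not ANTLR code): two lexers taking turns on one input stream. */
class SharedInputDemo {

    /** One-character lookahead buffer over a Reader; stands in for a lexer's input state. */
    static final class CharSource {
        private final Reader in;
        private int la; // buffered lookahead char, or -1 at end of input

        CharSource(Reader in) {
            this.in = in;
            consume(); // priming the lookahead already pulls one char out of the Reader
        }

        int la() { return la; }

        void consume() {
            try {
                la = in.read();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Toy lexer: "[!" as one token, a run of letters as one word, anything else as one char. */
    static String nextToken(CharSource src) {
        if (src.la() == -1) return null;
        StringBuilder text = new StringBuilder();
        if (src.la() == '[') {
            text.append('[');
            src.consume();
            if (src.la() == '!') {
                text.append('!');
                src.consume(); // the char following "[!" is now buffered in this source
            }
        } else if (Character.isLetter(src.la())) {
            while (src.la() != -1 && Character.isLetter(src.la())) {
                text.append((char) src.la());
                src.consume();
            }
        } else {
            text.append((char) src.la());
            src.consume();
        }
        return text.toString();
    }

    /** Lexes "[!" with the main source, then lets a second lexer read one token.
     *  Returns {second lexer's token, char left in the main source's lookahead}. */
    static String[] switchLexers(boolean shareBuffer) {
        Reader reader = new StringReader("[!PreserveWhiteSpace]");
        CharSource main = new CharSource(reader);
        nextToken(main); // "[!"
        // Assumption (toy model): unless handed the same CharSource, the second lexer
        // builds its own buffer, which starts after the char that main already buffered.
        CharSource second = shareBuffer ? main : new CharSource(reader);
        String word = nextToken(second);
        return new String[] {word, String.valueOf((char) main.la())};
    }

    public static void main(String[] args) {
        String[] separate = switchLexers(false);
        String[] shared = switchLexers(true);
        System.out.println("separate: second lexer saw " + separate[0] + ", main still holds " + separate[1]);
        System.out.println("shared:   second lexer saw " + shared[0] + ", main now at " + shared[1]);
    }
}
```

With separate buffers the second lexer returns `reserveWhiteSpace` and the main lexer is left holding `P`, which would also fit `JustWikiLexer.nextToken()` reporting `unexpected char: 'P'` once the selector switches back. In ANTLR 2 the usual way to share one buffer is to build the second lexer from the first lexer's input state, for example `new WikiPageOptionLexer(mainLexer.getInputState())` (with `mainLexer` standing in for however the main lexer is named), as the multi-lexer JavaDoc example does; the lexer setup code is not shown here, so this is an assumption to check.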
When I step through it in the debugger, the generated code that looks for the
Word token is:
------------------
w1 = LT(1);
match(Word);
------------------
It fails specifically on the LT(1) call, yet the debugger shows LT(1) as a
token of type Word with the text "reserveWhiteSpace", so it looks like I'm
losing a character somewhere. I'm hoping someone can help me.
Thanks, Miki Watts