[antlr-interest] Lookahead problems - Bug in C++ Runtime?
Martin Probst
mail at martin-probst.com
Fri Sep 17 04:03:25 PDT 2004
Hi,
I've done some work to track this down, and it seems to be a bug in the
C++ runtime. I created a very simple sample grammar and ran it through
ANTLR twice, once with language="Cpp" and once with language="Java".
In Java mode everything works fine and the parser behaves as LL(1).
In C++ mode, two tokens are somehow read from the lexer in advance.
See the traces below. First Java, which is fine:
=== snip ===
martin@perseus Parser $ echo {{{foo}}} | java -classpath /usr/share/antlr/lib/antlr.jar:. TestMain
> expr; > lexer mLCURLY; c=={
< lexer mLCURLY; c=={
LA(1)=={
> enclosedExpr; LA(1)=={
> expr; > lexer mLCURLY; c=={
[...]
=== snap ===
Then C++:
=== snip ===
martin@perseus Parser $ echo {{{foo}}} | ./TestMain
> expr; LA(1)== > lexer mLCURLY; c==123
< lexer mLCURLY; c==123
> lexer mLCURLY; c==123
< lexer mLCURLY; c==123
{
> enclosedExpr; LA(1)=={
> expr; LA(1)== > lexer mLCURLY; c==123
< lexer mLCURLY; c==102
{
=== snap ===
You can see that the C++ parser reads one more token than it should.
Usually this is no problem, but when you want to (read: have to) change
the state of the lexer from within particular grammar rules, things
break because the next token has already been recognised under the
wrong lexer state. As it stands, I can't use ANTLR :-/
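A minimal sketch of the effect, as a toy model only: this is not the ANTLR runtime code, and ToyLexer, ToyBuffer and tokenAfterModeSwitch are hypothetical names invented for illustration. It assumes the only difference between the two behaviours is whether the k=1 lookahead slot is refilled immediately on consume (eager) or only when LA(1) is next requested (lazy).

```cpp
#include <cstddef>
#include <string>

// Toy lexer (hypothetical): emits one character per token and tags each
// token with the lexer mode that was active when the token was produced.
struct ToyLexer {
    std::string input;
    std::size_t pos = 0;
    int mode = 0;  // stands in for the lexer state switched by a parser action

    std::string next() {
        if (pos >= input.size()) return "EOF";
        return std::string(1, input[pos++]) + "@mode" + std::to_string(mode);
    }
};

// Toy k=1 token buffer (hypothetical). With eager == true, consume()
// refills the lookahead slot at once; with eager == false, the refill is
// postponed until LA1() is called again.
struct ToyBuffer {
    ToyLexer& lexer;
    bool eager;
    bool filled = false;
    std::string la;

    ToyBuffer(ToyLexer& l, bool e) : lexer(l), eager(e) {}

    const std::string& LA1() {
        if (!filled) {
            la = lexer.next();
            filled = true;
        }
        return la;
    }

    void consume() {
        LA1();           // make sure there is a token to consume
        filled = false;  // drop it
        if (eager) LA1();  // eager variant: fetch the next token right away
    }
};

// Matches "}" and then switches the lexer mode, as an action placed
// directly after RCURLY would. Returns the token the parser sees next.
std::string tokenAfterModeSwitch(bool eager) {
    ToyLexer lexer;
    lexer.input = "}x";
    ToyBuffer buf(lexer, eager);
    buf.LA1();       // lookahead is "}@mode0"
    buf.consume();   // match RCURLY
    lexer.mode = 1;  // the state switch performed by the parser action
    return buf.LA1();
}
```

In this model, tokenAfterModeSwitch(false) returns "x@mode1", while tokenAfterModeSwitch(true) returns "x@mode0": with the eager refill, the token after the RCURLY was already lexed before the state switch could take effect.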
I'm attaching the grammar, a TestMain.java and a TestMain.cpp source.
Everything should be really straightforward.
Regards,
Martin
On Thu, 16.09.2004 at 13:29, Martin Probst wrote:
> Hello,
> I have a lookahead problem with my grammar. I have a parser which has k=1
> but it actually seems to be looking ahead further than it should. See this
> output of ANTLR with -traceParser -traceLexer:
>
> In the state before these steps my parser has recognized a "dirAttribute".
> It looks ahead, finds a "=" and a '"' and then descends into a
> dirAttributeValue. That's expected and good.
>
> === snip ===
> > dirAttributeValue; LA(1)== > lexer mNEXT; c==104
> > lexer mQUOT_ATTR_CONTENT; c==104
> < lexer mQUOT_ATTR_CONTENT; c==123
> < lexer mNEXT; c==123
> "
> > lexer mNEXT; c==123
> > lexer mLCURLY; c==123
> < lexer mLCURLY; c==32
> < lexer mNEXT; c==32
> > quotAttrValueContent; LA(1)==http://www.w3
> < quotAttrValueContent; LA(1)== > lexer mNEXT; c==32
> > lexer mWS; c==32
> < lexer mWS; c==34
> < lexer mNEXT; c==34
> > lexer mNEXT; c==34
> > lexer mSTRING_LITERAL; c==34
> > lexer mQUOT; c==34
> < lexer mQUOT; c==46
> > lexer mQUOT; c==34
> < lexer mQUOT; c==32
> < lexer mSTRING_LITERAL; c==32
> < lexer mNEXT; c==32
> {
> > quotAttrValueContent; LA(1)=={
> > attrCommonContent; LA(1)=={
> > expr; LA(1)== > lexer mNEXT; c==32
> > lexer mWS; c==32
> < lexer mWS; c==125
> < lexer mNEXT; c==125
> > lexer mNEXT; c==125
> > lexer mRCURLY; c==125
> < lexer mRCURLY; c==32
> < lexer mNEXT; c==32
> .org
> [ ca. 15 grammatical steps removed ]
> > literal; LA(1)==.org
> > stringLiteral; LA(1)==.org
> < stringLiteral; LA(1)== > lexer mNEXT; c==32
> > lexer mWS; c==32
> < lexer mWS; c==47
> < lexer mNEXT; c==47
> > lexer mNEXT; c==47
> > lexer mSLASH; c==47
> < lexer mSLASH; c==49
> < lexer mNEXT; c==49
> }
> < literal; LA(1)==}
>
> === snap ===
>
> Now the rule for "attrCommonContent" states:
> attrCommonContent:
> /* some more alts */
> | LCURLY expr RCURLY
> Given that, a lookahead of RCURLY should be sufficient to exit the
> attrCommonContent rule. So why does the parser require more lookahead from
> the lexer when exiting stringLiteral?
>
> The problem is that within dirAttributeValue the lexer has to
> emit tokens differently than within the following expr rules.
> This means I have to switch the lexer to a different state (done with
> actions within {} in the grammar). I can't switch the state before the
> parser leaves the attrCommonContent section (that is, the action has
> to come directly after the RCURLY within that rule). But at that point the
> parser has evidently already fetched more tokens beyond the RCURLY, which
> leads to errors.
>
> My lexer has k=2, and everything uses C++ with the runtime and
> generator from antlr-2.7.4. Can anyone help me with this? Am I
> misunderstanding ANTLR's behaviour in general, or is this a bug?
>
> Thanks,
> Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TestMain.cpp
Type: text/x-c++src
Size: 670 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20040917/720897ea/TestMain.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TestMain.java
Type: text/x-java
Size: 280 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20040917/720897ea/TestMain-0001.bin
-------------- next part --------------
options {
    language = "Cpp";
}

class TestParser extends Parser;
options {
    k=1;
}

expr:
    (primaryExpr | enclosedExpr)*;

enclosedExpr:
    LCURLY expr RCURLY;

primaryExpr:
    FOO;

class TestLexer extends Lexer;
options { k=1; }

LCURLY: "{";
RCURLY: "}";
FOO: "foo";