[antlr-interest] Matching multiple occurrences of quoted text joined by 'and' (i.e. "a" and "b" and "c")

Wed Nov 3 15:40:14 PDT 2010

Hmmm, I think I might be running into a bug, either in the code or in my
understanding (almost certainly my understanding!).

I have created a simple grammar that demonstrates the problem (I am testing
the first parser rule, 'rule1'):

--- start
grammar QuotedText;

@parser::header {
package examples.aandb;
}

@lexer::header {
package examples.aandb;
}

rule1
    : a=QUOTED_TEXT 'and' b=QUOTED_TEXT 'and' c=QUOTED_TEXT
      { System.out.println("rule1: " + a.getText() + ", " + b.getText() + ","
            + c.getText()); }
    ;
ruleThatShouldBeIgnored
    : 'and whose' 'external'? 'resource is' theResource=('this' | QUOTED_TEXT)
      { System.out.println("taskResource: " + $theResource); }
    ;

QUOTED_TEXT : '"' (~'"')* '"';
WS
    : (' '|'\t'|'\n'|'\r')+ {skip();}
    ;
--- end

My test case is as follows:

--- start
package examples.aandb;

import org.junit.Test;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.RecognitionException;

import java.io.IOException;

public class TestCase {

    @Test
    public void happyPath() throws IOException, RecognitionException {
        String dsl = "\"a\" and \"b\" and \"c\"";
        createParser(dsl).rule1();
    }

    private QuotedTextParser createParser(String testString) throws IOException {
        QuotedTextLexer lexer = createLexer(testString);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        return new QuotedTextParser(tokens);
    }

    private QuotedTextLexer createLexer(String testString) throws IOException {
        CharStream stream = new ANTLRStringStream(testString);
        return new QuotedTextLexer(stream);
    }
}

--- end

If I run that (in IDEA 8, using the latest ANTLRWorks and ANTLR 3.2), I get
the following output:

--- start
line 1:8 mismatched character '"' expecting 'w'
line 1:9 no viable alternative at character 'b'
line 1:17 no viable alternative at character 'c'
line 1:19 mismatched character '<EOF>' expecting '"'
line 1:10 missing 'and' at '" and "'
line 0:-1 mismatched input '<EOF>' expecting 'and'
--- end
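The first message says the lexer was partway through the 'and whose' literal (the only literal with a 'w' after "and ") when it reached the '"' at column 8. The Java below is a deliberately simplified model, not ANTLR's generated lexer: it assumes that after the shared prefix "and " the lexer commits to 'and whose' instead of falling back to 'and', and that error recovery drops one character. Under those assumptions it reproduces the first four messages above. AndWhoseModel and lexErrors are hypothetical names, and the other literals from ruleThatShouldBeIgnored ('external', 'resource is', 'this') are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified MODEL of a lexer holding both the 'and' and 'and whose' literals plus
// QUOTED_TEXT and WS. ASSUMPTIONS (not ANTLR's real algorithm): after the shared
// prefix "and " it commits to 'and whose'; on an error it drops one character and
// carries on; the input is a single line, so every message says "line 1".
public class AndWhoseModel {

    static List<String> lexErrors(String in) {
        List<String> errors = new ArrayList<>();
        String literal = "and whose";
        int pos = 0;
        while (pos < in.length()) {
            char c = in.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                pos++;                                           // WS, skipped
            } else if (c == '"') {
                int end = in.indexOf('"', pos + 1);              // QUOTED_TEXT
                if (end < 0) {
                    errors.add("line 1:" + in.length()
                            + " mismatched character '<EOF>' expecting '\"'");
                    break;
                }
                pos = end + 1;                                   // token itself is not recorded
            } else if (in.startsWith("and ", pos)) {
                int i = 0;                                       // assumed: committed to 'and whose'
                while (i < literal.length() && pos + i < in.length()
                        && in.charAt(pos + i) == literal.charAt(i)) {
                    i++;
                }
                int p = pos + i;
                if (i == literal.length()) {
                    pos = p;                                     // a complete 'and whose'
                } else if (p == in.length()) {
                    errors.add("line 1:" + p + " mismatched character '<EOF>' expecting '"
                            + literal.charAt(i) + "'");
                    break;
                } else {
                    errors.add("line 1:" + p + " mismatched character '" + in.charAt(p)
                            + "' expecting '" + literal.charAt(i) + "'");
                    pos = p + 1;                                 // drop the offending character
                }
            } else if (in.startsWith("and", pos)) {
                pos += 3;                                        // plain 'and' (not followed by ' ')
            } else {
                errors.add("line 1:" + pos + " no viable alternative at character '" + c + "'");
                pos++;                                           // drop one character
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        String dsl = "\"a\" and \"b\" and \"c\"";
        for (String e : lexErrors(dsl)) {
            System.out.println(e);
        }
    }
}
```

The two parser messages fit the same picture: after "a", the next token the parser sees is '" and "' at column 10 (the text between the closing quote of "b" and the opening quote of "c"), and after that only <EOF>, so rule1 never receives an 'and' token.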

If, however, I comment out the second rule ('ruleThatShouldBeIgnored'),
everything works as expected. The output is:

--- start
rule1: "a", "b","c"
--- end
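With ruleThatShouldBeIgnored gone, the grammar only needs three token types (QUOTED_TEXT, 'and' and the skipped WS) and one fixed token sequence for rule1. A hand-written Java sketch of roughly that behaviour follows; it is not what ANTLR generates, and Rule1Sketch, tokenize and rule1 are hypothetical names. Like rule1, which has no EOF at the end, it does not check for tokens after the fifth.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written stand-in for QuotedTextLexer + rule1 with
// ruleThatShouldBeIgnored removed. This is NOT the code ANTLR generates.
public class Rule1Sketch {

    // Lexer: QUOTED_TEXT, the 'and' literal, and WS (skipped).
    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            char c = in.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                pos++;                                   // WS : ... {skip();}
            } else if (c == '"') {
                int end = in.indexOf('"', pos + 1);      // QUOTED_TEXT : '"' (~'"')* '"'
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated QUOTED_TEXT at " + pos);
                }
                tokens.add(in.substring(pos, end + 1));
                pos = end + 1;
            } else if (in.startsWith("and", pos)) {
                tokens.add("and");                       // the 'and' literal
                pos += 3;
            } else {
                throw new IllegalArgumentException("no viable alternative at character '" + c + "'");
            }
        }
        return tokens;
    }

    // Only QUOTED_TEXT tokens start with '"'; the 'and' token never does.
    static boolean isQuoted(String token) {
        return token.startsWith("\"");
    }

    // rule1 : a=QUOTED_TEXT 'and' b=QUOTED_TEXT 'and' c=QUOTED_TEXT ;
    static String rule1(List<String> t) {
        boolean ok = t.size() >= 5                       // trailing tokens are not checked
                && isQuoted(t.get(0)) && t.get(1).equals("and")
                && isQuoted(t.get(2)) && t.get(3).equals("and")
                && isQuoted(t.get(4));
        if (!ok) {
            throw new IllegalArgumentException("input does not match rule1: " + t);
        }
        // Same string the grammar's action prints.
        return "rule1: " + t.get(0) + ", " + t.get(2) + "," + t.get(4);
    }

    public static void main(String[] args) {
        String dsl = "\"a\" and \"b\" and \"c\"";
        System.out.println(rule1(tokenize(dsl)));       // prints rule1: "a", "b","c"
    }
}
```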

I don't understand this behaviour: I don't see why 'ruleThatShouldBeIgnored',
which rule1 never references, has any influence.

Any ideas?

Thanks,

Col

On 3 November 2010 19:37, Colin Yates <colin.yates at gmail.com> wrote:

> Thanks Gordon,
>
> That doesn't work either.  I think I need to separate out just this
> fragment into its own grammar to ensure that the rest of the grammar
> isn't having any unexpected side effects.
>
> I will report back once I have isolated these two rules... Thanks!
>
> On 3 Nov 2010, at 19:25, Gordon Tyler <Gordon.Tyler at quest.com> wrote:
>
> >> QUOTED_TEXT : '\"' ( options {greedy=false;} : .)* '\"';
> >
> > Try this:
> >
> > QUOTED_TEXT : '"' (~'"')* '"';
> >
> > In English: match '"', then match zero or more characters which are
> > not '"', then match '"'.
> >
> > Ciao,
> > Gordon
> >
>
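Colin's original non-greedy rule and Gordon's ~'"' rule should pick out the same strings in the test input. As a rough cross-check, both can be written as java.util.regex patterns; this assumes a regex is a fair stand-in for the two lexer rules, which is only an approximation (ANTLR does not use java.util.regex). DOTALL is set so that '.' also matches line terminators, as the ANTLR lexer wildcard does. QuotedTextRegex and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// java.util.regex approximation of the two QUOTED_TEXT definitions in this thread.
public class QuotedTextRegex {

    // Gordon's rule: '"' (~'"')* '"'  ->  a quote, a run of non-quote characters, a quote
    static final Pattern NOT_QUOTE = Pattern.compile("\"[^\"]*\"");

    // Colin's rule: '"' ( options {greedy=false;} : . )* '"'  ->  reluctant "any character" loop
    static final Pattern LAZY_ANY = Pattern.compile("\".*?\"", Pattern.DOTALL);

    // Collect every non-overlapping match, scanning left to right.
    static List<String> matches(Pattern p, String in) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(in);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String dsl = "\"a\" and \"b\" and \"c\"";
        System.out.println(matches(NOT_QUOTE, dsl));    // prints ["a", "b", "c"]
        System.out.println(matches(LAZY_ANY, dsl));     // prints ["a", "b", "c"]
    }
}
```

On this input both patterns find the same three strings, which is consistent with Colin's reply that swapping in Gordon's rule did not change the result.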