[antlr-interest] Multiple newlines question

Tue Aug 30 07:06:24 PDT 2011

Hi. I am very new to parser generators, BNF grammars and stuff. I am reading
The definitive ANTLR reference and try things out. Currently I have the
following simple grammar:

grammar Test;

@header {
package test;
}

@lexer::header {
package test;
}

program:
    INT (NEWLINE+ INT)* EOF;

INT: '0'..'9'+;
NEWLINE: '\r'?'\n';
WS: (' ' | '\t' | '\r' | '\n')+ {skip();};

and the following test (TestNG):

package test;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.TokenStream;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class ANTLRTest {

    @DataProvider(name = "data")
    public Object[][] getData() {
        return new Object[][] {
                { "1\n2\n\n3" }
                };
    }

    @Test(dataProvider = "data")
    public void test(String value) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream(value);
        TestLexer l = new TestLexer(in);
        TokenStream ts = new CommonTokenStream(l);
        TestParser p = new TestParser(ts);
        p.program();
    }
}

So, a program is an INT, followed by a sequence of one or many NEWLINES
followed by an INT; at the very end is an EOF (I think I have to do it to
force ANTLR read the whole input?). However, I get the following error
printed out to stderr:

line 4:0 extraneous input '3' expecting EOF

The parser doesn't like the two adjacent newlines. If I leave only one, it
works without errors, but I would like it to be able to deal with multiple
newlines.

Could anybody more experienced with me please take a look and tell me what I
am doing wrong here?

Q2. The same rules (almost - the newlines) are in the NEWLINE and WS tokens
- how does it work that once the newlines are matched and sometimes skipped?
Is the sequence in the grammar file important in this case?

Q3. For some reason, I get the errors printed out on System.err, and the
test passes even with errors. This is not exactly what I would need, I would
need it to fail laudly on any erros (be it lexical or parsing). Any quick
pointer would be nice, but if not, I guess the book will deal cover it
sooner or later.

Best regards,
wujek