[antlr-interest] Strange parsing behavior

Tue Apr 7 09:46:12 PDT 2009

Handling only a single block was a mistake on my part in the email. The rule
should have been:

prog : (b=ID '{' (ID ';')+ '}')+ { System.out.println("Found block: " +
$b.text);} ;

I cut too much out when I composed the email. The language this came from is
very complicated and I was trying to simplify the issue. I just went too far.
:)

In the original grammar where I found the issue, it handled multiple blocks
correctly. And as long as there were no semicolons between blocks,
everything was fine. As we were developing the implementation for the syntax
error handling, one engineer decided to try a semicolon between blocks and
we found that it did not generate a syntax error and did not parse the rest
of the file either.

Another interesting note is that if the above grammar is changed to remove
the semicolon-terminated item inside the block, such as just having '{' ID+
'}', it correctly detects the semicolon between the blocks as an error.
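
A minimal sketch of how the EOF suggestion from Jim's reply (quoted below)
might be applied to the multi-block rule above. This is untested here and
assumes ANTLR 3 treats the trailing EOF the same way Jim describes for the
single-block rule; placing the action inside the loop is also an assumption
about the intent, so that it runs once per block. The @members section is
omitted for brevity.

```antlr
grammar Test;

// Sketch only: the multi-block rule with EOF as its last terminal.
// With EOF required, the parser cannot stop quietly at a stray ';'
// between blocks; it has to match EOF there or report a mismatch.
prog : ( b=ID '{' (ID ';')+ '}'
         // Action moved inside the loop (assumption) so each block is reported.
         { System.out.println("Found block: " + $b.text); }
       )+ EOF ;

ID : ('A'..'Z' | 'a'..'z') ('A'..'Z' | 'a'..'z' | '0'..'9' | '_')* ;
WS : (' '|'\r'|'\t'|'\u000C'|'\n'|'\u0000') {$channel=HIDDEN;} ;
```

With this rule, the input "foo { a; }; bar { b;}" would be expected to
produce a syntax error at the ';' after the first block instead of ending
the parse silently.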

Thanks!

- Dan -

On Tue, Apr 7, 2009 at 9:30 AM, Jim Idle <jimi at temporal-wave.com> wrote:

>  Dan Baumberger wrote:
>
> I am working with an ANTLR grammar for a custom language and have
> encountered a strange parsing issue. Here is a highly simplified grammar for
> the issue I've found:
>  grammar Test;
> @members {
>     public static void main(String[] args) throws Exception {
>         TestLexer lex = new TestLexer(new ANTLRFileStream(args[0]));
>         CommonTokenStream tokens = new CommonTokenStream(lex);
>         TestParser parser = new TestParser(tokens);
>         try {
>             parser.prog();
>         } catch (RecognitionException e) {
>             e.printStackTrace();
>         }
>     }
> }
>
>  prog : b=ID '{' s=ID ';' '}' { System.out.println("Found block: " +
> $b.text);} ;
> ID : ('A'..'Z' | 'a'..'z') ('A'..'Z' | 'a'..'z' | '0'..'9' | '_')* ;
> WS : (' '|'\r'|'\t'|'\u000C'|'\n'|'\u0000') {$channel=HIDDEN;} ;
>
>  If I give it an input of:
>
>  foo { a; }; bar { b;}
>
>  it displays:
>
>  Found block: foo
>
>  It does not flag any errors and all blocks following the semicolon are
> ignored. It works correctly without the semicolon (the normal case) with
> both block names displayed but it should at least flag some kind of error if
> the semicolon is there. I've tried this with ANTLR v3.1.1 and v3.1.3 with
> both Java and C targets and all behave the same. Does anyone know what is
> going on?
>
>   Well, it is doing what you asked it to :-)
>
> First, your prog rule only parses one block, then it sees a ';' which you
> have not catered for. Because you don't have EOF as the last terminal of
> your prog rule, ANTLR assumes that you want it to stop. If you use:
>
> prog : b=ID '{' s=ID ';' '}' { System.out.println("Found block: " +
> $b.text);} EOF;
>
> Then it will give you a syntax error at ';'
>
> However, if you want it to go for another block, you will have to make your
> grammar look for multiple blocks of course.
>
> Jim