[antlr-interest] parsing ugly grammars
Sam Harwell
sharwell at pixelminegames.com
Sat Apr 4 09:41:35 PDT 2009
There’s a big problem. You can parse this using a synpred, but it will have O(n²) complexity in the length of your input. Is there another possible terminator for the raw data? Newline maybe? With newline terminating the raw data, you’d still have O(n²) complexity in the length of the line. You should seriously consider requiring the ';;' terminator for the raw data.
Sam
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Tomasz Jastrzebski
Sent: Saturday, April 04, 2009 11:21 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] parsing ugly grammars
Hello all,
Writing a parser for some ugly grammar I came across a problem I do not know how to approach. Here is a sample grammar illustrating the problem:
grammar test;
program : (statement)*;
statement
: RawData
| Identifier ';'
;
RawData: 'data;' ((options {greedy=false;} : .)* ';;')? ;
Identifier : ('a'..'z')+;
WhiteSpace : (' ' | '\t' | '\r\n' | '\r')+ { $channel=HIDDEN; };
The RawData can contain data ended with ‘;;’ or can be empty. Two sample valid inputs:
data; some raw data here;; identifier;
data; identifier;
The parser does not correctly recognize the second input (mismatched character '<EOF>' expecting ';') .
Parser does not realize that what follows ‘data’ keyword is not followed by ‘;;’ so it is not "raw data" and should be interpreted as Identifier.
I am clueless. Could anyone help?
Thomas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20090404/1bf71b99/attachment.html
More information about the antlr-interest
mailing list