[antlr-interest] parsing ugly grammars

Sat Apr 4 09:41:35 PDT 2009

There’s a big problem. You can parse this using a synpred, but it will have O(n²) complexity in the length of your input. Is there another possible terminator for the raw data? Newline maybe? With newline terminating the raw data, you’d still have O(n²) complexity in the length of the line. You should seriously consider requiring the ';;' terminator for the raw data.

Sam

From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Tomasz Jastrzebski
Sent: Saturday, April 04, 2009 11:21 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] parsing ugly grammars

Hello all,

Writing a parser for some ugly grammar I came across a problem I do not know how to approach. Here is a sample grammar illustrating the problem: 

grammar test; 

program : (statement)*; 

statement 
   : RawData 
   | Identifier ';' 
   ; 

RawData: 'data;' ((options {greedy=false;} : .)* ';;')? ; 

Identifier : ('a'..'z')+; 

WhiteSpace : (' ' | '\t' | '\r\n' | '\r')+ { $channel=HIDDEN; }; 

The RawData can contain data ended with ‘;;’ or can be empty. Two sample valid inputs: 
data; some raw data here;; identifier; 
data; identifier; 

The parser does not correctly recognize the second input (mismatched character '<EOF>' expecting ';') .
Parser does not realize that what follows ‘data’ keyword is not followed by ‘;;’ so it is not "raw data" and should be interpreted as Identifier. 

I am clueless. Could anyone help? 

Thomas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20090404/1bf71b99/attachment.html