[antlr-interest] parsing ugly grammars

Tomasz Jastrzebski tdjastrzebski at yahoo.com
Sat Apr 4 09:21:27 PDT 2009


Hello all,
 
While writing a parser for a rather ugly grammar, I ran into a problem I do not know how to approach. Here is a sample grammar that illustrates it:
  
grammar test; 

program : (statement)*; 

statement 
   : RawData 
   | Identifier ';' 
   ; 

RawData: 'data;' ((options {greedy=false;} : .)* ';;')? ; 

Identifier : ('a'..'z')+; 

WhiteSpace : (' ' | '\t' | '\r\n' | '\r')+ { $channel=HIDDEN; }; 
 
RawData can either contain data terminated by ‘;;’ or be empty. Two sample valid inputs:
data; some raw data here;; identifier; 
data; identifier; 
 
The parser does not recognize the second input correctly; it reports: mismatched character '<EOF>' expecting ';'.
It does not realize that the text following the ‘data;’ keyword is never terminated by ‘;;’, so it is not raw data and should instead be interpreted as an Identifier followed by ‘;’.
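
To make the intended behaviour concrete, here is a rough hand-written sketch in Python of the tokenization I would expect. It is not ANTLR output and not a proposed fix; the function name tokenize and the rule "look for the first ‘;;’ anywhere in the remaining input" are only my assumptions about what the grammar should mean.

```python
import re

# Patterns mirroring the lexer rules in the sample grammar.
WHITESPACE = re.compile(r"(?: |\t|\r\n|\r)+")
IDENTIFIER = re.compile(r"[a-z]+")
KEYWORD = "data;"


def tokenize(text):
    """Hypothetical sketch of the desired tokenization, written by hand."""
    tokens = []
    pos = 0
    while pos < len(text):
        # WhiteSpace: simply skipped here (the grammar sends it to HIDDEN).
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue

        if text.startswith(KEYWORD, pos):
            body_start = pos + len(KEYWORD)
            # Assumption: raw data exists only if ';;' occurs later in the input.
            terminator = text.find(";;", body_start)
            if terminator != -1:
                end = terminator + 2   # include the ';;' terminator
            else:
                end = body_start       # empty RawData: just 'data;'
            tokens.append(("RawData", text[pos:end]))
            pos = end
            continue

        ident = IDENTIFIER.match(text, pos)
        if ident:
            tokens.append(("Identifier", ident.group()))
            pos = ident.end()
            continue

        if text[pos] == ";":
            tokens.append(("SEMI", ";"))
            pos += 1
            continue

        raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens


if __name__ == "__main__":
    for sample in ("data; some raw data here;; identifier;", "data; identifier;"):
        print(tokenize(sample))
```

For the second input the sketch falls back to an empty RawData token because no ‘;;’ is found before the end of input, which is the decision I cannot get the generated lexer to make.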

I am clueless. Could anyone help? 
 
Thomas


      