[antlr-interest] Problems in skipping unwanted text.

Dugald Wilson Dugald.Wilson at aveva.com
Thu Oct 6 01:15:19 PDT 2011


Hi,

For quite some time I have been working on a home project that tries to parse a very complex grammar.  Yesterday I had a brainwave about how to skip the difficult bits, at least for the moment.  If I were to handle them now, I would end up producing a parser for almost a complete programming language.  Below is an example of the sort of thing I am trying to skip, i.e. the 'where' statement.  Because the grammar places the 'where' statement, or statements, immediately before the 'end_type' keyword, I thought I'd gobble everything up to 'end_type'.

In the example below, 'typeid' and 'underlyingtype' currently both come down to a simple identifier of the form 'a'..'z' ('a'..'z'|'_')*

type dayinmonth = integer;
 where
    validrange : {1 <= self <= 31};
end_type;
  
The modified grammar for this is...

typedecl
    : 'type' typeid '=' underlyingtype ';' (options {greedy=false;} : .* ) 'end_type' ';'
    ;
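
To make the intent of that rule concrete, here is a minimal sketch in Python (not ANTLR) of the non-greedy skip I am after.  It is only an illustration under assumptions: the function name skip_to_end_type and the hand-written token list are hypothetical, and it works on tokens that have already been lexed, which is my understanding of how '.' behaves in a parser rule.

```python
def skip_to_end_type(tokens, start):
    """Gobble tokens from index `start` up to the first 'end_type'.

    Returns (skipped, next_index), where next_index points just past
    the closing 'end_type' ';' pair.  Hypothetical helper, for
    illustration only; it is not part of ANTLR.
    """
    i = start
    skipped = []
    # Non-greedy: stop at the FIRST 'end_type', not the last one.
    while i < len(tokens) and tokens[i] != 'end_type':
        skipped.append(tokens[i])
        i += 1
    # 'end_type' must be present and must be followed by ';'.
    if i + 1 >= len(tokens) or tokens[i + 1] != ';':
        raise SyntaxError("expected 'end_type' ';'")
    return skipped, i + 2


# Hand-tokenised version of the 'dayinmonth' example (an assumption
# about how the lexer would split it).
tokens = ['type', 'dayinmonth', '=', 'integer', ';',
          'where', 'validrange', ':', '{', '1', '<=', 'self', '<=', '31', '}', ';',
          'end_type', ';']

# Start skipping right after the first ';' (index 5).
skipped, next_index = skip_to_end_type(tokens, 5)
```

In this sketch the '{' and '}' are just two more tokens in the skipped list, which is how I expected them to be treated.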

I can't just skip to the next ';' because there may be several statements, e.g.

Where
   Label1 : stuff1;
   Label2: stuff2;

What I find is that the '{' and '}' within the gobbled text become significant.  In other similar cases I find that a '|', or even a carriage return '\r', is significant.  Using the Eclipse add-in, testing just this sub-graph gives different results (although it succeeds in both cases) depending on whether the '{' or '}' is surrounded by whitespace.  Somehow, if it is surrounded by whitespace, the '{' token disappears from the parse tree.  But when I try to parse the text properly in context, it throws up an error.  I also found that changing the '{' to '(' removed the error.

In the end, I managed to parse a 12000-line file with only this type of error.

This was a long introduction for just a couple of short questions.  Are there significant characters that can affect the gobble process?  Do I need other options to be able to skip everything to 'end_type'?

Thanks.

Dugald Wilson


