[antlr-interest] ANTLR Basic Question

Fri Jul 9 14:48:44 PDT 2010

Hi Klaus,

This is quite a coincidence, as I have recently written a CIF parser for the
cctbx (Computational Crystallography Toolbox) using ANTLR.  You can find a C
language version of an ANTLR CIF grammar here:

http://cctbx.svn.sourceforge.net/viewvc/cctbx/trunk/iotbx/cif/cif.g?view=markup

The grammar is somewhat cluttered by the code that builds the CIF model
during parsing, but you should be able to strip that away and get a working
CIF parser in your chosen target language (Java, by the look of your grammar).
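
On the question itself, as far as I understand ANTLR: the lexer tokenises
the character stream on its own, without knowing which parser rule is
active, so being inside dataBlockHeading does not stop it from trying LOOP
when it sees "lo".  One common way around this is to let the lexer match a
complete whitespace-delimited word and only then decide what kind of token
it is.  The plain-Java sketch below illustrates that idea outside ANTLR.  It
is not taken from cif.g; the names CifWordLexer, Kind, Token, classify and
tokenize are made up for the illustration, and it assumes that tags start
with '_' and ignores quoted strings, text fields and comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: a word-based tokenizer, not the cctbx implementation.
public class CifWordLexer {
    enum Kind { DATA_BLOCK_HEADING, LOOP, TAG, VALUE }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Classify one complete word. Keywords are matched case-insensitively,
    // and only against the whole word, so "global" can never become LOOP.
    static Kind classify(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        if (lower.equals("loop_")) return Kind.LOOP;
        if (lower.startsWith("data_") && word.length() > 5) return Kind.DATA_BLOCK_HEADING;
        if (word.startsWith("_")) return Kind.TAG;   // assumption: tags begin with '_'
        return Kind.VALUE;
    }

    // Split on runs of whitespace (simplification: no quoted strings or comments).
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (!word.isEmpty()) tokens.add(new Token(classify(word), word));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("data_global\nloop_\n_atom_site_label\nC1\n")) {
            System.out.println(t);
        }
    }
}
```

For the four words in main this prints DATA_BLOCK_HEADING(data_global),
LOOP(loop_), TAG(_atom_site_label) and VALUE(C1), one per line.  In an ANTLR
grammar the equivalent move would probably be to make NONBLANCKCHAR a
fragment and build longer lexer tokens from it, rather than emitting one
token per character.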

Thanks,

Richard

On 9 July 2010 20:10, Klaus Martinschitz <klausmartinschitz at gmail.com> wrote:

>  Hi ANTLR Gurus,
>
> A beginner's question.
> I want to write a compiler for the Crystallographic Information File
> (CIF) format. I won't explain the syntax in detail, only the problem I
> am facing.
>
> The data starts with a token
>
> 'data_'
>
> followed by arbitrary characters and an EOL, e.g.
>
> data_global
> .
>
> There is also a token
>
> 'loop_';
>
> Somewhere in my BNF I write something like
>
> DATA
>     :(('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
>     ;
>
> LOOP
>     :
>     (('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
>     ;
>
> dataBlockHeading
>     :    (DATA NONBLANCKCHAR+ EOL)
>     ;
>
> dataItem
>     :    (tag WHITESPACE value) | (LOOP loopHeader loopBody)
>     ;
>
> The first two definitions are tokens; the last two are rules. My problem
> is the following. The file starts with
>
> data_global
>
> BUT the *lo* of data_g*lo*bal is matched by the LOOP token. How can
> this be if the parser is in the dataBlockHeading rule? Shouldn't the
> parser know that the characters *lo* belong to NONBLANCKCHAR and not to
> LOOP?
>
> I have attached the whole grammar at the end of this message.
>
> Thanks for your help
>
> Regards,
> Klaus
>
>
> grammar CIF1_1;
>
> options{
> language=Java;
> }
>
> @lexer::header{
> package at.netcrystals.cif_1_1.parser;
> }
>
> @parser::header{
> package at.netcrystals.cif_1_1.parser;
> }
>
>
> DATA
>     :(('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
>     ;
>
> LOOP
>     :
>     (('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
>     ;
>
> fragment ORDINARYCHAR
>     :     '!' | '%' | '&' | '(' | ')' | '*' | '+' | ',' | '-' | '.' |
> '/' | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | ':' |
> '<' | '=' | '>' | '?' | '@' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' |
> 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' |
> 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' | '\\' | '^' | '\`' | 'a' | 'b'
> | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n'
> | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
> | '{' | '|' | '}' | '~'
>     ;
>
>
> NONBLANCKCHAR
>     :    ORDINARYCHAR | '"' | '#' | '$' | '\'' | '_' | ';' | '[' | ']'
>     ;
>
>
>
> WHITESPACE
>     :    '\t'|' '
>     ;
>
>
>
> /************************************************************************************************
>     WhiteSpace and Comments
>
> ************************************************************************************************/
>
>
>
>
>
>
> EOL
>     :'\n'|'\r\n'
>     ;
>
>
>
>
>
>
>
> /************************************************************************************************
> *
> * Root
> *
>
> ************************************************************************************************/
>
> cif
>     :      (dataBlock)   EOF
>     ;
>
> dataBlock
>     :    (dataBlockHeading dataItems)
>     ;
>
> dataBlockHeading
>     :    (DATA NONBLANCKCHAR+ EOL)
>     ;
>
>
> dataItems
>     :    dataItem* EOL
>     ;
>
> dataItem
>     :    (tag WHITESPACE value) | (LOOP loopHeader loopBody)
>     ;
>
> tag
>     :    NONBLANCKCHAR+
>     ;
>
>
> value
>     :    '.' | '?' | charString
>     ;
>
> charString
>     :    singleQuotedString
>     ;
>
> singleQuotedString
>     :    '\'' NONBLANCKCHAR* '\''
>     ;
>
> loopHeader
>     :    ( (WHITESPACE tag)+)
>     ;
>
> loopBody
>     :    value (WHITESPACE value)+
>     ;
>
>
>
>
>