[antlr-interest] ANTLR Basic Question
Richard Gildea
rgildea at gmail.com
Fri Jul 9 14:48:44 PDT 2010
Hi Klaus,
This is quite a coincidence, as I have recently written a CIF parser for the
cctbx (Computational Crystallography Toolbox) using ANTLR. You can find an
ANTLR CIF grammar targeting C here:
http://cctbx.svn.sourceforge.net/viewvc/cctbx/trunk/iotbx/cif/cif.g?view=markup
It is somewhat cluttered by the code that builds the CIF model during parsing,
but you should be able to strip that away and get a working CIF parser in your
chosen target language (it looks like you want Java).
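On the question itself: the ANTLR lexer tokenizes the character stream on its
own, without knowing which parser rule is currently active, which is why LOOP
can be attempted inside data_global. A minimal sketch of one way around this,
assuming the whole heading is matched as a single lexer token (DATA_,
DATA_BLOCK_HEADING and TAG are names introduced here for illustration;
ORDINARYCHAR is the fragment from the grammar quoted below):

```
// Sketch only: rule names are illustrative assumptions.
fragment DATA_
    : ('d'|'D')('a'|'A')('t'|'T')('a'|'A') '_'
    ;

// As a fragment, this is never emitted as a token of its own.
fragment NONBLANKCHAR
    : ORDINARYCHAR | '"' | '#' | '$' | '\'' | '_' | ';' | '[' | ']'
    ;

// The whole heading is one token, so the "lo" in "data_global"
// is consumed inside this rule.
DATA_BLOCK_HEADING
    : DATA_ NONBLANKCHAR+
    ;

// CIF data names (tags) start with an underscore.
TAG
    : '_' NONBLANKCHAR+
    ;

dataBlockHeading
    : DATA_BLOCK_HEADING EOL
    ;

tag
    : TAG
    ;
```

With this, dataBlockHeading sees a single DATA_BLOCK_HEADING token, and the
characters after data_ are never offered to the LOOP rule. This is untested
against the full grammar below, so treat it as a starting point.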
Thanks,
Richard
On 9 July 2010 20:10, Klaus Martinschitz <klausmartinschitz at gmail.com> wrote:
> Hi ANTLR Gurus,
>
> A beginner's question.
> I want to write a compiler for the Crystallographic Information File
> format (CIF). I won't explain the syntax in detail, only the problem I
> am facing.
>
> The data starts with a token
>
> 'data_'
>
> followed by arbitrary characters and an EOL, e.g.
>
> data_global
> .
>
> There is also a token
>
> 'loop_';
>
> Somewhere in my BNF I write something like
>
> DATA
> :(('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
> ;
>
> LOOP
> :
> (('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
> ;
>
> dataBlockHeading
> : (DATA NONBLANCKCHAR+ EOL)
> ;
>
> dataItem
> : (tag WHITESPACE value) | (LOOP loopHeader loopBody)
> ;
>
> The first two definitions are tokens (lexer rules); the last two are
> parser rules. My problem is the following. The file starts with
>
> data_global
>
> BUT the *lo* of data_g*lo*bal is matched by the LOOP token. How can
> this be if the parser is in the dataBlockHeading rule? The parser should
> know that the characters *lo* belong to NONBLANCKCHAR and not to LOOP,
> shouldn't it?
>
> I have attached the whole grammar at the end of this message.
>
> Thanks for your help
>
> Regards,
> Klaus
>
> grammar CIF1_1;
>
> options{
> language=Java;
> }
>
> @lexer::header{
> package at.netcrystals.cif_1_1.parser;
> }
>
> @parser::header{
> package at.netcrystals.cif_1_1.parser;
> }
>
>
> DATA
> :(('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
> ;
>
> LOOP
> :
> (('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
> ;
>
> fragment ORDINARYCHAR
> : '!' | '%' | '&' | '(' | ')' | '*' | '+' | ',' | '-' | '.' |
> '/' | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | ':' |
> '<' | '=' | '>' | '?' | '@' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' |
> 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' |
> 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' | '\\' | '^' | '\`' | 'a' | 'b'
> | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n'
> | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
> | '{' | '|' | '}' | '~'
> ;
>
>
> NONBLANCKCHAR
> : ORDINARYCHAR | '"' | '#' | '$' | '\'' | '_' | ';' | '[' | ']'
> ;
>
>
>
> WHITESPACE
> : '\t'|' '
> ;
>
>
>
> /************************************************************************************************
> WhiteSpace and Comments
>
> ************************************************************************************************/
>
>
>
>
>
>
> EOL
> :'\n'|'\r\n'
> ;
>
>
>
>
>
>
>
> /************************************************************************************************
> *
> * Root
> *
>
> ************************************************************************************************/
>
> cif
> : (dataBlock) EOF
> ;
>
> dataBlock
> : (dataBlockHeading dataItems)
> ;
>
> dataBlockHeading
> : (DATA NONBLANCKCHAR+ EOL)
> ;
>
>
> dataItems
> : dataItem* EOL
> ;
>
> dataItem
> : (tag WHITESPACE value) | (LOOP loopHeader loopBody)
> ;
>
> tag
> : NONBLANCKCHAR+
> ;
>
>
> value
> : '.' | '?' | charString
> ;
>
> charString
> : singleQuotedString
> ;
>
> singleQuotedString
> : '\'' NONBLANCKCHAR* '\''
> ;
>
> loopHeader
> : ( (WHITESPACE tag)+)
> ;
>
> loopBody
> : value (WHITESPACE value)+
> ;