[antlr-interest] ANTLR Basic Question

Fri Jul 9 12:10:55 PDT 2010

  Hi ANTLR Gurus,

A beginner's question.
I want to write a compiler for the Crystallographic Information File (CIF)
format. I won't explain the syntax in detail, only the problem I am facing.

A data block starts with the token

'data_'

followed by arbitrary non-blank characters and an EOL, e.g.

data_global
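A minimal Java sketch of that heading format, for illustration only. The class, the `blockName` helper and the regular expression are hypothetical, not part of any CIF library. It assumes the line has already been stripped of its EOL and that the name may contain only printable, non-space ASCII characters, as in the grammar's NONBLANCKCHAR rule.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeadingDemo {
    // Hypothetical pattern mirroring DATA NONBLANCKCHAR+ (EOL already removed):
    // "data_" in any letter case, then one or more printable non-space
    // ASCII characters ('!' through '~').
    private static final Pattern HEADING = Pattern.compile("(?i)data_([!-~]+)");

    /** Returns the block name of a heading line, or empty if the line is not a heading. */
    static Optional<String> blockName(String line) {
        Matcher m = HEADING.matcher(line);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(blockName("data_global")); // Optional[global]
        System.out.println(blockName("DATA_global")); // Optional[global]
        System.out.println(blockName("data_"));       // Optional.empty (name is required)
    }
}
```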

There is also a token

'loop_'

Somewhere in my grammar I have something like this:

DATA
     :(('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
     ;

LOOP
     :(('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
     ;

dataBlockHeading
     :    (DATA NONBLANCKCHAR+ EOL)
     ;

dataItem
     :    (tag WHITESPACE value) | (LOOP loopHeader loopBody)
     ;
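For illustration, a parser rule such as dataBlockHeading only ever sees token types, never the characters behind them. The hand-written Java check below is a sketch (the class, the `Tok` enum and `isDataBlockHeading` are hypothetical, not ANTLR-generated code) of what `DATA NONBLANCKCHAR+ EOL` requires of the token stream it receives.

```java
import java.util.Arrays;
import java.util.List;

public class HeadingRuleDemo {
    // Token types named after the grammar's lexer rules.
    enum Tok { DATA, LOOP, NONBLANCKCHAR, WHITESPACE, EOL }

    /** Hand-written check of: dataBlockHeading : DATA NONBLANCKCHAR+ EOL ; */
    static boolean isDataBlockHeading(List<Tok> toks) {
        int i = 0;
        if (i >= toks.size() || toks.get(i) != Tok.DATA) return false;
        i++;
        int start = i;
        while (i < toks.size() && toks.get(i) == Tok.NONBLANCKCHAR) i++;
        if (i == start) return false; // '+' needs at least one NONBLANCKCHAR
        return i == toks.size() - 1 && toks.get(i) == Tok.EOL;
    }

    public static void main(String[] args) {
        // Shortened token stream for a heading whose name is all NONBLANCKCHARs
        List<Tok> ok = Arrays.asList(Tok.DATA, Tok.NONBLANCKCHAR, Tok.NONBLANCKCHAR, Tok.EOL);
        // Same heading, but a LOOP token appears where the block name should be
        List<Tok> bad = Arrays.asList(Tok.DATA, Tok.NONBLANCKCHAR, Tok.LOOP, Tok.EOL);
        System.out.println(isDataBlockHeading(ok));  // true
        System.out.println(isDataBlockHeading(bad)); // false
    }
}
```

If the token stream handed to the parser already contains a LOOP token inside the name, no choice the parser makes can turn it back into NONBLANCKCHARs.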

The first two are token (lexer) rules; the other two are parser rules. My
problem is the following. The file starts with

data_global

BUT the *lo* of data_g*lo*bal is matched by the LOOP token rule. How can
this be when the parser is in the dataBlockHeading rule? The parser should
know that the characters *lo* belong to NONBLANCKCHAR and not to LOOP,
shouldn't it?
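The idea behind the question is that the lexer might pick tokens based on the active parser rule. In ANTLR the lexer splits the characters into tokens without knowing which parser rule is active. The Java class below is a toy model of such a context-free tokenizer. It is hypothetical and much simpler than an ANTLR-generated lexer: it always checks the whole keyword, so it does not reproduce the exact failure on data_global. It only shows that the token choice depends on the characters alone.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyLexer {
    /**
     * Toy, context-free tokenizer (hypothetical, not ANTLR's generated code).
     * It looks only at characters, never at which parser rule is active:
     * at each position it tries the keywords first (ignoring case), then
     * falls back to an EOL, a blank, or a single printable character.
     */
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            if (in.regionMatches(true, i, "data_", 0, 5)) {
                out.add("DATA");
                i += 5;
            } else if (in.regionMatches(true, i, "loop_", 0, 5)) {
                out.add("LOOP");
                i += 5;
            } else if (in.startsWith("\r\n", i)) {
                out.add("EOL");
                i += 2;
            } else {
                char c = in.charAt(i++);
                if (c == '\n') out.add("EOL");
                else if (c == ' ' || c == '\t') out.add("WHITESPACE");
                else if (c >= '!' && c <= '~') out.add("NONBLANCKCHAR");
                else throw new IllegalArgumentException("unexpected char " + (int) c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "loop_" inside a block name still becomes LOOP: the tokenizer
        // has no idea that a heading is being parsed.
        System.out.println(tokenize("data_xloop_\n")); // [DATA, NONBLANCKCHAR, LOOP, EOL]
    }
}
```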

I have appended the whole grammar at the end of this mail.

Thanks for your help

Regards,
Klaus

grammar CIF1_1;

options{
language=Java;
}

@lexer::header{
package at.netcrystals.cif_1_1.parser;
}

@parser::header{
package at.netcrystals.cif_1_1.parser;
}

DATA
     :(('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
     ;

LOOP
     :(('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
     ;

fragment ORDINARYCHAR
     :    '!' | '%' | '&' | '('..'/' | '0'..'9' | ':' | '<'..'@' | 'A'..'Z'
     |    '\\' | '^' | '`' | 'a'..'z' | '{'..'~'
     ;

NONBLANCKCHAR
     :    ORDINARYCHAR | '"' | '#' | '$' | '\'' | '_' | ';' | '[' | ']'
     ;

WHITESPACE
     :    '\t'|' '
     ;

/************************************************************************************************
     WhiteSpace and Comments
************************************************************************************************/

EOL
     :'\n'|'\r\n'
     ;

/************************************************************************************************
*
* Root
*
************************************************************************************************/

cif
     :      (dataBlock)   EOF
     ;

dataBlock
     :    (dataBlockHeading dataItems)
     ;

dataBlockHeading
     :    (DATA NONBLANCKCHAR+ EOL)
     ;

dataItems
     :    dataItem* EOL
     ;

dataItem
     :    (tag WHITESPACE value) | (LOOP loopHeader loopBody)
     ;

tag
     :    NONBLANCKCHAR+
     ;

value
     :    '.' | '?' | charString
     ;

charString
     :    singleQuotedString
     ;

singleQuotedString
     :    '\'' NONBLANCKCHAR* '\''
     ;

loopHeader
     :    ( (WHITESPACE tag)+)
     ;

loopBody
     :    value (WHITESPACE value)+
     ;
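Taken together, the ORDINARYCHAR and NONBLANCKCHAR rules above cover every printable, non-space ASCII character ('!' through '~'), 94 characters in all. ORDINARYCHAR is that range minus the eight extra characters listed in NONBLANCKCHAR, which leaves 86. The Java predicates below are a hypothetical sketch (the class and method names are not from the grammar) that checks this reading.

```java
public class CharClassDemo {
    // The eight characters NONBLANCKCHAR adds on top of ORDINARYCHAR.
    private static final String EXTRA = "\"#$'_;[]";

    /** NONBLANCKCHAR: any printable, non-space ASCII character. */
    static boolean isNonBlankChar(char c) {
        return c >= '!' && c <= '~';
    }

    /** ORDINARYCHAR: NONBLANCKCHAR without the eight extra characters. */
    static boolean isOrdinaryChar(char c) {
        return isNonBlankChar(c) && EXTRA.indexOf(c) < 0;
    }

    public static void main(String[] args) {
        int nonBlank = 0, ordinary = 0;
        for (char c = 0; c < 128; c++) {
            if (isNonBlankChar(c)) nonBlank++;
            if (isOrdinaryChar(c)) ordinary++;
        }
        System.out.println(nonBlank + " " + ordinary); // 94 86
    }
}
```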