[antlr-interest] ANTLR Basic Question
Klaus Martinschitz
klausmartinschitz at gmail.com
Fri Jul 9 12:10:55 PDT 2010
Hi ANTLR Gurus,
A beginner's question.
I want to write a compiler for the Crystallographic Information File format
(CIF). I don't want to explain the syntax in detail, only the problem I
am facing.
The data starts with a token
'data_'
followed by arbitrary characters and an EOL, e.g.
data_global
There is also a token
'loop_'
Somewhere in my BNF I write something like
DATA
    : (('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
    ;
LOOP
    : (('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
    ;
dataBlockHeading
    : (DATA NONBLANCKCHAR+ EOL)
    ;
dataItem
    : (tag WHITESPACE value) | (LOOP loopHeader loopBody)
    ;
The first two definitions are tokens (lexer rules); the last two are parser
rules. My problem is the following. The file starts with
data_global
BUT the *lo* of data_g*lo*bal is picked up by the LOOP token. How can
this be if the parser is in the dataBlockHeading rule? Shouldn't the parser
know that the characters *lo* belong to NONBLANCKCHAR and not to LOOP?
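To make the question concrete, here is a toy Java sketch of a tokenizer that picks tokens from the characters alone, with no knowledge of which parser rule is active. It is not the lexer ANTLR generates; the names ToyLexer and tokenize are hypothetical, and the input data_loop_x is an invented example. If the generated lexer works in a similar context-free way, that may be why being inside dataBlockHeading does not influence how DATA and LOOP are matched.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: NOT the code ANTLR generates.
// It scans characters left to right and never consults a parser.
class ToyLexer {

    // Returns the names of the tokens found in the input, in order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (input.regionMatches(true, i, "data_", 0, 5)) {
                tokens.add("DATA");          // case-insensitive keyword
                i += 5;
            } else if (input.regionMatches(true, i, "loop_", 0, 5)) {
                tokens.add("LOOP");          // matched even right after DATA
                i += 5;
            } else if (input.startsWith("\r\n", i)) {
                tokens.add("EOL");
                i += 2;
            } else if (c == '\n') {
                tokens.add("EOL");
                i += 1;
            } else if (c == ' ' || c == '\t') {
                tokens.add("WHITESPACE");
                i += 1;
            } else {
                // Simplification: every other character counts as NONBLANCKCHAR.
                tokens.add("NONBLANCKCHAR");
                i += 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [DATA, LOOP, NONBLANCKCHAR, EOL]
        System.out.println(tokenize("data_loop_x\n"));
    }
}
```

In this sketch the text after data_ comes out as a LOOP token, although a rule like dataBlockHeading would expect NONBLANCKCHAR+ there.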
I have attached the whole grammar at the end of this message.
Thanks for your help.
Regards,
Klaus
grammar CIF1_1;

options {
    language=Java;
}

@lexer::header {
    package at.netcrystals.cif_1_1.parser;
}

@parser::header {
    package at.netcrystals.cif_1_1.parser;
}

DATA
    : (('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
    ;

LOOP
    : (('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
    ;
fragment ORDINARYCHAR
    : '!' | '%' | '&' | '(' | ')' | '*' | '+' | ',' | '-' | '.' | '/'
    | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
    | ':' | '<' | '=' | '>' | '?' | '@'
    | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M'
    | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z'
    | '\\' | '^' | '`'
    | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm'
    | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
    | '{' | '|' | '}' | '~'
    ;
NONBLANCKCHAR
    : ORDINARYCHAR | '"' | '#' | '$' | '\'' | '_' | ';' | '[' | ']'
    ;

WHITESPACE
    : '\t' | ' '
    ;

/************************************************************************************************
WhiteSpace and Comments
************************************************************************************************/
EOL
    : '\n' | '\r\n'
    ;
/************************************************************************************************
*
* Root
*
************************************************************************************************/
cif
    : (dataBlock) EOF
    ;

dataBlock
    : (dataBlockHeading dataItems)
    ;

dataBlockHeading
    : (DATA NONBLANCKCHAR+ EOL)
    ;

dataItems
    : dataItem* EOL
    ;

dataItem
    : (tag WHITESPACE value) | (LOOP loopHeader loopBody)
    ;

tag
    : NONBLANCKCHAR+
    ;

value
    : '.' | '?' | charString
    ;

charString
    : singleQuotedString
    ;

singleQuotedString
    : '\'' NONBLANCKCHAR* '\''
    ;

loopHeader
    : ( (WHITESPACE tag)+)
    ;

loopBody
    : value (WHITESPACE value)+
    ;