[antlr-interest] ANTLR Basic Question
Jim Idle
jimi at temporal-wave.com
Fri Jul 9 15:00:20 PDT 2010
First, add a catch-all rule to your lexer as the very last rule:
ANY : . { skip(); /* or error */ } ;
Then replace your NONBLANCKCHAR rule with:
CHARSEQ : ('a'..'z')+ ; /* or whatever the character set is */
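Put this rule after the keyword rules (DATA, LOOP) so the keywords win when both match the same text, and use CHARSEQ in the parser rules wherever you had NONBLANCKCHAR+.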
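Why the parser's context does not help: the lexer turns the whole input into tokens on its own, before and independently of the parser, so being "in" dataBlockHeading has no effect on how "global" is split. The Java sketch below is a simplified, hand-written model (not ANTLR-generated code) of a longest-match lexer using the rules from this thread, with ties going to the earlier rule. ANTLR 3's generated lexer does not literally try every rule and compare lengths; it predicts a rule from limited lookahead and does not backtrack, so with a one-character NONBLANCKCHAR rule it can commit to LOOP after seeing "lo" and then fail, which is consistent with the symptom described. A multi-character CHARSEQ listed after the keywords brings its behaviour close to this model.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a lexer (not ANTLR's generated code): it tokenizes the
// whole input by itself, without knowing which parser rule will consume it.
class MaximalMunchDemo {

    // Rules in priority order: keywords first, so they win ties.
    static final String[] RULES = {"DATA", "LOOP", "CHARSEQ", "WHITESPACE", "EOL", "ANY"};

    // Number of characters `rule` matches at `pos` (0 = no match).
    static int match(String rule, String in, int pos) {
        switch (rule) {
            case "DATA":
                return keyword("data_", in, pos);
            case "LOOP":
                return keyword("loop_", in, pos);
            case "CHARSEQ": { // ('a'..'z')+
                int end = pos;
                while (end < in.length() && in.charAt(end) >= 'a' && in.charAt(end) <= 'z') {
                    end++;
                }
                return end - pos;
            }
            case "WHITESPACE": // '\t' | ' '
                return pos < in.length() && (in.charAt(pos) == ' ' || in.charAt(pos) == '\t') ? 1 : 0;
            case "EOL": // '\n' | '\r\n'
                if (in.startsWith("\r\n", pos)) {
                    return 2;
                }
                return pos < in.length() && in.charAt(pos) == '\n' ? 1 : 0;
            default: // ANY: any single character
                return pos < in.length() ? 1 : 0;
        }
    }

    // Case-insensitive keyword, like ('d'|'D')('a'|'A')... in the grammar.
    static int keyword(String kw, String in, int pos) {
        return in.regionMatches(true, pos, kw, 0, kw.length()) ? kw.length() : 0;
    }

    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            String best = null;
            int bestLen = 0;
            for (String rule : RULES) {
                int len = match(rule, in, pos);
                if (len > bestLen) { // strictly longer only: ties keep the earlier rule
                    best = rule;
                    bestLen = len;
                }
            }
            // ANY always matches one character, so `best` is never null here.
            if (!best.equals("ANY")) { // ANY is dropped, like { skip(); }
                String text = in.substring(pos, pos + bestLen)
                        .replace("\r", "\\r").replace("\n", "\\n");
                tokens.add(best + "(" + text + ")");
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("data_global\n"));
    }
}
```

Running main prints [DATA(data_), CHARSEQ(global), EOL(\n)]: at "g" neither keyword matches, so the six-character CHARSEQ wins and no LOOP token is produced.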
If that still fails, add a predicate.
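A predicate lets a lexer rule depend on state kept in the lexer. The Java sketch below is a hypothetical, hand-written illustration of that idea, not ANTLR-generated code: a flag (afterData, an assumed name) is set once DATA has been matched, and while it is set the rest of the line is returned as one BLOCKNAME token (also an assumed name), so a "lo" inside the block name can never start LOOP. In an ANTLR 3 grammar this state would typically live in a member variable (declared in @lexer::members) tested by a semantic predicate; how the rules are arranged around it depends on the rest of the CIF grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of predicate-style (context-sensitive) lexing.
// After "data_", everything up to the end of the line is a single BLOCKNAME
// token, so the LOOP keyword cannot match inside a data block name.
class PredicateLexerDemo {

    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        boolean afterData = false; // the state a predicate would test
        int pos = 0;
        while (pos < in.length()) {
            if (afterData) {
                // Take the rest of the line (up to '\r' or '\n') as the block name.
                int end = pos;
                while (end < in.length() && in.charAt(end) != '\n' && in.charAt(end) != '\r') {
                    end++;
                }
                tokens.add("BLOCKNAME(" + in.substring(pos, end) + ")");
                afterData = false;
                pos = end;
            } else if (in.regionMatches(true, pos, "data_", 0, 5)) {
                tokens.add("DATA");
                afterData = true;
                pos += 5;
            } else if (in.regionMatches(true, pos, "loop_", 0, 5)) {
                tokens.add("LOOP");
                pos += 5;
            } else if (in.startsWith("\r\n", pos)) {
                tokens.add("EOL");
                pos += 2;
            } else if (in.charAt(pos) == '\n') {
                tokens.add("EOL");
                pos += 1;
            } else {
                pos += 1; // everything else is skipped in this toy example
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("data_loopy\nloop_\n"));
    }
}
```

Here "loopy" after data_ becomes BLOCKNAME(loopy), while the loop_ on the next line is still recognized as LOOP.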
Jim
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Klaus Martinschitz
> Sent: Friday, July 09, 2010 12:11 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] ANTLR Basic Question
>
> Hi ANTLR Gurus,
>
> A beginner's question.
> I want to write a compiler for the Crystallographic Information File
> (CIF) format. I won't explain the syntax in detail, only the problem I
> am facing.
>
> The data starts with a token
>
> 'data_'
>
> followed by arbitrary characters and an EOL, e.g.
>
> data_global
> .
>
> There is also a token
>
> 'loop_'.
>
> Somewhere in my BNF I write something like:
>
> DATA
> :(('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
> ;
>
> LOOP
> :
> (('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
> ;
>
> dataBlockHeading
> : (DATA NONBLANCKCHAR+ EOL)
> ;
>
> dataItem
> : (tag WHITESPACE value) | (LOOP loopHeader loopBody)
> ;
>
> The first two are token (lexer) rules; the other two are parser rules.
> My problem is the following. The file starts with
>
> data_global
>
> BUT the *lo* of data_g*lo*bal is matched as the LOOP token. How can
> this be if the parser is in the dataBlockHeading rule? Shouldn't the
> parser know that the characters *lo* belong to NONBLANCKCHAR and not
> to LOOP?
>
> I have attached the whole grammar at the end of this message.
>
> Thanks for your help
>
> Regards,
> Klaus
>
> grammar CIF1_1;
>
> options{
> language=Java;
> }
>
> @lexer::header{
> package at.netcrystals.cif_1_1.parser;
> }
>
> @parser::header{
> package at.netcrystals.cif_1_1.parser;
> }
>
>
> DATA
> :(('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
> ;
>
> LOOP
> :
> (('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
> ;
>
> fragment ORDINARYCHAR
> : '!' | '%' | '&' | '(' | ')' | '*' | '+' | ',' | '-' | '.' |
> '/' | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | ':' |
> '<' | '=' | '>' | '?' | '@' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' |
> 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' |
> 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' | '\\' | '^' | '\`' | 'a' | 'b'
> | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n'
> | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
> | '{' | '|' | '}' | '~'
> ;
>
>
> NONBLANCKCHAR
> : ORDINARYCHAR | '"' | '#' | '$' | '\'' | '_' | ';' | '[' | ']'
> ;
>
>
>
> WHITESPACE
> : '\t'|' '
> ;
>
>
> /***********************************************************************
>  * WhiteSpace and Comments
>  ***********************************************************************/
>
> EOL
> :'\n'|'\r\n'
> ;
>
> /***********************************************************************
>  * Root
>  ***********************************************************************/
>
> cif
> : (dataBlock) EOF
> ;
>
> dataBlock
> : (dataBlockHeading dataItems)
> ;
>
> dataBlockHeading
> : (DATA NONBLANCKCHAR+ EOL)
> ;
>
>
> dataItems
> : dataItem* EOL
> ;
>
> dataItem
> : (tag WHITESPACE value) | (LOOP loopHeader loopBody)
> ;
>
> tag
> : NONBLANCKCHAR+
> ;
>
>
> value
> : '.' | '?' | charString
> ;
>
> charString
> : singleQuotedString
> ;
>
> singleQuotedString
> : '\'' NONBLANCKCHAR* '\''
> ;
>
> loopHeader
> : ( (WHITESPACE tag)+)
> ;
>
> loopBody
> : value (WHITESPACE value)+
> ;