[antlr-interest] same char but different context
Jim Idle
jimi at temporal-wave.com
Sat Nov 28 10:38:20 PST 2009
It is hard to tell what the format is from this, but presumably each single-character type introducer is the first non-whitespace character after a newline. If that is the case, you need to take the lexer tokens out of the tokens section and create real lexer rules gated by a predicate on a boolean flag, which is set to true after seeing a newline and set to false after seeing the single character. You also don't want an ANY rule; you want a rule that consumes to the end of the line. So you want something like this:

grammar T;

options {
    output = AST;
}

@lexer::members {
    // true while the next character on the line is expected to be a type introducer
    boolean isType = true;
}

start     : header record+ EOF;
header    : KEYWORD_TYPE RECORD ;
record    : item+ END_OF_RECORD;
item      : item_type RECORD ;
item_type : ( TYPE_DATE
            | TYPE_AMOUNT
            | TYPE_MEMO
            | TYPE_CLEARED
            | TYPE_CHECK_NUMBER
            | TYPE_PAYEE
            | TYPE_PAYEE_ADDRESS
            | TYPE_CATEGORY
            | TYPE_REIMBURSE
            | TYPE_SPLIT_CATEGORY
            | TYPE_SPLIT_MEMO
            | TYPE_SPLIT_AMOUNT
            | TYPE_SPLIT_PERCENTAGE
            | TYPE_SECURITY_NAME
            | TYPE_PRICE
            | TYPE_SHARE_QUANTITY
            | TYPE_COMMISSION_COSTS
            );

KEYWORD_TYPE          : ('!Type:')=>'!Type:' { isType=false; };
END_OF_RECORD         : '^';

// Each type rule can only match while isType is set, and clears it once matched.
TYPE_DATE             : {isType}?=> 'D' { isType=false; };
TYPE_AMOUNT           : {isType}?=> 'T' { isType=false; };
TYPE_MEMO             : {isType}?=> 'M' { isType=false; };
TYPE_CLEARED          : {isType}?=> 'C' { isType=false; };
TYPE_CHECK_NUMBER     : {isType}?=> 'N' { isType=false; };
TYPE_PAYEE            : {isType}?=> 'P' { isType=false; };
TYPE_PAYEE_ADDRESS    : {isType}?=> 'A' { isType=false; };
TYPE_CATEGORY         : {isType}?=> 'L' { isType=false; };
TYPE_REIMBURSE        : {isType}?=> 'F' { isType=false; };
TYPE_SPLIT_CATEGORY   : {isType}?=> 'S' { isType=false; };
TYPE_SPLIT_MEMO       : {isType}?=> 'E' { isType=false; };
TYPE_SPLIT_AMOUNT     : {isType}?=> '$' { isType=false; };
TYPE_SPLIT_PERCENTAGE : {isType}?=> '%' { isType=false; };
TYPE_SECURITY_NAME    : {isType}?=> 'Y' { isType=false; };
TYPE_PRICE            : {isType}?=> 'I' { isType=false; };
TYPE_SHARE_QUANTITY   : {isType}?=> 'Q' { isType=false; };
TYPE_COMMISSION_COSTS : {isType}?=> 'O' { isType=false; };

fragment NLCHARS      : '\r'|'\n';

// A newline re-arms the type rules; channel 99 is the hidden channel, so the parser never sees it.
NEWLINE               : ('\r'? '\n')+ { isType=true; $channel=99; };

// Everything up to the end of the line, but only after the type character has been consumed.
RECORD                : {!isType}?=>(~NLCHARS)+ ;

This only works if NEWLINE is the end of one record, signifying the start of another. To be honest, the format is so simple that a small program that scans it and builds everything in one pass may be simpler and better for you; it looks like the record format was designed for a simple scanner. Note that your example uses the command 'H', which is not in your command set, and that I have assumed your end-of-record marker is on a line of its own (if not, the RECORD token also needs to exclude '^' from its set). Also note that this is just my best guess, working back from the grammar you posted.
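
For illustration, a hand-written scanner of that kind might look roughly like the Java sketch below. The names (QifScanner, Item, QifFile, parse) are invented for this example, and it assumes the same layout as the grammar above: a leading !Type: line, one item per line, and '^' on a line of its own. It has not been checked against real Quicken files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical line-based scanner; all names here are made up for illustration.
class QifScanner {

    /** One line of a record: the single type character plus the rest of the line. */
    static final class Item {
        final char type;
        final String value;

        Item(char type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Result of a scan: the text after "!Type:" and the records, each a list of items. */
    static final class QifFile {
        String header = "";
        final List<List<Item>> records = new ArrayList<>();
    }

    static QifFile parse(String text) {
        QifFile file = new QifFile();
        List<Item> current = new ArrayList<>();

        // Assumption: lines end in "\n" or "\r\n", as in the NEWLINE rule above.
        for (String line : text.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith("!Type:")) {
                file.header = line.substring("!Type:".length());
            } else if (line.charAt(0) == '^') {
                // End of record: store the items collected so far and start a new list.
                file.records.add(current);
                current = new ArrayList<>();
            } else {
                // The first character is the type; everything after it is the value,
                // so a 'W' inside "Hello World" is never mistaken for a type.
                current.add(new Item(line.charAt(0), line.substring(1)));
            }
        }
        // Items after the last '^' are discarded in this sketch.
        return file;
    }

    public static void main(String[] args) {
        QifFile file = parse("!Type:Bank\nD11/28/2009\nPHello World\n^\n");
        for (List<Item> record : file.records) {
            for (Item item : record) {
                System.out.println(item.type + " -> " + item.value);
            }
        }
    }
}
```

Because the type is taken by position rather than by character class, this approach needs no lexer state at all, which is the main reason it may be the simpler option here.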
Jim
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of codeman at bytefusion.de
> Sent: Saturday, November 28, 2009 1:08 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] same char but different context
>
> Given is a record-per-line format like this:
>
> <single-char><sequence-of-chars><crlf>
>
> <single-char> => single letter
> <sequence-of-chars> => any except end-of-line
> <crlf> => end of line
>
> My problem is the following:
>
> WHello World
>
> "W" => recognized as the single char
> "Hello " is broken: the "W" in "World" seems to be taken as a new start char
>
> Here is my grammar. The goal is to parse a Quicken Interchange Format
> file. Any ideas?
>
>
> grammar myExample;
>
> options {
> output=AST;
> }
>
> tokens {
> TYPE_DATE = 'D';
> TYPE_AMOUNT = 'T';
> TYPE_MEMO = 'M';
> TYPE_CLEARED = 'C';
> TYPE_CHECK_NUMBER = 'N';
> TYPE_PAYEE = 'P';
> TYPE_PAYEE_ADDRESS = 'A';
> TYPE_CATEGORY = 'L';
> TYPE_REIMBURSE = 'F';
> TYPE_SPLIT_CATEGORY = 'S';
> TYPE_SPLIT_MEMO = 'E';
> TYPE_SPLIT_AMOUNT = '$';
> TYPE_SPLIT_PERCENTAGE = '%';
> TYPE_SECURITY_NAME = 'Y';
> TYPE_PRICE = 'I';
> TYPE_SHARE_QUANTITY = 'Q';
> TYPE_COMMISSION_COSTS = 'O';
>
> }
>
> start : header record+ NEWLINE* EOF;
>
> header : KEYWORD_TYPE description NEWLINE;
>
> description : ANY+;
>
> record : item+ END_OF_RECORD;
>
> item : item_type description NEWLINE;
>
> item_type : (TYPE_DATE
> |TYPE_AMOUNT
> |TYPE_MEMO
> |TYPE_CLEARED
> |TYPE_CHECK_NUMBER
> |TYPE_PAYEE
> |TYPE_PAYEE_ADDRESS
> |TYPE_CATEGORY
> |TYPE_REIMBURSE
> |TYPE_SPLIT_CATEGORY
> |TYPE_SPLIT_MEMO
> |TYPE_SPLIT_AMOUNT
> |TYPE_SPLIT_PERCENTAGE
> |TYPE_SECURITY_NAME
> |TYPE_PRICE
> |TYPE_SHARE_QUANTITY
> |TYPE_COMMISSION_COSTS
> );
>
>
> KEYWORD_TYPE : '!Type:';
> NEWLINE : ('\r'|'\n'|'\r\n');
> END_OF_RECORD : '^';
> ANY : ~(NEWLINE);