[antlr-interest] ANTLR 3: Newbie EEPROM Binary Parsing

Wed Nov 8 12:32:31 PST 2006

If the construct is always header { yourstuff } stuff to throw away until end, then:

1) Make sure your lexer is able to match any character that might come up, perhaps with a final lexer rule to catch anything that isn't normally matched;
2) After the rule that matches the final brace have rule that consumes anything else (which I realize was in fact your question ;-);

This works with the "literal" input you give in your example:

grammar t;

file
	: version contents eeprom_data ;

version
	: VBF_VERSION EQ INT (DOT INT)? SEMI ;

contents
	: HEADER LBRACE
			CONTENTS
		 RBRACE
	;

eeprom_data
	: .*
	;

VBF_VERSION	:	'vbf_version' 	;
HEADER	:	'header'		;	

EQ		: 	'=' 			;
LBRACE	:	'{'			;
RBRACE	:	'}'			;

CONTENTS:	'/*Contents*/'		;

DOT		:	'.'			;
SEMI		:	';'			;
INT		: '0'..'9' ('0'..'9')*	;

WS		: (' ' | '\t' | '\n' | '\r')+	{ channel=99; }	;

DATA		: . 				;

However, it could be simpler than this, in that if you just want to stop doing anything after the RBRACE, then don't have the eeprom_rule at all. I think it will just naturally end. This assumes that there is an unambiguous end of course. 

If you don't have the lexer catch all though and there are characters that would other wise not match in the lexer you need to capture them as the lexer doesn't 'know' that you are done. You can also add a skip to the DATA lexing rule too of course. If there can only be one '}' then you could just use a lexer rule to find that then consume all the rest of the input as the text for RBRACE... and so on.

Experiment a bit, but that should give you the basic idea I think.

Jim

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Christian Bryan

Hi,
	I am currently taking a crash course in ANTLR 3 and language  
recognition in general for my latest project which involves parsing a  
file from an automotive company. The format of the file is like the  
following:

vbf_version = 2.2;
header {
	/*Contents*/
} &^%$£@@&*% ...

I have not a problem in writing the lexer or parser rules to parse  
the vbf_version line the header or its contents. My problem is how do  
I write a rule to get the lexer to consume everything after the right  
curly brace including spaces as this is EEPROM data.

Cheers

Christian. 

-- 
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.5.430 / Virus Database: 268.13.30/521 - Release Date: 11/7/2006 3:30 AM

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.5.430 / Virus Database: 268.13.32/523 - Release Date: 11/7/2006 1:40 PM