[antlr-interest] detecting transitions in stanza-based files

Chris Black cblack0 at yahoo.com
Mon May 9 13:18:53 PDT 2005


I've been writing grammars for various stanza-based
data files for a while now and have discovered a few
good practices (some of which were guided by responses
to my post to this list a year ago:
http://www.antlr.org/pipermail/antlr-interest/2004-May/007890.html).


I am still experiencing some difficulties and would
like some input from the list.

First, the issue that prompted this posting. I have
a data format that looks like:

some,header
fields,I parse,fine

more,header,stuff
more,header,stuff
more,header,stuff

Data Type:,Foo,,,,,,,,,,,,,
real,data,num,num,num,num....
real,data,num,num,num,num....
real,data,num,num,num,num....
Data Type:,Bar,,,,,,,,,,,,,
real,data,num,num,num,num....
real,data,num,num,num,num....
real,data,num,num,num,num....

The problem I am having is detecting the transition
between the last real data line of one stanza and the
data type header that starts the next one. In addition,
sometimes the data type header is just:
Foo,,,,,,,

The trailing commas are sometimes there and sometimes
not, depending on whether or not the data file has been
mangled by Excel.
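One way to see the distinction the parser has to make is to classify whole lines after discarding whatever trailing commas Excel did or did not add. The following is a hypothetical standalone Java helper, not ANTLR code, and the class and method names are invented for illustration. It assumes that, once trailing commas are gone, a header has one field (`Foo`) or two (`Data Type:,Foo`), and that a record line has at least four, as `recordLine` requires (FIELD, the sample name, FIELD, and one or more further FIELDs), assuming `optionalSampleName` matches at most one field.

```java
/** Hypothetical line classifier for the stanza format; not ANTLR code. */
final class LineKind {
    enum Kind { BLANK, BASIC_HEADER, ADVANCED_HEADER, RECORD }

    private LineKind() {}

    /** Removes the run of trailing commas that Excel may or may not add. */
    static String stripTrailingCommas(String line) {
        int end = line.length();
        while (end > 0 && line.charAt(end - 1) == ',') {
            end--;
        }
        return line.substring(0, end);
    }

    /**
     * Classifies one line (newline already removed). Assumption: once
     * trailing commas are gone, a header has one or two fields and a
     * record line has at least four.
     */
    static Kind classify(String line) {
        String stripped = stripTrailingCommas(line);
        if (stripped.isEmpty()) {
            return Kind.BLANK;             // "" or a row of bare commas
        }
        int n = stripped.split(",", -1).length;
        if (n == 1) {
            return Kind.BASIC_HEADER;      // "Foo,,,,,,,"
        }
        if (n == 2) {
            return Kind.ADVANCED_HEADER;   // "Data Type:,Foo,,,,,"
        }
        if (n >= 4) {
            return Kind.RECORD;            // FIELD,sample,FIELD,FIELD...
        }
        throw new IllegalArgumentException("unrecognized line: " + line);
    }
}
```

For example, `classify("Foo,,,,,,,")` and `classify("Foo")` both give `BASIC_HEADER`, so the Excel variation stops mattering. The header rules and `recordLine` all begin with `FIELD`, so a small fixed lookahead may not be enough to tell them apart; deciding on the whole line sidesteps that.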

Parts of my grammar are posted below. Note that I use
curDT to track the last seen data type header string
and use that to set the AST token type for the stanza.

In previous parsers I didn't have much of a problem
because there were extra newlines separating stanzas, but
in this case there aren't any, and my grammar does not
seem to detect the switch from a run of recordLine
matches to a dataTypeHeader match.
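Because no blank line separates stanzas, the boundary can only be found by examining the next line before committing to another record. The following hypothetical Java sketch, written outside ANTLR, groups lines into stanzas that way. It assumes that a line with one or two fields, after trailing commas are dropped, is a data type header and closes the current stanza. The `Result` test that picks `Median` over `Count` is copied from `basicDataTypeHeader`; the `StanzaSplitter` and `Stanza` names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stanza grouper; shows the lookahead idea, not ANTLR's API. */
final class StanzaSplitter {
    static final class Stanza {
        final String dataType;                      // plays the role of curDT
        final List<String> records = new ArrayList<>();

        Stanza(String dataType) {
            this.dataType = dataType;
        }
    }

    /** Splits a line into fields after dropping Excel's trailing commas. */
    static String[] fields(String line) {
        int end = line.length();
        while (end > 0 && line.charAt(end - 1) == ',') {
            end--;
        }
        return line.substring(0, end).split(",", -1);
    }

    /** Groups the data section (the lines after the file header) into stanzas. */
    static List<Stanza> split(List<String> lines) {
        List<Stanza> stanzas = new ArrayList<>();
        Stanza current = null;
        for (String line : lines) {
            String[] f = fields(line);
            if (f.length == 1 && f[0].isEmpty()) {
                continue;                           // blank or all-comma line
            }
            if (f.length <= 2) {
                // Assumed rule: one or two fields means a data type header,
                // so this line closes the current stanza and opens a new one.
                String dt;
                if (f.length == 2) {
                    dt = f[1];                      // "Data Type:,Foo"
                } else {
                    // Same choice basicDataTypeHeader makes in the grammar.
                    dt = f[0].startsWith("Result") ? "Median" : "Count";
                }
                current = new Stanza(dt);
                stanzas.add(current);
            } else if (current != null) {
                current.records.add(line);          // still inside the stanza
            } else {
                throw new IllegalStateException("record before any header: " + line);
            }
        }
        return stanzas;
    }
}
```

Fed the `Data Type:` and `real,data,...` lines from the sample above, this yields a `Foo` stanza and a `Bar` stanza of three records each. Inside an ANTLR 2 grammar the corresponding move would presumably be to guard the header alternative with a syntactic predicate, `( ... )=>`, so the parser tries the header before committing to another `recordLine`; that translation is not shown or tested here.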

What is the best way of handling this transition? I am
wondering whether semantic/syntactic predicates may be
the best way of writing a grammar for this sort of
situation, since even my working grammars currently
spit out mad nondeterminism warnings when compiled, yo!

I'd greatly appreciate any advice on how to handle
this transition or general pointers on stanza-based
parsers or things I'm doing wrong.

The relevant parts of my grammar are:

advancedDataTypeHeader:!
	{ System.err.println("adv header"); System.err.flush(); }
	FIELD DELIM
	dataType:FIELD
	(DELIM)*
	NEWLINE
	{
		curDT = dataType.getText();
	} ;

basicDataTypeHeader:!
	{ System.err.println("basic header"); System.err.flush(); }
	firstToken:FIELD
	(DELIM)* NEWLINE
	{
		String firstTokenStr = firstToken.getText();
		if(firstTokenStr.startsWith("Result")) {
			curDT = "Median";
		} else {
			curDT = "Count";
		}
	} ;

dataTypeHeader:! (advancedDataTypeHeader | basicDataTypeHeader) ;

dataStanza: dataTypeHeader
	recordLine (recordLine)+ 
	(NEWLINE!)?
	{ 
		if(curDT.equals("Median")) {
			## = #([MEDIANSTANZA, curDT], ##);
		} else if(curDT.equals("Count")) {
			## = #([COUNTSTANZA, curDT], ##);
		} else {
			## = #([IGNORESTANZA, curDT], ##);
		}
	}
	;

recordLine: FIELD^ DELIM! optionalSampleName
	DELIM! FIELD 
	(DELIM! FIELD)+
	optionalNotes NEWLINE ;

Thanks in advance,
Chris
