[antlr-interest] detecting transitions in stanza-based files

Tue May 10 06:40:14 PDT 2005

Chris Black wrote:

>[snip]
>  
>
I decided that paring down my query might make it a bit easier to 
read. Sorry for the long-winded initial post. My main problem is 
detecting a transition between lines with three or more FIELDs and a 
line with fewer than three FIELDs. After the lexer has run, I have a 
token stream like:
FIELD DELIM FIELD NEWLINE
FIELD DELIM FIELD DELIM FIELD DELIM FIELD DELIM NEWLINE
FIELD DELIM FIELD DELIM FIELD DELIM FIELD DELIM NEWLINE
FIELD DELIM FIELD DELIM FIELD DELIM FIELD DELIM NEWLINE
FIELD DELIM FIELD NEWLINE
FIELD DELIM FIELD DELIM FIELD DELIM FIELD DELIM NEWLINE
FIELD DELIM FIELD DELIM FIELD DELIM FIELD DELIM NEWLINE
FIELD DELIM FIELD DELIM FIELD DELIM FIELD DELIM NEWLINE

My difficulty is detecting the transition from a series of long lines 
to the short line that separates the stanzas.
What seems to be happening is that my rule for matching a long line is 
also applied to the short line, because the parser is inside a rule 
looking for any number of long lines. Why is this? To simplify, it seems 
that if I have a few rules like:

multStanzas: (stanza)+ ;
stanza: shortLine (longLine)+ ;

shortLine: FIELD DELIM FIELD NEWLINE ;
longLine: FIELD DELIM FIELD (DELIM FIELD)+ NEWLINE ;

then it tries to match the whole file as one stanza. I thought that once 
the longLine match failed on seeing a short line of fewer than three 
FIELDs, ANTLR would then try to match with the shortLine rule. What am I 
missing or doing wrong?
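
For illustration, the decision the (longLine)+ loop has to make can be 
written out by hand. The following is only a sketch in plain Java, not 
ANTLR-generated code: the class name StanzaLookahead, the Tok enum and 
the helper methods are all made up for this example, and it assumes a 
short line is exactly FIELD DELIM FIELD NEWLINE, as in the token stream 
above. Both kinds of line begin with FIELD DELIM FIELD, so the first 
token that tells them apart is the fourth one: DELIM on a long line, 
NEWLINE on a short one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written illustration; not produced by ANTLR.
class StanzaLookahead {
    enum Tok { FIELD, DELIM, NEWLINE }

    // Turn a string such as "FIELD DELIM FIELD NEWLINE" into tokens.
    static List<Tok> parse(String s) {
        List<Tok> out = new ArrayList<Tok>();
        for (String w : s.trim().split("\\s+")) {
            out.add(Tok.valueOf(w));
        }
        return out;
    }

    // Peek four tokens ahead: a long line starts FIELD DELIM FIELD DELIM.
    static boolean startsLongLine(List<Tok> toks, int pos) {
        return pos + 3 < toks.size()
            && toks.get(pos) == Tok.FIELD
            && toks.get(pos + 1) == Tok.DELIM
            && toks.get(pos + 2) == Tok.FIELD
            && toks.get(pos + 3) == Tok.DELIM;
    }

    // Advance past the next NEWLINE and return the following position.
    static int skipLine(List<Tok> toks, int pos) {
        while (pos < toks.size() && toks.get(pos) != Tok.NEWLINE) {
            pos++;
        }
        return pos + 1;
    }

    // For each stanza (one header line, then long lines), count the long lines.
    // Assumption: the header line is consumed without validating its shape.
    static List<Integer> longLinesPerStanza(List<Tok> toks) {
        List<Integer> counts = new ArrayList<Integer>();
        int pos = 0;
        while (pos < toks.size()) {
            pos = skipLine(toks, pos);            // the short (header) line
            int n = 0;
            while (startsLongLine(toks, pos)) {   // leave the loop on a short line
                pos = skipLine(toks, pos);
                n++;
            }
            counts.add(n);
        }
        return counts;
    }

    public static void main(String[] args) {
        String s = "FIELD DELIM FIELD NEWLINE ";
        String l = "FIELD DELIM FIELD DELIM FIELD DELIM FIELD DELIM NEWLINE ";
        // Same shape as the token stream in this post: two stanzas of three long lines.
        System.out.println(longLinesPerStanza(parse(s + l + l + l + s + l + l + l)));
        // prints [3, 3]
    }
}
```

With only one token of lookahead the two cases look identical, since 
both start with FIELD; the sketch leaves the loop correctly only because 
it looks as far as the fourth token.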

Thanks,
Chris

>First, the issue which provoked this posting: I have
>a data format that looks like:
>
>some,header
>fields,I parse,fine
>
>more,header,stuff
>more,header,stuff
>more,header,stuff
>
>Data Type:,Foo,,,,,,,,,,,,,
>real,data,num,num,num,num....
>real,data,num,num,num,num....
>real,data,num,num,num,num....
>Data Type:,Bar,,,,,,,,,,,,,
>real,data,num,num,num,num....
>real,data,num,num,num,num....
>real,data,num,num,num,num....
>
>The problem I am having is detecting the transition
>between real data lines and the start of the next
>stanza starting with a data type header. In addition
>sometimes the data type header is just:
>Foo,,,,,,,
>
>All the extra commas are sometimes there, sometimes
>not, depending on whether the data file has been
>mangled by excel or not.
>
>Parts of my grammar are posted below. Note that I use
>curDT to track the last seen data type header string
>and use that to set the AST token type for the stanza.
>
>In previous parsers I didn't have much of a problem
>because there were newlines separating stanzas, but in
>this case there aren't, and my grammar does not seem to
>detect the change from a run of recordLine matches to
>a dataTypeHeader match.
>
>What is the best way of handling this transition? I am
>wondering whether semantic/syntactic predicates may be
>the best way of writing a grammar for this sort of
>situation, as currently, even when they work, my
>grammars produce a lot of nondeterminism warnings when
>compiled.
>
>I'd greatly appreciate any advice on how to handle
>this transition or general pointers on stanza-based
>parsers or things I'm doing wrong.
>
>The relevant parts of my grammar are:
>
>advancedDataTypeHeader:!
>	{ System.err.println("adv header");
>	  System.err.flush(); }
>	FIELD DELIM
>	dataType:FIELD
>	(DELIM)*
>	NEWLINE
>	{
>		curDT = dataType.getText();
>	} ;
>
>basicDataTypeHeader:!
>	{ System.err.println("basic header");
>	  System.err.flush(); }
>	firstToken:FIELD
>	(DELIM)* NEWLINE
>	{
>		String firstTokenStr = firstToken.getText();
>		if(firstTokenStr.startsWith("Result")) {
>			curDT = "Median";
>		} else {
>			curDT = "Count";
>		}
>	} ;
>
>dataTypeHeader:!
>	(advancedDataTypeHeader | basicDataTypeHeader) ;
>
>dataStanza: dataTypeHeader
>	recordLine (recordLine)+ 
>	(NEWLINE!)?
>	{ 
>		if(curDT.equals("Median")) {
>			## = #([MEDIANSTANZA, curDT], ##);
>		} else if(curDT.equals("Count")) {
>			## = #([COUNTSTANZA, curDT], ##);
>		} else {
>			## = #([IGNORESTANZA, curDT], ##);
>		}
>	}
>	;
>
>recordLine: FIELD^ DELIM! optionalSampleName
>	DELIM! FIELD 
>	(DELIM! FIELD)+
>	optionalNotes NEWLINE ;
>
>Thanks in advance,
>Chris
>  
>