[antlr-interest] detecting transitions in stanza-based files
Richard Clark
rdclark at nextquestion.net
Tue May 10 09:14:43 PDT 2005
I've had better luck by keeping the ANTLR side simple and using a bit
of Java code. Something like this (which is off the top of my head):
tokens
{ LONG_LINE; SHORT_LINE; }
stanza
    : FIELD (DELIM! FIELD)* NEWLINE!
      { // Classify this line by counting the FIELD nodes left in its AST
        AST probe = #stanza;
        int length = 0;
        while (probe != null) {
            probe = probe.getNextSibling();
            length++;
        }
        if (length > 3)
            #stanza = #(#[LONG_LINE, "long"], #stanza);
        else
            #stanza = #(#[SHORT_LINE, "short"], #stanza);
      }
    ;
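The sibling-counting loop in the action can be tried outside ANTLR. This is a minimal sketch, assuming a hypothetical `Node` class that stands in for ANTLR's `AST` (it has only the token text and a next-sibling link; it is not ANTLR's API). It uses the same walk-and-count logic and the same `> 3` threshold.

```java
import java.util.List;

// Hypothetical stand-in for an ANTLR AST node: token text plus a next-sibling link.
final class Node {
    final String text;
    Node next;

    Node(String text) { this.text = text; }
}

final class LineClassifier {
    // Build a flat chain of field nodes, the shape the counting loop walks.
    static Node siblings(List<String> fields) {
        Node head = null, tail = null;
        for (String f : fields) {
            Node n = new Node(f);
            if (head == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // Same idea as the grammar action: follow next-sibling links and count nodes.
    static int countSiblings(Node first) {
        int length = 0;
        for (Node probe = first; probe != null; probe = probe.next) {
            length++;
        }
        return length;
    }

    // More than three fields is treated as a long line, as in the action.
    static String classify(Node first) {
        return countSiblings(first) > 3 ? "long" : "short";
    }

    public static void main(String[] args) {
        System.out.println(classify(siblings(List.of("a", "b", "c"))));      // short
        System.out.println(classify(siblings(List.of("a", "b", "c", "d")))); // long
    }
}
```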
I'd normally run a tree walker next, which is why the stanza action ends
by building a tree rooted at an imaginary LONG_LINE or SHORT_LINE token.
You could do the same, or replace the "#stanza = ..." assignments with
whatever action you actually need to take.
If you do run a tree walker next, you could move this logic into the
walker instead, letting it count the list elements and make the
decision there.
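If the classification moves into the tree walker, the walker still has to break the flat list of lines out into stanzas. Here is a minimal sketch in plain Java. It assumes each line has already been reduced to its list of fields, and the `groupStanzas` name and input shape are hypothetical. It follows the `stanza: shortLine (longLine)+` shape from the thread: a line with three or fewer fields opens a new stanza, every longer line joins the open one, and a stanza with no long lines is rejected.

```java
import java.util.ArrayList;
import java.util.List;

final class StanzaGrouper {
    // Hypothetical helper: each inner list holds one line's fields.
    // A short line (<= 3 fields) opens a stanza; long lines join the open stanza.
    static List<List<List<String>>> groupStanzas(List<List<String>> lines) {
        List<List<List<String>>> stanzas = new ArrayList<>();
        List<List<String>> current = null;
        for (List<String> line : lines) {
            if (line.size() <= 3) {
                current = new ArrayList<>();
                stanzas.add(current);
            } else if (current == null) {
                throw new IllegalArgumentException("long line before any short line");
            }
            current.add(line);
        }
        // Mirror (longLine)+ : every stanza needs at least one long line.
        for (List<List<String>> s : stanzas) {
            if (s.size() < 2) {
                throw new IllegalArgumentException("stanza has no long lines");
            }
        }
        return stanzas;
    }

    public static void main(String[] args) {
        List<List<String>> lines = List.of(
            List.of("h1", "x", "y"),
            List.of("a", "b", "c", "d"),
            List.of("e", "f", "g", "h", "i"),
            List.of("h2", "x", "y"),
            List.of("j", "k", "l", "m"));
        List<List<List<String>>> stanzas = groupStanzas(lines);
        System.out.println(stanzas.size());        // 2
        System.out.println(stanzas.get(0).size()); // 3
    }
}
```

Doing the grouping in a pass like this keeps the parser rules free of the short/long ambiguity entirely. The decision is made by counting rather than by lookahead.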
...Richard
On May 10, 2005, at 8:08, Chris Black wrote:
> Bryan Ewbank wrote:
>
>
>> The core problem is that shortLine and longLine have the same
>> left-match. If this is true, it's perhaps best to simply /parse/
>> everything (assume longLine?), then use a tree walker to break out
>> stanzas using a semantic predicate.
>>
>>
> shortLine and longLine do have the same left-match, but I thought a
> large enough lookahead depth k would take care of that. I
> don't understand why it doesn't. My current structure is a lexer
> that just generates FIELDs, DELIMs and NEWLINEs, a parser that
> looks at this token stream to generate an AST with the stanzas
> separated out using marker tokens and such, and then a tree parser
> that goes through this tree fetching out FIELDs and arranging them
> into a data structure. Is this a fundamentally incorrect approach?
> Since my lexer and tree parser already work (and still work fine on
> many input files), I was hoping to do the fix in parser space,
> perhaps using syntactic predicates.
> I've never used syntactic or semantic predicates and am reading up
> on them now. Shouldn't there be a way to handle the longLine/
> shortLine ambiguity with syntactic predicates such as:
> line: (FIELD DELIM FIELD DELIM FIELD NEWLINE) => shortLine
>     | (FIELD DELIM FIELD DELIM FIELD DELIM FIELD) => longLine
>     ;
>
> With a sufficient lookahead? Would this work?
>
> Thanks again,
> Chris
>
>
>> On 5/10/05, Chris Black <chris at lotuscat.com> wrote:
>>
>>
>>> multStanzas: (stanza)+ ;
>>> stanza: shortLine (longLine)+ ;
>>>
>>> shortLine: FIELD DELIM FIELD DELIM FIELD NEWLINE ;
>>> longLine: FIELD DELIM FIELD (DELIM FIELD)+ NEWLINE ;
>>>
>>>
>
>
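A likely reason a larger k does not help with the quoted rules is that `longLine`'s `(DELIM FIELD)+` loop can run exactly once. That means a three-field line, `FIELD DELIM FIELD DELIM FIELD NEWLINE`, matches both `shortLine` and `longLine`. No amount of lookahead can tell apart two rules that accept the same token sequence. The sketch below checks this with `java.util.regex`, encoding each token type as one letter (F = FIELD, D = DELIM, N = NEWLINE; the encoding is only an illustration, not part of the grammar). Requiring at least four fields, i.e. `FIELD DELIM FIELD DELIM FIELD (DELIM FIELD)+ NEWLINE`, removes the overlap.

```java
import java.util.regex.Pattern;

final class LineGrammarCheck {
    // One letter per token type: F = FIELD, D = DELIM, N = NEWLINE.
    static final Pattern SHORT_LINE = Pattern.compile("FDFDFN");
    // longLine as quoted: FIELD DELIM FIELD (DELIM FIELD)+ NEWLINE
    static final Pattern LONG_LINE_AS_QUOTED = Pattern.compile("FDF(DF)+N");
    // longLine requiring at least four fields
    static final Pattern LONG_LINE_FIXED = Pattern.compile("FDFDF(DF)+N");

    // True if the whole token string matches the rule's pattern.
    static boolean matches(Pattern p, String tokens) {
        return p.matcher(tokens).matches();
    }

    public static void main(String[] args) {
        String threeFields = "FDFDFN";
        System.out.println(matches(SHORT_LINE, threeFields));          // true
        System.out.println(matches(LONG_LINE_AS_QUOTED, threeFields)); // true: ambiguous
        System.out.println(matches(LONG_LINE_FIXED, threeFields));     // false
        System.out.println(matches(LONG_LINE_FIXED, "FDFDFDFN"));      // true
    }
}
```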