[antlr-interest] detecting transitions in stanza-based files

Tue May 10 09:14:43 PDT 2005

I've had better luck by keeping the ANTLR side simple and using a bit  
of Java code. Something like this (which is off the top of my head):

tokens
     { LONG_LINE; SHORT_LINE; }

stanza
     :    STANZA
         {  // Classify this stanza by the length of its sibling list
             AST probe = #stanza;
             int length = 0;
             while (probe != null) {
                 probe = probe.getNextSibling();
                 length++;
             }
             // Wrap the result in an imaginary root node for the tree walker
             if (length > 3)
                 #stanza = #(#[LONG_LINE, "long"], #stanza);
             else
                 #stanza = #(#[SHORT_LINE, "short"], #stanza);
         }
     ;

STANZA
     :    (FIELD DELIM)+ NEWLINE
     ;

I'd normally run a tree walker next, which is why stanza wraps its
result in a LONG_LINE or SHORT_LINE root node at the end. You could
choose to do that, or replace the "#stanza = ..." code with the actual
action to take.

If you do run a tree walker next, you could move the length check into
the walker, letting it count the list elements and make the
appropriate decision.
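
To make the counting step concrete outside of a grammar action, here is a
rough plain-Java sketch of the same sibling walk. The Node class is a
hypothetical stand-in for antlr.collections.AST (it only carries text and
a next-sibling link), and StanzaClassifier, chain, countSiblings and
classify are names made up for illustration. It assumes the sibling list
holds only the FIELD nodes, so the "> 3" threshold from the action above
separates three-field lines from longer ones.

```java
public class StanzaClassifier {

    /** Hypothetical stand-in for antlr.collections.AST: text plus a next-sibling link. */
    public static final class Node {
        final String text;
        Node nextSibling;

        Node(String text) {
            this.text = text;
        }
    }

    /** Builds a sibling chain from the given texts; returns the first node, or null if empty. */
    public static Node chain(String... texts) {
        Node head = null;
        Node tail = null;
        for (String t : texts) {
            Node n = new Node(t);
            if (head == null) {
                head = n;
            } else {
                tail.nextSibling = n;
            }
            tail = n;
        }
        return head;
    }

    /** Counts the nodes in the sibling list starting at first (0 for null). */
    public static int countSiblings(Node first) {
        int length = 0;
        for (Node probe = first; probe != null; probe = probe.nextSibling) {
            length++;
        }
        return length;
    }

    /** Same decision as the grammar action: more than three elements means "long". */
    public static String classify(Node first) {
        return countSiblings(first) > 3 ? "long" : "short";
    }

    public static void main(String[] args) {
        System.out.println(classify(chain("a", "b", "c")));      // prints "short"
        System.out.println(classify(chain("a", "b", "c", "d"))); // prints "long"
    }
}
```

In a real tree walker the loop would call getNextSibling() on the AST
nodes instead of following the nextSibling field used here.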

  ...Richard

On May 10, 2005, at 8:08, Chris Black wrote:

> Bryan Ewbank wrote:
>
>
>> The core problem is that shortLine and longLine have the same
>> left-match.  If this is true, it's perhaps best to simply /parse/
>> everything (assume longLine?), then use a tree walker to break out
>> stanzas using a semantic predicate.
>>
>>
> shortLine and longLine do have the same left-match, but I thought a  
> sufficient value for the k lookahead would take care of that. I  
> don't understand why it doesn't. My current structure is a lexer  
> that just generates FIELDs, DELIMs and NEWLINEs, a parser that  
> looks at this token stream to generate an AST with the stanzas  
> separated out using marker tokens and such, and then a tree parser  
> that goes through this tree fetching out FIELDs and arranging them  
> into a data structure. Is this a fundamentally incorrect approach?  
> Since my lexer and tree parser already work (and still work fine on  
> many input files), I was hoping to do the fix in parser space,  
> perhaps using syntactic predicates.
> I've never done syntactic or semantic predicates and am reading up  
> on them now. Shouldn't there be a way to handle the longLine/ 
> shortLine with syntactic predicates such as:
> line: FIELD DELIM FIELD DELIM FIELD NEWLINE => shortLine
>     | FIELD DELIM FIELD DELIM FIELD DELIM FIELD => longLine
>
> With a sufficient lookahead? Would this work?
>
> Thanks again,
> Chris
>
>
>> On 5/10/05, Chris Black <chris at lotuscat.com> wrote:
>>
>>
>>> multStanzas: (stanza)+
>>> stanza: shortLine (longLine)+
>>>
>>> shortLine: FIELD DELIM FIELD DELIM FIELD NEWLINE
>>> longLine: FIELD DELIM FIELD (DELIM FIELD)+ NEWLINE
>>>
>>>
>
>