[antlr-interest] handling line-based data with stanzas

Tue May 11 07:59:48 PDT 2004

I have been writing a few grammars for a few different file formats
that are line-based but also are organized into stanzas.
Most of these look like:
---
header,stuff ican,parse,easily

start stanzatypefoo
column header,column header,column header
value,value,value
start stanzatypebar
column header,column header,column header
value,value,value
---

The way I approach this now is to have some shared lexers that just
spit out a TokenStream of FIELD, DELIM and NEWLINE tokens. Then I have
a parser which imports the exported vocab from the lexer and builds an
AST. In the parser I usually try to remove tokens I don't really care
about, like the DELIMs. Then I have a TreeParser which walks the AST
and populates some data structures.
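
To make the setup concrete, here is a rough sketch of what the shared
lexer does. It is plain Python rather than my actual ANTLR lexer, and
the names Token and tokenize are made up for illustration. The point is
that it knows nothing about stanzas and only ever emits the three token
types:

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    type: str  # "FIELD", "DELIM" or "NEWLINE"
    text: str


def tokenize(data: str, delim: str = ",") -> Iterator[Token]:
    """Yield FIELD, DELIM and NEWLINE tokens; no stanza logic at all."""
    for line in data.splitlines():
        if line:
            for i, field in enumerate(line.split(delim)):
                if i > 0:
                    yield Token("DELIM", delim)
                yield Token("FIELD", field)
        # Blank lines produce a bare NEWLINE token.
        yield Token("NEWLINE", "\n")


sample = "start stanzatypefoo\ncolumn header,column header\nvalue,value\n"
tokens = list(tokenize(sample))
```

Note that the "start stanzatypefoo" line comes out as a single FIELD, so
anything downstream has to look at the token text to spot a stanza
header.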

This works OK, but I think I am missing something. Often I want to
skip entire stanzas. Since the AST is flat (I don't use any imaginary
tokens or other grouping nodes), most of the complexity ends up in the
tree parser. I am now carefully reading through the tree
building section of the ANTLR documentation, but hoped that this was a
common/simple enough problem that someone might have some clues.
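
The shape I am after looks roughly like the sketch below: one node per
stanza, so a whole stanza can be skipped by looking at its type. This is
plain Python, not ANTLR tree construction, and Stanza and group_stanzas
are hypothetical names. It also assumes that a line consisting of a
single field beginning with "start " always opens a stanza, which may
not hold for every format:

```python
from typing import Iterable, List, NamedTuple, Tuple


class Stanza(NamedTuple):
    kind: str              # text after "start " on the stanza header line
    rows: List[List[str]]  # one list of FIELD texts per line


def group_stanzas(tokens: Iterable[Tuple[str, str]]) -> List[Stanza]:
    """Fold a flat (type, text) token stream into one node per stanza."""
    stanzas: List[Stanza] = []
    row: List[str] = []

    def flush() -> None:
        if len(row) == 1 and row[0].startswith("start "):
            stanzas.append(Stanza(row[0][len("start "):], []))
        elif row and stanzas:
            stanzas[-1].rows.append(list(row))
        # Blank lines and lines before the first stanza are ignored here.
        row.clear()

    for ttype, text in tokens:
        if ttype == "FIELD":
            row.append(text)
        elif ttype == "NEWLINE":
            flush()
        # DELIM tokens carry no information once fields are split.
    flush()  # handle a final line with no trailing NEWLINE
    return stanzas


flat = [
    ("FIELD", "start stanzatypefoo"), ("NEWLINE", "\n"),
    ("FIELD", "a"), ("DELIM", ","), ("FIELD", "b"), ("NEWLINE", "\n"),
    ("FIELD", "start stanzatypebar"), ("NEWLINE", "\n"),
    ("FIELD", "c"), ("NEWLINE", "\n"),
]
stanzas = group_stanzas(flat)
# Skipping a stanza type becomes a one-line filter.
wanted = [s for s in stanzas if s.kind != "stanzatypefoo"]
```

In ANTLR terms I assume each Stanza would correspond to a subtree rooted
at an imaginary token, but I have not gotten that to work yet.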

As an aside, some of this may be due to my seeming inability to match
string literals at the parser level. I try to define different stanza
rules based on what the stanza header contents are, but I don't seem
to be able to do this. I will get an error like:
line 18:1: expecting "Data:", found 'Data:'

When my grammar has:
matchRule: dataString DELIM FIELD ;

dataString: "Data:" ;

I believe this may be because I am importing the token vocab from the
shared lexer using importVocab, but I don't know.
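
My guess at what is happening, sketched in Python: the parser compares
token type numbers, not text, and the shared lexer only ever hands it a
FIELD whose text happens to be "Data:". The type numbers and function
names below are invented for illustration and do not come from any
generated code:

```python
from typing import NamedTuple

FIELD = 4         # hypothetical type number from the shared lexer's vocab
LITERAL_DATA = 7  # hypothetical type number the parser gives to "Data:"


class Token(NamedTuple):
    type: int
    text: str


def match_by_type(tok: Token, expected_type: int) -> bool:
    """Compare token type numbers only, ignoring the text."""
    return tok.type == expected_type


def match_field_text(tok: Token, expected_text: str) -> bool:
    """Possible workaround: accept a FIELD and check its text instead."""
    return tok.type == FIELD and tok.text == expected_text


tok = Token(FIELD, "Data:")  # what the shared lexer emits for this input

type_match = match_by_type(tok, LITERAL_DATA)  # False: types differ
text_match = match_field_text(tok, "Data:")    # True: same text
```

If that is right, it would explain why the expected and found text in
the error message look the same: the text matches but the token type
does not.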

How would someone who is a bit more experienced with ANTLR handle this
type of data so that I could walk around the tree and skip stanzas
easily? I think I should be doing something with imaginary tokens, but
when I experimented with them based on the examples in the
distribution it didn't quite seem to work the way I expected.

In short, is there a good way to parse stanza-based, line-based data
coming from a simplistic lexer that only produces FIELD, DELIM and
NEWLINE tokens? I'd rather not put more logic in the lexer, since then
I couldn't share it as easily between grammars.

Thanks in advance,
Chris
