[antlr-interest] Parsing fields in a CSV file

Rick Schumeyer rschumeyer at gmail.com
Tue Mar 10 10:56:53 PDT 2009


What is the best strategy for parsing a CSV file?  Most CSV files I have
seen are simply one big data table.  The files I have are structured, with
different sections containing parameters, tables, etc.

I can easily parse the file into fields.  Most of the time, I don't care
what the field contains.  But sometimes, a field contains a timestamp like:

Fri Oct 24 09:54:27 EDT 2008,

I want to parse each individual piece of this field.  My understanding is
that ANTLR will try to return the longest match it can find for a token.  So
even if I want it to look for

STRING STRING INT INT:INT:INT etc.

it will simply return

FIELD

unless I do something with predicates.

If I do something in the parser like

timestamp : (STRING STRING INT INT:INT:INT etc) =>
            (STRING STRING INT INT:INT:INT etc);

will that work?  Will that cause the lexer to return each small token, or
will it still return a FIELD?

A couple of questions:

It seems to me that I want to use a predicate in the lexer based on what is
happening in the parser.  Is there an easy way to do that?  (I didn't see an
example of that in the ANTLR book).

Another alternative, which sounds crazy but I'll ask anyway, is to write a
separate parser that only parses the above date/time field, and call that
parser from within the rule that receives the timestamp field.
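
As a rough illustration of that second alternative, here is a minimal sketch in Python rather than as an ANTLR grammar. The first stage only splits a line into fields, and a small second-stage parser is then run on each field to see whether it is a timestamp. The names TIMESTAMP, parse_timestamp_field, and parse_line are hypothetical, the pattern assumes timestamps shaped exactly like the example above, and the naive split on commas assumes fields are never quoted.

```python
import re

# Hypothetical pattern for fields shaped like "Fri Oct 24 09:54:27 EDT 2008".
TIMESTAMP = re.compile(
    r"(?P<weekday>[A-Za-z]{3}) (?P<month>[A-Za-z]{3}) (?P<day>\d{1,2}) "
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}) "
    r"(?P<zone>[A-Z]{2,5}) (?P<year>\d{4})"
)

# Pieces of the timestamp that should be converted to integers.
NUMERIC_PARTS = ("day", "hour", "minute", "second", "year")


def parse_timestamp_field(field):
    """Second stage: split one field into timestamp pieces, or return None."""
    match = TIMESTAMP.fullmatch(field.strip())
    if match is None:
        return None
    parts = match.groupdict()
    for key in NUMERIC_PARTS:
        parts[key] = int(parts[key])
    return parts


def parse_line(line):
    """First stage: split a line into fields (assumes no quoted commas)."""
    fields = line.split(",")
    # Keep a field as plain text unless the second-stage parser accepts it.
    return [parse_timestamp_field(field) or field for field in fields]


if __name__ == "__main__":
    print(parse_line("param,Fri Oct 24 09:54:27 EDT 2008,"))
```

The point of the split is that the first stage never needs to know about timestamps; only the code that receives a field decides whether to look inside it.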

Thanks for any help!