[antlr-interest] Parsing tab delimited datamatrix

Mon May 15 10:33:43 PDT 2006

on Mon, 15 May 2006 17:48:10 +0200, Martin Eklund wrote:
>I'm a complete newbie when it comes to ANTLR and would be very thankful
>to get a few pointers. I would like to parse a file containing a tab
>delimited datamatrix. The first row of the file contains column headers,
>whereas all the other rows contain first an identifier and then doubles
>separated by tabs. Ex:
>
>I'd	like	to	parse
>this	12.3	1.2	3
>type	1.54	5	21.1
>of	12.3	1	4
>file	7	1	4.9

I respectfully suggest that using Antlr (or any other parser
generation tool) is overkill for this task.

Why not just write the few lines it takes to do this using Java's
Scanner class? (Please see the API description for java.util.Scanner.)
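
To give an idea of what I mean, here is a rough sketch of the Scanner
approach. The class name TabMatrixReader and its fields are made up for
illustration, and it assumes every row after the header is an
identifier followed by tab-separated numbers:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; the names are illustrative, not from any library.
class TabMatrixReader {
    final List<String> columnHeaders = new ArrayList<>();
    final List<String> rowIds = new ArrayList<>();
    final List<double[]> rows = new ArrayList<>();

    // Reads one header line, then rows of "identifier TAB number TAB number ...".
    static TabMatrixReader read(Scanner in) {
        TabMatrixReader m = new TabMatrixReader();
        if (!in.hasNextLine()) {
            return m; // empty input: nothing to collect
        }
        for (String header : in.nextLine().split("\t")) {
            m.columnHeaders.add(header);
        }
        while (in.hasNextLine()) {
            String line = in.nextLine();
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines, e.g. at the end of the file
            }
            String[] fields = line.split("\t");
            m.rowIds.add(fields[0]);
            double[] values = new double[fields.length - 1];
            for (int i = 1; i < fields.length; i++) {
                // throws NumberFormatException if a field is not a number
                values[i - 1] = Double.parseDouble(fields[i]);
            }
            m.rows.add(values);
        }
        return m;
    }

    public static void main(String[] args) {
        String sample = "I'd\tlike\tto\tparse\n"
                + "this\t12.3\t1.2\t3\n"
                + "type\t1.54\t5\t21.1\n";
        TabMatrixReader m = read(new Scanner(sample));
        System.out.println(m.columnHeaders);
        System.out.println(m.rowIds + " " + m.rows.size() + " rows");
    }
}
```

For a real file you would construct the Scanner from a java.io.File
instead of a String. There is no error recovery here beyond whatever
exception a malformed number throws.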

>During the parsing I would like to put the column header identifiers
>into an array of strings, the row "header" identifiers into another array
>of strings and the doubles into a jama matrix (basically just a
>double[][]). My idea is that the wee parser and lexer below is pretty
>much what I need. However, I suppose I need to add some actions where
>the stars are (please see below). The problem is what to add... How do I
>for instance know how big to make the String[] and the double[][]..?

I would also suggest use of java.util.Vector rather than an array;
thus avoiding the need to know in advance how big to make each of the
arrays.  If arrays are a requirement, then gather the data into
Vectors (or Lists) and, at the end of the input, translate those
results into the necessary arrays.
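
A minimal sketch of that last translation step, in case it is not
obvious. The class and method names (ListsToArrays, toStringArray,
toMatrix) are invented for this example; the only assumption is that
the numbers were gathered as a list of lists of Double:

```java
import java.util.List;
import java.util.Vector;

// Hypothetical helper; the names are illustrative only.
class ListsToArrays {
    // Copies the collected identifiers into a String[] of exactly the right size.
    static String[] toStringArray(List<String> items) {
        return items.toArray(new String[0]);
    }

    // Copies the collected rows into a double[][], one inner array per row.
    // Rows may differ in length; no check for a rectangular matrix is made.
    static double[][] toMatrix(List<? extends List<Double>> rows) {
        double[][] out = new double[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            List<Double> row = rows.get(i);
            out[i] = new double[row.size()];
            for (int j = 0; j < row.size(); j++) {
                out[i][j] = row.get(j); // auto-unboxing Double -> double
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Vector<String> rowIds = new Vector<String>();
        Vector<Vector<Double>> data = new Vector<Vector<Double>>();

        rowIds.add("this");
        Vector<Double> first = new Vector<Double>();
        first.add(12.3);
        first.add(1.2);
        first.add(3.0);
        data.add(first);

        String[] ids = toStringArray(rowIds);
        double[][] matrix = toMatrix(data);
        System.out.println(ids.length + " row(s), " + matrix[0].length + " column(s)");
    }
}
```

Because the sizes are only read off the collections at the very end,
nothing needs to be known about the file's dimensions up front. The
resulting double[][] is the shape the question says it wants for the
matrix.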

>columnHeaders
>	: (TEXT)+ NEWLINE
>	***** What goes here? *****
>	;
>rows
>	:TEXT (NUM)+
>	***** And what goes here? *****
>	;

If you insist on an Antlr solution, here is my suggestion (note that
this is just off the top of my head; I have not actually tried to run
it through the Antlr tool):

file // main entry point for the parser; processes an entire file.
    {
      // Declared in an init-action before the colon, which is where
      // Antlr 2.x expects rule-local variables to live.
      Vector<String> heading = new Vector<String>();
      Vector<String> row_ids = new Vector<String>();
      Vector<Vector<Double>> data = new Vector<Vector<Double>>();
      Vector<Double> aRow = null;
      // passing these results back out of the Antlr generated code is
      // left as an exercise for the reader...
    }
    :
    columnHeaders[heading]

    (
      aRow=rows[row_ids] { data.addElement(aRow); }
    )+

    EOF
    ;

columnHeaders [ Vector<String> h ] :
	(t:TEXT { h.addElement(t.getText()); } )+ NEWLINE
	;

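rows [ Vector<String> r ] returns [ Vector<Double> d ]
     {
       // init-action before the colon (Antlr 2.x style), so that d is
       // assigned before any token is matched
       d = new Vector<Double>();
     }
     :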

     t:TEXT { r.addElement(t.getText()); }

     ( n:NUM { d.addElement(Double.parseDouble(n.getText())); } )+

     NEWLINE
     ;

Hope this helps...
   -jbb