[antlr-interest] TET journal files

Mon Jul 23 12:01:20 PDT 2007

I have downloaded ANTLR 3.0 / ANTLRWorks 1.1 and have bought a copy of
the new ANTLR book.  Armed with these I decided to use ANTLR to
generate a Python program to process journal files from TET, the Test
Environment Toolkit (http://tetworks.opengroup.org/tet/).

A typical journal line is:

15|<activity> <version> <ICcount>|text

Each line is logically divided into three fields by pipe symbols.

The first field is a line type, one of 36 integer values between 0 and
900.

The second field has a fixed number (zero or more) of space-separated
subfields.  The number of subfields is dictated by the line type in
the first field.  Barring perverse choices of path names, these
subfields do not contain either spaces or pipe symbols.

The third field is a general text string that runs to the terminating
newline.  In particular, this string can contain spaces and pipe
symbols.

There is sufficient information to unambiguously identify the data in
these lines.
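
To make the layout concrete, here is a minimal hand-written sketch in
plain Python (not ANTLR, and not the generated parser I am after).
The function name split_journal_line and the sample values are made
up for illustration.  It relies only on the points above: the first
two pipes delimit the fields, and any later pipes belong to the text.

```python
def split_journal_line(line):
    """Split one journal line into (line_type, subfields, text)."""
    # Split on the first two pipes only; the third field may contain pipes.
    line_type, second, text = line.rstrip("\n").split("|", 2)
    # An empty second field means the line type has zero subfields.
    subfields = second.split(" ") if second else []
    return int(line_type), subfields, text


# Sample values are invented for illustration, not taken from a real journal.
print(split_journal_line("15|0 3.7 1|TCM Start | more text\n"))
```

This does not check that the number of subfields matches the line
type; that check is what I would like the grammar to express.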

My first attempt was to code a grammar with one rule per line type,
each of the form:

testCaseManagerStart
        // 15|activity version ICcount|text
        :       '15' PIPE INT ' ' STRING ' ' INT PIPE TEXT NEWLINE
        ;

However, I ran into a problem with the lexer: the line type number
('15' in this case) is selected as a separate token, which causes
confusion with the small integers that appear as subfields of the
second field.
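
The effect can be imitated with a small regex tokenizer in Python.
This is only a rough analogy under my own assumptions (the rule names,
their order, and the sample line are hypothetical), not the lexer that
ANTLR actually generates.  The point is that a tokenizer with no
notion of field position gives every standalone 15 the literal's token
type, so a rule expecting INT in the second field no longer matches.

```python
import re

# Hypothetical token rules, tried in order, with no knowledge of field position.
# The literal '15' is listed first, so it wins over INT wherever 15 stands alone.
TOKEN_RULES = [
    ("T15", r"15(?!\d)"),
    ("INT", r"\d+"),
    ("PIPE", r"\|"),
    ("SPACE", r" "),
    ("STRING", r"[^ |\n]+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_RULES))


def tokenize(text):
    """Return (token_type, lexeme) pairs for the given text."""
    return [(m.lastgroup, m.group()) for m in PATTERN.finditer(text)]


# Invented sample: line type 15 whose first and last subfields are also 15.
for token in tokenize("15|15 abc 15|"):
    print(token)
```

Here the subfield values come out as T15 rather than INT, which is the
confusion described above.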

Is there an ANTLR idiom used to handle situations like this?

Thanks
--
John Connett