[antlr-interest] Parsing a sequence of objects

Wed Jan 27 01:19:19 PST 2010

Given an attribute grammar (probably with only synthesized attributes), 
I want to parse a sequence (array) of Java objects instead of a 
sequence of terminal strings. Each object o has 3 fields:
(1) String name
(2) Object[] p
(3) String c
The terminals in the grammar correspond exactly to the name field of an 
object (each o.name is a terminal), so parsing decisions should be made 
based on this field alone (perhaps no lexer is needed?).

In the attribute grammar the other two fields of the object must be used 
as attributes of the terminal (note that the values of these attributes 
are NOT given by a production in the grammar!! but instead are given 
(before parsing) in each object), and it must be possible to define the 
(synthesized) attributes of non-terminals in terms of the attributes of 
the terminals (namely, the o.p and o.c fields). To make this clearer, 
consider the following example (I will denote each object as a triple 
(name, p, c)):

Given a sequence of objects
    ("first", p1, "z"), ("first", p2, "y"),
    ("last", p3, "z"), ("last", p4, "x")

and the attribute grammar
  S ::=   FIRST LAST
            { $cSet = createset($FIRST.c, $LAST.c); }
        | FIRST S1=S LAST
            { $cSet = union($S1.cSet, createset($FIRST.c, $LAST.c)); }
where cSet is an attribute of type 'set of Strings', createset creates a 
new set containing its parameters as elements, and union(a,b) returns 
the union of the sets a and b,

the parsing of the sequence of objects produces:

                        S.cSet = {"x","y","z"}
                       /          |           \
                      /           |            \
                     /            |             \
                    /             |              \
                   /              |               \
FIRST = ("first",p1,"z")   S.cSet = {"y", "z"}    LAST = ("last", p4, "x")
                              /       \
                             /         \
                            /           \
                           /             \
     FIRST = ("first", p2, "y")         LAST = ("last", p3, "z")
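
To make the intended result concrete, here is a minimal sketch in plain 
Java (no ANTLR involved) that evaluates the example grammar directly over 
the object sequence with a hand-written recursive descent. It only 
illustrates the expected cSet values and is not a proposed solution. The 
names ObjectSequenceSketch, Obj, match, lookahead, parseS and parse are 
made up for this illustration, and empty arrays stand in for p1..p4, 
whose contents are not given.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ObjectSequenceSketch {

    // One input object: the triple (name, p, c) described above.
    static final class Obj {
        final String name;
        final Object[] p;
        final String c;

        Obj(String name, Object[] p, String c) {
            this.name = name;
            this.p = p;
            this.c = c;
        }
    }

    private final List<Obj> input;
    private int pos = 0;

    ObjectSequenceSketch(List<Obj> input) {
        this.input = input;
    }

    // True if the next unread object has the given terminal name.
    private boolean lookahead(String terminal) {
        return pos < input.size() && input.get(pos).name.equals(terminal);
    }

    // Consume the next object; only its name field drives the decision.
    private Obj match(String terminal) {
        if (!lookahead(terminal)) {
            throw new IllegalStateException(
                "expected '" + terminal + "' at index " + pos);
        }
        return input.get(pos++);
    }

    // S ::= FIRST LAST | FIRST S1=S LAST
    // Returns the synthesized attribute cSet of S.
    Set<String> parseS() {
        Obj first = match("first");
        Set<String> cSet = new TreeSet<>();
        if (lookahead("first")) {
            cSet.addAll(parseS());   // union with $S1.cSet
        }
        Obj last = match("last");
        cSet.add(first.c);           // createset($FIRST.c, $LAST.c)
        cSet.add(last.c);
        return cSet;
    }

    // Parse a whole sequence and reject trailing objects.
    static Set<String> parse(List<Obj> input) {
        ObjectSequenceSketch parser = new ObjectSequenceSketch(input);
        Set<String> result = parser.parseS();
        if (parser.pos != input.size()) {
            throw new IllegalStateException(
                "unexpected object at index " + parser.pos);
        }
        return result;
    }

    public static void main(String[] args) {
        Object[] none = new Object[0];   // stand-in for p1..p4
        List<Obj> seq = Arrays.asList(
            new Obj("first", none, "z"), new Obj("first", none, "y"),
            new Obj("last", none, "z"), new Obj("last", none, "x"));
        System.out.println(parse(seq));  // prints [x, y, z]
    }
}
```

The inner S (the second "first" and the first "last") yields {"y", "z"}, 
and the outer S adds "z" and "x", matching the tree above.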

What would be the best way to implement this? Perhaps subclass the 
antlr.Token class to add the Object[] p and String c fields (if so, what 
would be the best way to create a token stream from the given sequence of 
objects)? My current approach, which works but is not very elegant, is to:
1) Concatenate the name fields of all objects in the sequence to 
create a single string S.
2) Add the array storing the sequence of objects as a @members variable 
to the grammar (let's call this array a).
3) In the attribute grammar, refer to the terminal attribute 
Object[] p of "first" by writing 'a[$FIRST.getTokenIndex()].p', where 
FIRST is a terminal defined in the lexer as FIRST: 'first';.
4) Call the parser with the string S formed in step 1 as input.
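
The index alignment that steps 1 and 3 rely on can be sketched in plain 
Java as follows. This is only an illustration: tokenize is a hypothetical 
longest-match stand-in for the generated lexer, not ANTLR code, and 
ConcatWorkaroundSketch and concatNames are made-up names. It assumes the 
lexer emits exactly one token per object, so that the i-th token 
corresponds to a[i].

```java
import java.util.ArrayList;
import java.util.List;

class ConcatWorkaroundSketch {

    // Step 1: concatenate the name fields into a single input string.
    static String concatNames(String[] names) {
        StringBuilder s = new StringBuilder();
        for (String name : names) {
            s.append(name);
        }
        return s.toString();
    }

    // Hypothetical stand-in for the lexer: at each offset, take the
    // longest known terminal literal that matches.
    static List<String> tokenize(String s, String[] terminals) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            String matched = null;
            for (String t : terminals) {
                if (s.startsWith(t, pos)
                        && (matched == null || t.length() > matched.length())) {
                    matched = t;
                }
            }
            if (matched == null) {
                throw new IllegalArgumentException(
                    "no terminal matches at offset " + pos);
            }
            tokens.add(matched);
            pos += matched.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Name and c fields of the example sequence, kept in parallel arrays.
        String[] names = {"first", "first", "last", "last"};
        String[] c = {"z", "y", "z", "x"};

        String s = concatNames(names);
        List<String> tokens = tokenize(s, new String[] {"first", "last"});

        // Step 3: the token index is used to look up the original object.
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(i + ": " + tokens.get(i) + " -> c = " + c[i]);
        }
    }
}
```

The lookup is only valid while the token sequence and the array stay in 
step, which is part of why the approach feels fragile.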