[antlr-interest] Parsing a sequence of objects
Stijn de Gouw
C.P.T.de.Gouw at cwi.nl
Wed Jan 27 01:19:19 PST 2010
Given an attribute grammar (probably with only synthesized attributes),
instead of parsing a sequence of terminal strings, I want to parse a
sequence (array) of (Java) Objects. Each object o has three fields:
(1) String name
(2) Object[] p
(3) String c
The terminals in the grammar correspond exactly to the name field of an
object (each o.name is a terminal), so parsing decisions should be made
based on this field (perhaps no lexer is needed?).
In the attribute grammar, the other two fields of the object must be
usable as attributes of the terminal. Note that the values of these
attributes are NOT given by a production in the grammar; they are given
in each object before parsing. It must also be possible to define the
(synthesized) attributes of non-terminals in terms of the attributes of
the terminals (namely, the o.p and o.c fields). To make this clearer,
consider the following example (I will denote each object as a triple
(name, p, c)):
Given a sequence of objects
    ("first", p1, "z"), ("first", p2, "y"), ("last", p3, "z"), ("last", p4, "x")
and the attribute grammar
    S ::= FIRST LAST       { $cSet = createset($FIRST.c, $LAST.c); }
        | FIRST S1=S LAST  { $cSet = union($S1.cSet, createset($FIRST.c, $LAST.c)); }
where cSet is an attribute of type 'set of Strings', createset creates a
new set containing its parameters as elements, and union(a,b) returns
the union of the sets a and b,
parsing the sequence of objects produces:
                           S.cSet = {"x","y","z"}
                    /                 |                 \
FIRST = ("first", p1, "z")   S.cSet = {"y", "z"}   LAST = ("last", p4, "x")
                                /           \
        FIRST = ("first", p2, "y")         LAST = ("last", p3, "z")
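
To make the intended result concrete, here is a minimal sketch in plain Java that evaluates the example grammar directly over the object array, without ANTLR. It is only an illustration of the attribute computation, not a proposed ANTLR solution: the names CSetSketch, Item and Evaluator are hypothetical, and a TreeSet is assumed for cSet so that the result prints in sorted order.

```java
import java.util.Set;
import java.util.TreeSet;

final class CSetSketch {
    /** Hypothetical stand-in for the input objects (name, p, c). */
    static final class Item {
        final String name;  // terminal name, e.g. "first" or "last"
        final Object[] p;   // extra payload, not used by this grammar
        final String c;     // value that feeds the cSet attribute

        Item(String name, Object[] p, String c) {
            this.name = name;
            this.p = p;
            this.c = c;
        }
    }

    /** Recursive descent for S ::= FIRST LAST | FIRST S LAST. */
    static final class Evaluator {
        private final Item[] input;
        private int pos = 0;

        Evaluator(Item[] input) {
            this.input = input;
        }

        /** Parses the whole input as one S and returns its synthesized cSet. */
        Set<String> parse() {
            Set<String> result = s();
            if (pos != input.length) {
                throw new IllegalStateException("trailing input at index " + pos);
            }
            return result;
        }

        private Set<String> s() {
            Item first = expect("first");
            Set<String> cSet = new TreeSet<>();
            // A second FIRST means the alternative FIRST S LAST applies.
            if (pos < input.length && "first".equals(input[pos].name)) {
                cSet.addAll(s());  // union($S1.cSet, ...)
            }
            Item last = expect("last");
            cSet.add(first.c);     // createset($FIRST.c, $LAST.c)
            cSet.add(last.c);
            return cSet;
        }

        /** Consumes the next object if its name field matches the terminal. */
        private Item expect(String name) {
            if (pos >= input.length || !name.equals(input[pos].name)) {
                throw new IllegalStateException("expected " + name + " at index " + pos);
            }
            return input[pos++];
        }
    }
}
```

Parsing decisions here look only at the name field, while the c field is read straight from the object when the attribute is computed, which is the behaviour asked for above.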
What would be the best way to implement this? Perhaps subclass the
antlr.Token class to add the Object[] p and String c fields (if so, what
would be the best way to create a token stream from the given sequence of
objects)? My current approach, which works but is not very elegant, is to:
1) Concatenate the name fields of all objects in the sequence to
create a single string S.
2) Add the array storing the sequence of objects as a @members variable
to the grammar (let's call this array a).
3) In the attribute grammar, refer to the terminal attribute
Object[] p of "first" by writing 'a[$FIRST.getTokenIndex()].p', where
FIRST is a terminal defined in the lexer as FIRST: 'first';.
4) Call the parser with the string S formed in step 1 as input.
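
The index bookkeeping behind steps 1 and 3 can be sketched in plain Java as follows. This is not ANTLR code: tokenize is a hypothetical stand-in for the generated lexer (longest match over the known terminal names), and ConcatSketch and Item are illustrative names. The sketch assumes that the lexer emits exactly one token per object and no other tokens, which is what makes the token index usable as an index into a.

```java
import java.util.ArrayList;
import java.util.List;

final class ConcatSketch {
    /** Hypothetical stand-in for the input objects (name, p, c). */
    static final class Item {
        final String name;
        final Object[] p;
        final String c;

        Item(String name, Object[] p, String c) {
            this.name = name;
            this.p = p;
            this.c = c;
        }
    }

    /** Step 1: concatenate the name fields into the parser's input string. */
    static String concatNames(Item[] a) {
        StringBuilder sb = new StringBuilder();
        for (Item o : a) {
            sb.append(o.name);
        }
        return sb.toString();
    }

    /** Stand-in for the lexer: splits s into terminals, longest match first. */
    static List<String> tokenize(String s, String... terminals) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            String match = null;
            for (String t : terminals) {
                if (s.startsWith(t, pos) && (match == null || t.length() > match.length())) {
                    match = t;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException("no terminal at offset " + pos);
            }
            tokens.add(match);
            pos += match.length();
        }
        return tokens;
    }
}
```

With the example sequence, token i produced from the concatenated string has the same name as a[i], so a[i].p and a[i].c can be looked up by token index. The alignment would break if concatenated names could be split in more than one way, or if the lexer produced additional tokens.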