[antlr-interest] Question about idiom.
John B. Brodie
jbb at acm.org
Sat Jan 9 18:40:04 PST 2010
Greetings!
On Sun, 2010-01-10 at 10:04 +0800, Michael Richter wrote:
> 2010/1/9 Kay Röpke <kroepke at classdump.org>
>
> >
> > On Jan 9, 2010, at 5:32 AM, Michael Richter wrote:
> >
> > > I keep coming across a pattern in a grammar I'm working on. This pattern
> > > looks something like this:
> > >
> > > - A production can be *A*.
> > > - A production can be *B*.
> > > - A production can be *A B.*
> > >
> > > In the grammar I'm transcribing this from, the notation used is
> > > *(A & B)*. Is there some convenient way to code that in ANTLR's EBNF
> > > notation? I keep having to do *(A | B | A B)*. That isn't all that
> > > onerous as-is, I admit, but imagine if A is five tokens long and B is
> > > also five tokens long, and then imagine this kind of pattern happening
> > > about twenty times in the grammar. Is there a way to concisely do this?
> >
> > What is the restriction on the parts of the production?
> > I.e. what differentiates a valid production from an invalid one?
> >
>
> The restriction is exactly as I put it: You can have A (where A is a
> multi-token set of specified order), B (where B is a multi-token set of
> specified order) or A B. It *must* be in the order provided and A and B are
> fixed token sets.
>
1) Make a parser rule that recognizes the sequence of tokens (and/or other
parser rules) comprising A; call it, say, recognize_A.
2) Make a parser rule that recognizes the sequence of tokens (and/or other
parser rules) comprising B; call it, say, recognize_B.
3) Make a parser rule of the form:
an_A_or_B_or_AB : recognize_A ( recognize_B )? | recognize_B ;
Observe the left-factoring in the above: the "A" and "A B" cases are merged
into recognize_A ( recognize_B )?, so no two alternatives begin with A.
4) Use the parser rule `an_A_or_B_or_AB` from 3) everywhere you currently
have the (A | B | A B) construct.
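The four steps above can be sketched as a small ANTLR 3 grammar. This is only an illustration: the token sequences chosen for A (':' ID) and B ('=' INT), the rule `item`, and the lexer rules are all made up here and do not come from the grammar under discussion.

```antlr
grammar Idiom;

// 4) Use the combined rule wherever (A | B | A B) used to appear.
//    'item' is a hypothetical enclosing rule.
item : ID an_A_or_B_or_AB ';' ;

// 3) Left-factored form: A alone, A followed by B, or B alone.
an_A_or_B_or_AB
    : recognize_A ( recognize_B )?
    | recognize_B
    ;

// 1) The sequence comprising A (hypothetical: a type annotation).
recognize_A : ':' ID ;

// 2) The sequence comprising B (hypothetical: an initializer).
recognize_B : '=' INT ;

ID  : ('a'..'z' | 'A'..'Z' | '_')+ ;
INT : ('0'..'9')+ ;
WS  : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

With these made-up definitions, A and B start with different tokens (':' versus '='), so a single token of lookahead is enough to pick an alternative, and each five-token sequence would be written out only once however many times the pattern recurs.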
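Note that if A and B share a common prefix (i.e. a common left factor),
the first token no longer tells the parser which alternative to take, so
you will probably experience issues with the above 4 steps unless you
left-factor further.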
> Think of it this way: you're declaring a variable. You have a token for the
> variable, then an optional type specification (A -- multiple tokens) and an
> optional initializer (B -- multiple tokens). Both parts are optional, but
> you *must* have at least one and the declarations *must* be in the order of
> type then initializer if both are present. The only way I've found to do it
> is (A | B | A B), but this is painful when A and B are more than one token
> in length and I've got about 20 of these things in the grammar. This is
> just begging for typos.
This example REALLY FAILS for me. It is hard for me to envision a
language that can initialize a variable (e.g. B) without any declaration
of that variable (e.g. A). So having a bare B in the above example makes
no sense to me. Maybe you meant something like: (A B? C?)
where A is the var decl, B is its type and C is its initial value...
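A rough ANTLR 3 sketch of that (A B? C?) shape follows. The concrete tokens ('var', ':', '=') and all rule names are assumptions made for illustration only.

```antlr
grammar VarDecl;

// A = the variable declaration, B = its optional type,
// C = its optional initial value.
var_decl : decl_A ( type_B )? ( init_C )? ';' ;

decl_A : 'var' ID ;   // hypothetical declaration syntax
type_B : ':' ID ;     // hypothetical type annotation
init_C : '=' INT ;    // hypothetical initializer

ID  : ('a'..'z' | 'A'..'Z' | '_')+ ;
INT : ('0'..'9')+ ;
WS  : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

Note that this form also accepts a declaration with neither type nor initial value. If at least one of the two is required, the tail would need the left-factored shape again: decl_A ( type_B ( init_C )? | init_C ).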
Hope this helps....
-jbb