[antlr-interest] Question about idiom.

Sat Jan 9 18:40:04 PST 2010

Greetings!

On Sun, 2010-01-10 at 10:04 +0800, Michael Richter wrote:
> 2010/1/9 Kay Röpke <kroepke at classdump.org>
> 
> >
> > On Jan 9, 2010, at 5:32 AM, Michael Richter wrote:
> >
> > > I keep coming across a pattern in a grammar I'm working on.  This pattern
> > > looks something like this:
> > >
> > >   - A production can be *A*.
> > >   - A production can be *B*.
> > >   - A production can be *A B*.
> > >
> > > In the grammar I'm transcribing this from, the notation used is *(A & B)*.
> > > Is there some convenient way to code that in ANTLR's EBNF notation?  I keep
> > > having to do *(A | B | A B)*.  That isn't all that onerous as-is, I admit,
> > > but imagine if A is five tokens long and B is also five tokens long, and
> > > then imagine this kind of pattern happening about twenty times in the
> > > grammar.  Is there a way to concisely do this?
> >
> > What is the restriction on the parts of the production?
> > I.e. what differentiates a valid production from an invalid one?
> >
> 
> The restriction is exactly as I put it: You can have A (where A is a
> multi-token set of specified order), B (where B is a multi-token set of
> specified order) or A B.  It *must* be in the order provided and A and B are
> fixed token sets.
> 

1) make a parser rule that recognizes the sequence of tokens (and/or other
parser rules) comprising A, and call it, say, recognize_A.

2) make a parser rule that recognizes the sequence of tokens (and/or other
parser rules) comprising B, and call it, say, recognize_B.

3) make a parser rule of the form:

an_A_or_B_or_AB : recognize_A ( recognize_B )? | recognize_B ;

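Observe the left-factoring above: the leading recognize_A shared by the
A and A B alternatives is written only once, followed by an optional
recognize_B.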

4) use the rule `an_A_or_B_or_AB` from 3) everywhere you currently have
the (A | B | A B) stuff.
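To show what the left-factored rule from step 3 buys, here is a hand-written recursive-descent sketch in Python of roughly the decision an LL parser has to make for it. It is not ANTLR-generated code: the token sequences A_TOKENS and B_TOKENS and the names Parser, ParseError and parse are hypothetical stand-ins, and the sketch assumes A and B begin with different tokens.

```python
from typing import List, Optional, Sequence

# Hypothetical fixed token sequences standing in for A and B; in a real
# grammar these would be the multi-token sequences from the question.
A_TOKENS = ("a1", "a2", "a3")
B_TOKENS = ("b1", "b2")


class ParseError(Exception):
    pass


class Parser:
    """Sketch of  an_A_or_B_or_AB : recognize_A ( recognize_B )? | recognize_B ;"""

    def __init__(self, tokens: Sequence[str]) -> None:
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self) -> Optional[str]:
        # Next token, or None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match_seq(self, seq: Sequence[str]) -> None:
        # Consume exactly the tokens in seq, in order, or fail.
        for expected in seq:
            if self.peek() != expected:
                raise ParseError(f"expected {expected!r} at {self.pos}, got {self.peek()!r}")
            self.pos += 1

    def recognize_A(self) -> str:
        self.match_seq(A_TOKENS)
        return "A"

    def recognize_B(self) -> str:
        self.match_seq(B_TOKENS)
        return "B"

    def an_A_or_B_or_AB(self) -> List[str]:
        # One token of lookahead picks the alternative, because A is
        # written only once (left-factored) and A and B start differently.
        if self.peek() == A_TOKENS[0]:
            parts = [self.recognize_A()]
            if self.peek() == B_TOKENS[0]:  # the optional ( recognize_B )?
                parts.append(self.recognize_B())
            return parts
        if self.peek() == B_TOKENS[0]:
            return [self.recognize_B()]
        raise ParseError(f"expected A or B at {self.pos}, got {self.peek()!r}")


def parse(tokens: Sequence[str]) -> List[str]:
    # Parse one A / B / A B group and require that nothing follows it.
    p = Parser(tokens)
    result = p.an_A_or_B_or_AB()
    if p.peek() is not None:
        raise ParseError(f"trailing input at {p.pos}")
    return result
```

For example, parse(["a1", "a2", "a3", "b1", "b2"]) yields ["A", "B"], while B followed by A is rejected because the order is fixed. Since A sits only at the front of the first alternative, the optional B reduces to a single if.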

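Note that if A and B share a common prefix (i.e. a common left-factor),
you will probably run into trouble with the four steps above, since the
parser then needs more than one token of lookahead to choose between the
two alternatives of an_A_or_B_or_AB.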
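A sketch of the shared-prefix problem, again as hypothetical Python rather than ANTLR: here A and B both start with the tokens x y, so peeking at one token cannot pick an alternative. The workaround shown is to test the whole candidate sequence before committing, which is similar in spirit to a syntactic predicate or backtracking; another option is to left-factor the common prefix into its own rule. The token values and helper names are assumptions for illustration only.

```python
from typing import List, Optional, Sequence

# Hypothetical A and B sharing the two-token prefix "x" "y": the first
# token alone cannot tell the two alternatives apart.
A_TOKENS = ("x", "y", "a")
B_TOKENS = ("x", "y", "b")


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens: Sequence[str]) -> None:
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self, k: int = 0) -> Optional[str]:
        # Token k positions ahead of the cursor, or None past the end.
        i = self.pos + k
        return self.tokens[i] if i < len(self.tokens) else None

    def lookahead_is(self, seq: Sequence[str]) -> bool:
        # True if the next len(seq) tokens are exactly seq; consumes nothing.
        return all(self.peek(k) == tok for k, tok in enumerate(seq))

    def match_seq(self, seq: Sequence[str]) -> None:
        for expected in seq:
            if self.peek() != expected:
                raise ParseError(f"expected {expected!r} at {self.pos}, got {self.peek()!r}")
            self.pos += 1

    def an_A_or_B_or_AB(self) -> List[str]:
        # peek() is "x" for both A and B, so scan the whole candidate
        # sequence before committing to an alternative.
        if self.lookahead_is(A_TOKENS):
            self.match_seq(A_TOKENS)
            parts = ["A"]
            if self.lookahead_is(B_TOKENS):  # optional trailing B
                self.match_seq(B_TOKENS)
                parts.append("B")
            return parts
        if self.lookahead_is(B_TOKENS):
            self.match_seq(B_TOKENS)
            return ["B"]
        raise ParseError(f"expected A or B at {self.pos}")


def parse(tokens: Sequence[str]) -> List[str]:
    p = Parser(tokens)
    result = p.an_A_or_B_or_AB()
    if p.peek() is not None:
        raise ParseError(f"trailing input at {p.pos}")
    return result
```

The cost of this approach is that each decision may scan up to len(A) tokens ahead; factoring the shared x y into its own rule avoids the extra lookahead at the price of a less direct grammar.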

> Think of it this way: you're declaring a variable.  You have a token for the
> variable, then an optional type specification (A -- multiple tokens) and an
> optional initializer (B -- multiple tokens).  Both parts are optional, but
> you *must* have at least one and the declarations *must* be in the order of
> type then initializer if both are present.  The only way I've found to do it
> is (A | B | A B), but this is painful when A and B are more than one token
> in length and I've got about 20 of these things in the grammar.  This is
> just begging for typos.

This example REALLY fails for me. It is hard for me to envision a
language that can initialize a variable (e.g. B) without any declaration
of that variable (e.g. A), so a bare B in the above example makes no
sense to me. Maybe you meant something like (A B? C?), where A is the
var decl, B is its type and C is its initial value...
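To make the (A B? C?) shape concrete, here is a minimal Python sketch for a made-up declaration syntax: var NAME, then an optional : TYPE, then an optional = VALUE. The keywords, the function parse_decl and its return tuple are hypothetical; only the A B? C? structure comes from the message above.

```python
from typing import List, Optional, Sequence, Tuple


class ParseError(Exception):
    pass


def parse_decl(tokens: Sequence[str]) -> Tuple[str, Optional[str], Optional[str]]:
    """Recognize  decl : var_decl type_spec? initializer? ;  using the
    hypothetical syntax  'var' NAME  (':' TYPE)?  ('=' VALUE)?
    Returns (name, type or None, value or None)."""
    toks: List[str] = list(tokens)
    pos = 0

    def peek() -> Optional[str]:
        return toks[pos] if pos < len(toks) else None

    def advance() -> str:
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        pos += 1
        return tok

    def expect(lit: str) -> None:
        if advance() != lit:
            raise ParseError(f"expected {lit!r} at token {pos - 1}")

    # A: the mandatory declaration part, 'var' NAME
    expect("var")
    name = advance()

    # B?: optional type, ':' TYPE
    type_name = None
    if peek() == ":":
        advance()
        type_name = advance()

    # C?: optional initializer, '=' VALUE (only allowed after any type)
    value = None
    if peek() == "=":
        advance()
        value = advance()

    if peek() is not None:
        raise ParseError(f"unexpected {peek()!r} at token {pos}")
    return name, type_name, value
```

As written, A B? C? also accepts a bare declaration with neither type nor initializer; if at least one of them is required, the left-factored rule from step 3 could stand in for B? C?.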

Hope this helps....
   -jbb