[antlr-interest] Question about idiom.

Kay Röpke kroepke at classdump.org
Sat Jan 9 06:41:33 PST 2010


On Jan 9, 2010, at 5:32 AM, Michael Richter wrote:

> I keep coming across a pattern in a grammar I'm working on.  This pattern
> looks something like this:
> 
>   - A production can be *A*.
>   - A production can be *B*.
>   - A production can be *A B.*
> 
> In the grammar I'm transcribing this from, the notation used is *(A & B)*.
> Is there some convenient way to code that in ANTLR's EBNF notation?  I keep
> having to do *(A | B | A B)*.  That isn't all that onerous as-is, I
> admit, but imagine if A is five tokens long and B is also five tokens long
> and then imagine this kind of pattern happening about twenty times in the
> grammar.  Is there a way to concisely do this?

What is the restriction on the parts of the production?
I.e. what differentiates a valid production from an invalid one?

I'll take a wild guess, maybe I'm right ;)
Given the tokens A, B, C, D, I suspect that the allowed combination is any non-empty selection of these tokens in any order, each used at most once,
i.e. A B C D, C B A, D, A, B etc. are all valid inputs?

Then the question is, how do you a) make it easy to write in the grammar and b) still ensure no repeated element in the production.
One way to do it is to use semantic predicates (turning off or validating parts of the grammar depending on semantic information).
If you want the FailedPredicateException on a repeated element, use a regular (non-gated) sempred ( {}? ); if you don't, use a gated one ( {}?=> ).
Gated sempreds "turn off" parts of the grammar, while regular validating predicates do not.

Disclaimer: written in mail, assuming Java target, not enough coffee yadda yadda:

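primaryOne
@init {
// Tokens already matched in this invocation of the rule.
// java.util.Map and java.util.HashMap may need to be imported via @header.
Map seenToken = new HashMap();
}
	:
	(	// validating predicate: fails if the next token's text was already seen
		{! seenToken.containsKey(input.LT(1).getText()) }? prim=primaryOneToken
		// remember the token we just matched
		{ seenToken.put($prim.start.getText(), Boolean.TRUE); }
	)+
	;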

primaryOneToken
	:	'A'
	|	'B'
	|	'C'
	|	'D'
	;

expr	:	primaryOne '&' primaryOne 'A' /*  the 'A' is just to demonstrate that ANTLR will carry on matching input correctly */
	;

That should allow lists of non-repeated A, B, C, D in any order. Maybe there is a more clever way of writing that, but it eludes me right now.

Try it in ANTLRWorks on input like:
A B C & A A
and see what it matches where and what changes if you change the sempred to a gated one.
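
Without ANTLRWorks at hand, the difference described above can be approximated in plain Java. This is only a rough hand-written model, not what ANTLR generates: it assumes a validating predicate behaves as "commit to the alternative, then fail if the check is false" and a gated one as "the alternative is switched off, so the loop just ends". It ignores ANTLR's lookahead analysis entirely, so the real parser may behave differently. The names PrimaryOneModel and FailedPredicate and the gated flag are made up for this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PrimaryOneModel {
    // Hypothetical stand-in for ANTLR's FailedPredicateException.
    static class FailedPredicate extends RuntimeException {
        FailedPredicate(String msg) { super(msg); }
    }

    // The tokens accepted by primaryOneToken.
    static final Set<String> PRIMARY =
            new HashSet<>(Arrays.asList("A", "B", "C", "D"));

    // Models the (...)+ loop of primaryOne, starting at tokens[pos].
    // Returns the index of the first token that was NOT consumed.
    static int primaryOne(List<String> tokens, int pos, boolean gated) {
        Set<String> seen = new HashSet<>();
        int start = pos;
        while (pos < tokens.size() && PRIMARY.contains(tokens.get(pos))) {
            String t = tokens.get(pos);
            if (seen.contains(t)) {
                if (gated) {
                    break; // assumed gated behaviour: leave the token for the caller
                }
                // assumed validating behaviour: report the failed check
                throw new FailedPredicate("repeated " + t + " at index " + pos);
            }
            seen.add(t);
            pos++;
        }
        if (pos == start) {
            throw new IllegalStateException("expected at least one of A, B, C, D");
        }
        return pos;
    }

    // Models: expr : primaryOne '&' primaryOne 'A' ;
    static boolean expr(List<String> tokens, boolean gated) {
        int pos = primaryOne(tokens, 0, gated);
        if (pos >= tokens.size() || !tokens.get(pos).equals("&")) {
            return false;
        }
        pos = primaryOne(tokens, pos + 1, gated);
        // the trailing 'A' must be the last remaining token
        return pos == tokens.size() - 1 && tokens.get(pos).equals("A");
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("A B C & A A".split(" "));
        System.out.println("gated: " + expr(input, true));
        try {
            expr(input, false);
        } catch (FailedPredicate e) {
            System.out.println("validating: " + e.getMessage());
        }
    }
}
```

In this model the gated variant lets the second primaryOne stop after one A, so the trailing 'A' of expr picks up the last token, while the validating variant reports the repeated A. Whether ANTLR itself does the same on that input is exactly what the ANTLRWorks experiment is meant to show.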

cheers,
-k

