[antlr-interest] Can ANTLR parse this language?

Fri Sep 14 07:11:42 PDT 2007

Greetings,

Is the language presented below can be parsed with ANTLR? It is certainly not ambiguous and LL(1) is enough.

I have tokens of four different types:

S
A
B
D 

The informal rules are:

	Program consists of separate sentences.  
	Token D always terminates the current sentence (like a '.') but it is optional. 
	If the sentence already has A or B tokens the another A or B token starts another sentence. 

That's all.

As a result the sentence can have no more than one A or B token, so it may have none. 

The problem is to split token stream into sentences.

Below I illustrate it with some examples. I put a vertical bar where the stream should be split into the sentences.

S S    =>  S S |

S D S D D    =>  S D | S D | D

A S S D S =>  A S S D | S

A S S D S B S => A S S D | S B S

S A S S B S A S => S A S S | B S | A S

For now I solved the problem by adding a token filter between lexer and parser and inserting 
additional tokens (the same as shown with the vertical bar in the example). 
After the additional token V, which stands for vertical bar, the grammar is below. 

program
:
  sentence*
;

sentence
:
   S* ( A S* | B S* |) D? V
;

But I am curious whether the language can be described (see note below) with a context free grammar without 
introducing extra token and token filters.  

It requires some memory: A/B start a new sentence only if there are no A/B before in the 
same sentence.  Whether it is actually a context I do now know.

PS. May be describe is not a proper term here. Any combination of tokens is acceptable 
in the language, so there are no illegal programs. Consequently pumping lemma can not be used to 
prove that it is not context free. It is just the way I want to interpret the language.

Sincerely,
Pavel.

-- 
We've Got Your Name @ www.mail.com!!!
Get a FREE E-mail Account Today - Choose From 100+ Domains