[antlr-interest] Matching braces in grammar

Jim Idle jimi at temporal-wave.com
Mon May 21 08:34:39 PDT 2007


Do you wish to solve this in the lexer and just return a token BEANSHELL? Or are you trying to parse the code in the ${ } too?

If the former, then you would probably have to pick some bounding character sequence that is not ambiguous or context sensitive, or write action code for the BEANSHELL rule that consumes characters until the } that matches the ${ is found, assuming that it is not possible to have a } in the block without a matching { preceding it. As you seem to want to allow statements like print, I think you could end up with print("}}}}}}"); which means you can't rely on a simple trick. You would then get yourself to a point where you are trying to parse your BeanShell code in a lexer rule, and will have all sorts of trouble trying to do it manually.
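
To make the print("}}}}}}"); problem concrete, here is a minimal Java sketch of the "simple trick": counting braces with no knowledge of string literals. The class and method names are hypothetical and are not part of ANTLR; this is roughly what naive action code would do by hand.

```java
// Hypothetical helper for illustration only; not ANTLR API.
// Counts '{' and '}' without knowing anything about string literals.
final class NaiveBraceCounter {

    /**
     * Returns the index of the '}' that brings the nesting depth back to
     * zero, or -1 if there is none. 'start' is the index of the first
     * character after the opening "${".
     */
    static int findClose(String src, int start) {
        int depth = 1; // the "${" has already been consumed
        for (int i = start; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }
}
```

For ${ if (x) { y(); } } this finds the final }, but for ${print("}}}");} it stops at the first } inside the string literal, which is the failure described above.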

So, if you can choose something 'more strange' than ${ } you might be able to do it. It is your language, so this is possible, though if you are allowing literal strings, it will always be possible for a literal string to contain the sequence you define. For instance, suppose you were parsing code that was itself generating code in the same language?

So, I think that your answer is that you really need an island grammar. This island grammar does not necessarily have to be able to parse your BEANSTATEMENT completely, but just be able to scan through it until it finds the correct }. Look at the example of island grammar in the 3.0 examples. You will want something like this:

main.g

...
BEANSTATEMENT : '${'
			{
				// Call the BEANSTATEMENT CONSUMING GRAMMAR HERE
			}
		;

beanstatement.g

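statement
		: beanstring RBRACE;

beanstring	: beanexpr*	// '*' so that an empty block such as {} still matches
		;

beanexpr	: STRING
		| IDPUNCT
		| LBRACE beanstring RBRACE
		;

// Skip over escaped characters so that "\"}" does not end the string early
STRING	: '"' ( '\\' . | ~('\\'|'"') )* '"' ;
LBRACE	: '{' ;
RBRACE	: '}' ;
IDPUNCT	: . ;	// anything else, one character at a time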

This should parse through the beanstatement just looking for the terminating }. Of course, you could hand code that in the lexer rule, but an island grammar lets you actually parse the statement properly, if you want to, with a completely new lexer and parser, and it also caters for nested '{' within the BEANSTATEMENT, such as:

${
	if (expr) { print("dfdfddfd {{{ }}"); }
}
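
For comparison, the hand-coded alternative mentioned above might look roughly like the following Java sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical (not ANTLR API), it treats both "..." and '...' as literals with backslash escapes, and it does not handle comments, which could also contain braces.

```java
// Hypothetical hand-coded scanner for illustration only; not ANTLR API.
// Finds the '}' matching an already-consumed "${", skipping string and
// character literals and tracking nested braces.
final class BeanBlockScanner {

    /**
     * Returns the index of the matching '}', or -1 if the block or a
     * literal inside it is unterminated. 'start' is the index of the
     * first character after the opening "${".
     */
    static int findClose(String src, int start) {
        int depth = 1; // the "${" has already been consumed
        for (int i = start; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                i = skipLiteral(src, i); // jump to the closing quote
                if (i < 0) {
                    return -1; // unterminated literal
                }
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the index of the quote that closes the literal opened at 'open', or -1. */
    private static int skipLiteral(String src, int open) {
        char quote = src.charAt(open);
        for (int i = open + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == quote) {
                return i;
            }
        }
        return -1;
    }
}
```

Called with the index just past the ${ of the example above, findClose returns the position of the final }. Every new case (comments, other literal forms) means more hand-written code, which is the argument for letting an island grammar do the scanning.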

Jim

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jukkis
Sent: Monday, May 21, 2007 4:05 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Matching braces in grammar

Hello all ANTLR fans!

I'm developing a small language with ANTLR. One feature is that my language can have BeanShell code written into the language in special BeanShell blocks.

Currently, I have a special kind of statement which takes the BeanShell code (which is essentially Java):

${"print(\"Hey, I'm BeanShell code\");"}

defined in my grammar as:

beanshell_statement
    : "$" LCURLY! STRING_LITERAL RCURLY!
    ;

The problem is that BeanShell code may contain the symbol '}' which I use to terminate the statement. Currently, I use STRING_LITERAL to work around this fact.

Now, what I would like is to not have to write the BeanShell code inside a string. How can I make ANTLR treat any curly braces found INSIDE the MATCHING '{' ... '}' pair as just ordinary text?

Thank you very much for any advice!

