[antlr-interest] Parser help with grabbing unparsed code blocks

Llew Mason llewmason at yahoo.com
Tue Mar 14 22:03:25 PST 2006


Hi all,

I'm trying to write a parser/lexer for a language that contains
embedded code blocks.  The parser should not interpret these blocks,
but I want it to extract each one as a chunk of text.

For example, here's a dummy piece of code to be parsed:

COMMAND {CAT, DOG}
{
    if (id.call() == true)
    {
        id.otherCall();
    }
}

I want the parser to recognize and parse the tokens
COMMAND { CAT , DOG }, and then expect a code block in curly braces.
However, it shouldn't attempt to parse the contents of the code block.
The action for the command rule needs to pull out the entire contents
of the curly braces as text (because I want to pass them on to
BeanShell as code).
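
To make the goal concrete, here is a rough sketch in plain Java (not
ANTLR) of the extraction I'm after.  The names BlockGrabber and
grabBlock are made up for illustration.  It only counts braces, so it
assumes the code block contains no unbalanced braces inside string
literals or comments.

```java
final class BlockGrabber {
    /**
     * Returns the raw text between the '{' at index 'open' and its
     * matching '}', excluding both braces. Nested braces are counted
     * but the contents are not interpreted in any other way.
     */
    static String grabBlock(String src, int open) {
        if (open < 0 || open >= src.length() || src.charAt(open) != '{') {
            throw new IllegalArgumentException("expected '{' at index " + open);
        }
        int depth = 0;
        for (int i = open; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    // Found the brace that closes the block opened at 'open'.
                    return src.substring(open + 1, i);
                }
            }
        }
        throw new IllegalArgumentException("unbalanced braces");
    }
}
```

The caller would pass the index of the brace that opens the code
block, i.e. the one following the {CAT, DOG} list, and hand the
returned string on unchanged.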

I've tried a number of different ways to get this to work.  What I'd
like is something along the lines of the grammar below, but I can't
work out what to put for 'ANYTHING' to get the behavior I want.
Defining ANYTHING in the lexer (with the curly braces in the lexer
rule instead of the parser rule) makes it gobble up things like
{CAT, DOG} too.  I get the feeling that predicates could be used in
the lexer to solve my problem, but updating a state variable shared
between the parser and lexer didn't seem to work right.  I also
briefly looked at using the multiplexing support, but I don't want to
_parse_ the code block, just grab it.

command :
(
     "COMMAND" id "{" ANYTHING "}"
     {
         // ... do something with the contents of the code block in the
         // curly braces ...
     }
);

id :
(
     ("{" WORD ("," WORD)? "}")
     {
     }
);

Does this make sense?  If so, can anyone point me in the right
direction?  I suspect I'm missing something and that there is an easy
way to accomplish what I want.

Thanks,

Llew



