[antlr-interest] Parser help with grabbing unparsed code blocks

shmuel siegel antlr at shmuelhome.mine.nu
Wed Mar 15 13:10:08 PST 2006


I would say that your language is ambiguous unless you can tell me 
something about the rules for right braces in the inner block, for 
example that they must be balanced and that they are meaningless inside 
strings. Otherwise it is impossible to tell whether a right brace is 
part of the text or the terminator of the text. Assuming reasonable 
rules for your sub-language, I would create a token rule for "ANYTHING" 
that reads something like this:

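// Sketch only. In ANTLR 2 the three helper rules would probably be declared "protected".
ANYTHING: ( NormalText | String | Block )*;
// + rather than *: an alternative that can match nothing makes the enclosing ( ... )* ambiguous.
NormalText: (~('{' | '}' | '"'))+;
String: '"' (~'"')* '"';
// Block refers to itself so that balanced braces may nest to any depth.
Block: '{' ( NormalText | String | Block )* '}';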
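
To make the assumed rules concrete, here is a minimal hand-written sketch in Java of the same idea outside ANTLR: braces must balance, and a brace inside a double-quoted string does not count. The class name BlockGrabber and the method grabBlock are hypothetical names chosen for illustration. Like the String rule above, it assumes a string simply ends at the next double quote (no escape sequences).

```java
final class BlockGrabber {
    private BlockGrabber() {}

    /**
     * Returns the raw text between the '{' at openIndex and its matching '}'.
     * Assumptions (not confirmed for the real sub-language): braces are
     * balanced, braces inside "..." are ignored, and strings have no escapes.
     */
    static String grabBlock(String src, int openIndex) {
        if (openIndex < 0 || openIndex >= src.length() || src.charAt(openIndex) != '{') {
            throw new IllegalArgumentException("expected '{' at index " + openIndex);
        }
        int depth = 0;            // current brace nesting level
        boolean inString = false; // true while between two double quotes
        for (int i = openIndex; i < src.length(); i++) {
            char c = src.charAt(i);
            if (inString) {
                if (c == '"') {
                    inString = false; // closing quote
                }
            } else if (c == '"') {
                inString = true;      // opening quote
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    // Matching brace found: return the contents, braces excluded.
                    return src.substring(openIndex + 1, i);
                }
            }
        }
        throw new IllegalArgumentException("unbalanced braces");
    }
}
```

Calling grabBlock with the index of the brace that opens the code block returns the text you could hand on to beanshell. If the sub-language allows escaped quotes or comments containing braces, the scanner (and the grammar) would need extra cases for them.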

Llew Mason wrote:
> Hi all,
> 
> I'm trying to write a parser/lexer to deal with a language that contains 
> code blocks that will not be interpreted by the parser, but I want the 
> parser to extract them as chunks of text.
> 
> For example, here's a dummy piece of code to be parsed:
> 
> COMMAND {CAT, DOG}
> {
>    if (id.call() == true)
>    {
>     id.otherCall();
>    }
> }
> 
> I want the parser to understand the tokens COMMAND { CAT , DOG } and 
> parse those, and then expect a code block in curly braces.  However, it 
> shouldn't attempt to parse the contents of the code block.  The action 
> for the command rule needs to pull the entire contents of the curly 
> braces (because I want to pass them onto beanshell as code).
> 
> I've tried a bunch of different ways to get this to work, and seem to 
> want something like the code below to work, but I can't work out what to 
> put for 'ANYTHING' in the block below that gives me what I want.  Having 
> the lexer define ANYTHING appropriately (with the curly braces in the 
> lexer rule instead of the parse rule) makes it gobble up things like 
> {CAT, DOG} too.  I get the feeling that maybe predicates could be used 
> in the lexer to solve my problem, but updating a state variable 
> communicating between the parser and lexer didn't seem to work right.  I 
> also briefly looked at using the multiplexing support, but I don't want 
> to _parse_ the code block, just grab it.
> 
> command :
> (
>     "COMMAND" id "{" ANYTHING "}"
>     {
>         ... do something with the contents of the code block in the curly braces ...
>     }
> );
> 
> id :
> (
>     ("{" WORD ("," WORD)? "}")
>     {
>     }
> );
> 
> Did this make any sense?  If so, can anyone point me in the right 
> direction?  It seems like I'm missing something and there is an easy way 
> to accomplish what I want.
> 
> Thanks,
> 
> Llew
> 
> 



-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.375 / Virus Database: 268.2.3/281 - Release Date: 3/14/2006



