[antlr-interest] A look-ahead/rewind problem

Mon Mar 23 05:20:15 PDT 2009

Lukasz Guminski schrieb:
> Hi all,
>
> I'm writing a translator which aims to convert input delimited by tags
>
> BLOCK <name>
>
> in the following way:
>
> BLOCK A      =>      start of block A
> BLOCK B      =>       start of block B
> BLOCK B      =>       end of block B
> BLOCK A      =>      end of block A
>
> The difficulty is caused by the fact that the opening tag (e.g. 'BLOCK 
> A') is exactly the same as the closing one ('BLOCK A'), so I need to 
> track on my own if it is an opening or a closing.
>
> The algorithm is simple: if I encounter a block with the name that is 
> already known, then I have encountered a closing tag. If the name is 
> unknown, than I open a new block. 
>
> BLOCK A      =>      start of block A
> BLOCK A      =>      end of block A
> BLOCK A      =>      start of block A
> BLOCK A      =>      end of block A
>
> The problem is that I can make the decision only when I have collected 
> the entire name of a block. If I can see, that the name is already 
> known (so it is not an opening of a new block), then I need to unput 
> the chars and go to closing rule. Unfortunately there is nothing like 
> the /unput()/ function in antlr, so I need to use syntactic predicate 
> to make an unlimited look ahead for the name.
>
> I wrote grammar, but I get an error. Please help me understand why.
>
> [12:48:09] Checking Grammar...
> [12:48:09] warning(200): block.g:7:8: Decision can match input such as 
> "BLOCK_BOUNDARY LETTER ALPHANUM BLOCK_BOUNDARY" using multiple 
> alternatives: 1, 2
> As a result, alternative(s) 2 were disabled for that input
> [12:48:09] warning(200): block.g:8:19: Decision can match input such 
> as "BLOCK_BOUNDARY LETTER ALPHANUM BLOCK_BOUNDARY" using multiple 
> alternatives: 1, 2
> As a result, alternative(s) 2 were disabled for that input
> [12:48:09] error(201): block.g:8:19: The following alternatives can 
> never be matched: 2
>   
> This is the grammar that I use:
>
> grammar block;
> @members{
>   Stack<String> stack = new Stack<String>();
> }
>
> data: blocks EOF;
> blocks:block+;
> block: block_open blocks? block_close;
> block_open
>     :  ((BLOCK_BOUNDARY 
> block_name)=>{stack.size()==0||!stack.peek().equals($block_name.text)}?) 
> =>
>     BLOCK_BOUNDARY block_name BLOCK_BOUNDARY 
> {stack.push($block_name.text);System.out.println("start of block " + 
> $block_name.text);};
> block_close:BLOCK_BOUNDARY block_name BLOCK_BOUNDARY 
> {System.out.println("end of block " + $block_name.text);stack.pop();};
> block_name:LETTER+ ALPHANUM*;
>
> LETTER    :     ('a'..'z'|'A'..'Z');
> ALPHANUM    :    (LETTER|'0'..'9');
> BLOCK_BOUNDARY     :     'block';
> NEWLINE :     ( CR )? LF | CR;
> fragment CR :'\r';
> fragment LF : '\n';
> INSIGNIFICANT_CHAR:.;
>
> It looks like antlr does not see that the block_open is guarder by 
> syntactic predicate, which is not always true (because 
> "stack.size()==0||!stack.peek().equals($block_name.text)" condition 
> can be false).
>
> Thanks for any advice,
> Lukasz

The problem is that semantic predicates like {...}? are only evaluated 
when there is an ambiguity ... and maybe ANTLR thinks the ambiguity is 
already resolved by your syntactic predicate. Actually, I am not sure 
about the structure you gave, namely, a syntactic predicate inside a 
syntactic predicate, maybe it also means "match the empty string after 
seeing BLOCK_BOUNDARY, I dont know.

I would suggest you try without the syntactic predicate, turn your 
semantic predicate into a gated semantic predicate {...}? => such that 
it is forced to be evaluated:

block_open: {is_block_open()}?=> BLOCK_BOUNDARY

Btw, why do you have BLOCK_BOUNDARY twice in each rule? ACcording to 
your example, you would only need, one, right?

Best,
Andreas Meyer