[antlr-interest] Error in predicate logic

Gavin Lambert antlr at mirality.co.nz
Fri Feb 16 12:30:32 PST 2007


At 04:17 17/02/2007, Gerald B. Rosenberg wrote:
>SPCHAR :
>     ( AMP GRIDLET INT SEMI    { printState("SpDigit "); }
>     | AMP LETTERS SEMI        { printState("SpLettr "); }
>     | AMP LETTERS ~';'        { $type=PCDATA; printState("P1Data  "); }
>     | AMP GRIDLET INT ~';'    { $type=PCDATA; printState("P2Data  "); }
>     | AMP GRIDLET ~INT        { $type=PCDATA; printState("P3Data  "); }
>     | AMP                     { $type=PCDATA; printState("PcData  "); }
>     ) ;

An alternative like AMP LETTERS ~';' is going to consume the 
non-semicolon character following the letters as part of the 
PCDATA.  That could be a problem (e.g. if it's an angle bracket or 
quote mark).  Leaving those "not this" alternatives out should 
still work; it just may produce a nondeterminism warning.  If you 
want to get rid of the warning, you'll probably need to put the 
syntactic predicates back in, but what the predicate tests should 
differ from what the alternative matches: the predicate looks 
ahead at the extra character, while the alternative itself stops 
before it.  Like so:

SPCHAR
   : AMP GRIDLET INT SEMI
        { printState("SpDigit "); }
   | AMP LETTERS SEMI
        { printState("SpLettr "); }
   | (AMP LETTERS ~';') => AMP LETTERS
        { $type = PCDATA; printState("P1Data "); }
   | (AMP GRIDLET INT ~';') => AMP GRIDLET INT
        { $type = PCDATA; printState("P2Data "); }
   | (AMP GRIDLET ~INT) => AMP GRIDLET
        { $type = PCDATA; printState("P3Data "); }
   | (AMP ~(GRIDLET | LETTERS)) => AMP
        { $type = PCDATA; printState("P4Data "); }
   ;

Though I'm not entirely sure about that last alt.  It might run 
afoul of the annoying inverse set thing mentioned below.
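
To make the consume-versus-peek difference concrete, here's a rough 
hand-written sketch in plain Java.  It is not what ANTLR generates; 
the class and method names are made up for illustration, and it 
assumes LETTERS means one or more ASCII letters (your actual 
definition may differ).  It only models the AMP LETTERS ~';' case.

```java
// Illustration only: hypothetical names, NOT ANTLR-generated code.
// Assumes LETTERS is one or more ASCII letters.
class SpcharLookaheadSketch {

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Index just past the run of ASCII letters starting at 'from'.
    static int skipLetters(String s, int from) {
        int i = from;
        while (i < s.length() && isAsciiLetter(s.charAt(i))) {
            i++;
        }
        return i;
    }

    // Mimics   AMP LETTERS ~';'
    // The non-semicolon character is matched, so it ends up inside
    // the token.  Returns the token length, or -1 if the alternative
    // does not apply.
    static int consumingAlt(String s) {
        if (s.isEmpty() || s.charAt(0) != '&') return -1;
        int end = skipLetters(s, 1);
        if (end == 1) return -1;             // no letters after '&'
        if (end >= s.length()) return -1;    // nothing left for ~';' to match
        if (s.charAt(end) == ';') return -1; // that's the real SPCHAR case
        return end + 1;                      // following char is swallowed
    }

    // Mimics   (AMP LETTERS ~';') => AMP LETTERS
    // The predicate runs the same test but only peeks; the token
    // itself stops right after the letters.
    static int predicatedAlt(String s) {
        int withExtraChar = consumingAlt(s);
        return withExtraChar < 0 ? -1 : withExtraChar - 1;
    }

    public static void main(String[] args) {
        String input = "&amp<b>";
        System.out.println(input.substring(0, consumingAlt(input)));
        System.out.println(input.substring(0, predicatedAlt(input)));
    }
}
```

For the input &amp<b>, consumingAlt returns 5 (the token text would 
be &amp<, swallowing the angle bracket), while predicatedAlt returns 
4 (just &amp), leaving the < for whichever rule comes next.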

Also, I always thought that part of the point of ANTLR3 (and LL(*) 
parsing) was that it could handle grammars like the above as two 
separate rules with no need to do type-tweaking, since it could 
use as much lookahead as it needed to in order to disambiguate 
them.  Was I wrong about that?  Or haven't you tried it?

>Also, using ~SEMI  produced a lexer with an undefined set 
>variable.  Using an explicit ~';' works, but seems 
>counter-intuitive given that both the plain SEMI and  ~INT work.

Yeah, I know.  That one has annoyed me for ages, both in ANTLR2 
and 3.  I posted about it just a few days ago, in fact.


