[antlr-interest] Re: Problem with existence of the same literal in two rules

Fri May 2 08:17:35 PDT 2003

Your STRING rule is indeed being called.

What amazes me the most is that ANTLR actually lets you use this
grammar. Didn't it report a conflict between STRING and WFCS?

There's no easy way to change this behaviour. The first option is to
avoid the letters L, H, Z and X in STRING:

STRING : ( 'a'..'g' | 'i'..'k' | 'm'..'w' | 'y' |
           'A'..'G' | 'I'..'K' | 'M'..'W' | 'Y' )+
         INT
       ;

If you need those letters in your STRING, there's no easy solution:
your grammar is non-LL(k) for any k, so you'll have to code the
recognition rules by hand. What I advise is to start with a very
generic STRING, and then use an action to decide whether it's a STRING,
a WFCS or erroneous. You'll have to use an imaginary token, $setType()
and some native (Java/C++/C#) code. I'll assume that you're using Java.

The complete grammar would look like this:

class myScanner extends Lexer;
options {
   // don't know, probably k=2;
}

tokens {
   WFCS; // Imaginary token
}

{ // native code

   public boolean isWFCS(String text)
   {
      // Returns true if text is made of L,H,Z and/or X,
      // false otherwise - I'll let you implement this one :)
   }

   public boolean isInvalidString(String text)
   {
      // Returns true if text is NOT a run of alphabetic chars
      // followed by digits (an INT) - false otherwise
      // You'll also have to implement this one
   }
}

// Generic STRING implementation - a letter followed by zero or more
// letters or digits
STRING : ('a'..'z' | 'A'..'Z')  ('a'..'z' | 'A'..'Z' | INT)*
         {
           // If it's a WFCS, change the type
           if ( isWFCS($getText) ) {
               $setType(WFCS);
           }
           // Else check correctness
           else if ( isInvalidString($getText) ) {
               // error here (throw new MismatchedCharException(...
           }
         }
       ;

INT :('0'..'9')+ ;
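
In case it helps, here is a rough sketch of how the two helper methods
could be written in plain Java. This is only one possible version, not
something ANTLR provides: the class name TokenChecks is made up so the
code can be compiled and tried on its own, and I am assuming that a
valid STRING means "letters followed by one or more digits", as in the
original rule. It uses java.util.regex, so it needs Java 1.4 or later.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone holder; in the grammar these two methods
// would sit in the lexer's native-code block instead.
final class TokenChecks {

    // One or more of the upper-case letters L, H, Z, X, as in the WFCS rule.
    private static final Pattern WFCS = Pattern.compile("[LHZX]+");

    // Assumption: a valid STRING is letters followed by one or more digits,
    // mirroring the original rule ('a'..'z' | 'A'..'Z')+ INT.
    private static final Pattern VALID_STRING =
        Pattern.compile("[a-zA-Z]+[0-9]+");

    static boolean isWFCS(String text) {
        return WFCS.matcher(text).matches();
    }

    static boolean isInvalidString(String text) {
        return !VALID_STRING.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Same decision order as the action in the STRING rule:
        // WFCS first, then the validity check.
        String[] samples = { "LHZX", "abc12", "HX7", "abc", "a1b" };
        for (String s : samples) {
            String kind = isWFCS(s) ? "WFCS"
                        : isInvalidString(s) ? "invalid" : "STRING";
            System.out.println(s + " -> " + kind);
        }
    }
}
```

Note that the order of the two checks matters: "LHZX" has no trailing
digits, so it would be rejected as an invalid STRING if the WFCS test
did not run first.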

I hope this helps.

--- In antlr-interest at yahoogroups.com, "ramyasivadas"
<ramyasivadas at y...> wrote:
> Hi,
> 
> Let me quote an example to help me explain the problem.
> 
> I have a rule as follows
> STRING
> :('a'..'z' | 'A'..'Z')+ (INT)
> ;
> 
> I also have a rule
> WFCS
> :'L'|'H'|'Z'|'X'
> ;
> 
> INT
> :('0'..'9')+
> ;
> 
> The issue is, if I have a set of literals constituting a rule and if 
> one or more of the same literals form a part of another independent 
> rule, the parser generates an exception. How can I avoid this?
> 
> For example, the rule STRING is defined to be made up of a 
> combination of any alphabets followed by a numeral. Another rule WFCS 
> which has nothing to do with STRING can be made up of a combination 
> of L,H,Z and/or X.
> 
> When the parsing for WFCS block is done, the parser expects an INT 
> after the alphabet. I assume it is applying the STRING rule on the 
> WFCS block since it has encountered an alphabet. Can we override this 
> behaviour?
> 
> 
> Thanks in advance.
> 
> Regards,
> Ramya
