[antlr-interest] Re: Problem with existence of the same literal in two rules
antlrlist
antlrlist at yahoo.com
Fri May 2 08:17:35 PDT 2003
Your STRING rule is indeed being called.
What amazes me the most is that ANTLR actually lets you use this
grammar. Didn't it report a conflict between STRING and WFCS?
There's no easy way you can change the behaviour. The first solution
you could use would be avoiding the use of L,H,Z and/or X in STRING:
STRING : ( 'a'..'g' | 'i'..'k' | 'm'..'w' | 'y' |
'A'..'G' | 'I'..'K' | 'M'..'W' | 'Y' )+
INT
;
If you need those letters in your STRING, there's no easy solution:
your grammar is non-LL(k) for any k, so you'll have to code the
recognition rules by hand. What I advise is to start with a very generic
STRING, and then use an action to decide whether it's a STRING, a WFCS or
erroneous. You'll have to use an imaginary token, $setType() and
some native (Java/C++/C#) code. I'll assume that you're using Java.
The complete grammar would look like this:
class myScanner extends Lexer;
options {
// don't know, probably k=2;
}
tokens {
WFCS; // Imaginary token
}
{ // native code
public boolean isWFCS(String text)
{
// Returns true if text is made of L,H,Z and/or X,
// false otherwise - I'll let you implement this one :)
}
public boolean isInvalidString(String text)
{
// Returns true if text is NOT a combination of alphabetic chars
// followed by a single digit - false otherwise
// You'll also have to implement this one
}
}
// Usual STRING implementation- a letter followed by zero or more
// letters or digits
STRING : ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z'|INT)*
{
// If it's a WFCS, change the type
if( isWFCS( $getText ) ) $setType(WFCS);
// Else check correctness
else if ( isInvalidString( $getText ) ) {
// error here, e.g. throw new MismatchedCharException(...);
}
}
;
INT :('0'..'9')+ ;
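
For the two helper methods left open in the grammar above, here is one possible sketch in plain Java. It is not the only way to write them and it rests on two assumptions of mine: that a WFCS is one or more of the uppercase letters L, H, Z, X, and that a valid STRING is one or more letters followed by one or more digits (as in the original STRING rule; tighten the pattern if only a single digit is allowed). The class name TokenChecks and the main method are only there so the snippet can be run on its own. In the lexer you would paste the two methods into the native code section.

```java
import java.util.regex.Pattern;

public class TokenChecks {
    // Assumption: a WFCS consists only of the uppercase letters L, H, Z, X.
    private static final Pattern WFCS_PATTERN = Pattern.compile("[LHZX]+");

    // Assumption: a valid STRING is letters followed by one or more digits.
    private static final Pattern STRING_PATTERN = Pattern.compile("[a-zA-Z]+[0-9]+");

    /** Returns true if text is made only of L, H, Z and/or X. */
    public static boolean isWFCS(String text) {
        return WFCS_PATTERN.matcher(text).matches();
    }

    /** Returns true if text is NOT letters followed by digits. */
    public static boolean isInvalidString(String text) {
        return !STRING_PATTERN.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "LHZX", "abc12", "L7", "abc", "a1b" };
        for (String s : samples) {
            // Same decision order as the action in the STRING rule:
            // WFCS first, then the validity check.
            String kind = isWFCS(s) ? "WFCS"
                        : isInvalidString(s) ? "error"
                        : "STRING";
            System.out.println(s + " -> " + kind);
        }
    }
}
```

Note that the WFCS check has to come first, as it does in the lexer action: a text such as "LHZX" has no trailing digits, so it would be reported as an invalid STRING if the order were reversed.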
I hope this helps.
--- In antlr-interest at yahoogroups.com, "ramyasivadas"
<ramyasivadas at y...> wrote:
> Hi,
>
> Let me quote an example to help me explain the problem.
>
> I have a rule as follows
> STRING
> :('a'..'z' | 'A'..'Z')+ (INT)
> ;
>
> I also have a rule
> WFCS
> :'L'|'H'|'Z'|'X'
> ;
>
> INT
> :('0'..'9')+
> ;
>
> The issue is, if I have a set of literals constituting a rule and if
> one or more of the same literals form a part of another independent
> rule, the parser generates an exception. How can I avoid this.
>
> For example, the rule STRING is defined to be made up of a
> combination of any alphabets followed by a numeral. Another rule WFCS
> which has nothing to do with STRING can be made up of a combination
> of L,H,Z and/or X.
>
> When the parsing for WFCS block is done, the parser expects an INT
> after the alphabet. I assume it is applying the STRING rule on the
> WFCS block since it has encountered an alphabet. Can we override this
> behaviour.
>
>
> Thanks in advance.
>
> Regards,
> Ramya