[antlr-interest] Problem with String parsing

Wed Apr 20 05:28:25 PDT 2011

On 04/20/2011 01:45 AM, preitz sharma wrote:
> Hi,
> I am having a problem parsing a string value.
> Here is my grammar:
> 
> grammar stringProblem;
> 
> 
> expr           :  SET attribute EOF;
> 
> attribute      :  ARRAY (SIZE)? Int
>                     |  OUT(PUTSTR)? str
>                     ;
> 
> str               :  (CHAR | DOT  | Int)+ ;

CHAR is a fragment.  You can't use a fragment as a token in a parser rule
(unless you explicitly set it as a token type in some lexer action).  Notice
that neither DOT nor Int is a fragment below.
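
A minimal sketch of the simplest change, assuming nothing else in the
grammar is touched: drop the fragment keyword so the lexer emits CHAR as
a real token that the str rule can reference.  Since CHAR is defined
before DOT, I would expect single lowercase letters to come out as CHAR
rather than DOT, but I have not run this against your input.

```
// Sketch only: CHAR as an ordinary lexer rule instead of a fragment.
// A fragment can only be called from other lexer rules; it is never
// emitted to the parser on its own.
CHAR    :  ('a'..'z');
```

This only covers the fragment point.  It does not by itself address the
"mismatched character" error you describe for input such as "arr5".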

> 
> Int                :  '0'..'9'+;
> 
> SET            :  'set';
> 
> ARRAY       :  'array';
> 
> SIZE            :  's'('i'('z'('e')?)?)?;
> 
> OUT             :  'out';
> 
> PUTSTR      :  'p'('u'('t'('s'('t'('r')?)?)?)?)?;
> 
> fragment CHAR    :  ('a'..'z');
> 
> Space          :  (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
> 
> DOT           :  ('\U0000' .. '\UFFFF');
> 
> 
> 
> And the class to test it is:
> 
> 
> import org.antlr.runtime.ANTLRStringStream;
> import org.antlr.runtime.CommonTokenStream;
> import org.antlr.runtime.RecognitionException;
> 
> public class Demo {
> 
>     public static void main(String[] args) throws RecognitionException {
>         try {
>             ANTLRStringStream in = new ANTLRStringStream("set outp 100z");
>             stringProblemLexer lexer = new stringProblemLexer(in);
>             CommonTokenStream tokens = new CommonTokenStream(lexer);
>             stringProblemParser parser = new stringProblemParser(tokens);
>             parser.expr();
>         } catch (Exception e) {
>             System.out.println(e.getMessage());
>         }
>     }
> }
> 
> When I give the input string "set outputstr 123zx3%", it works fine.
> But when I give input that contains the beginning of one of the tokens,
> like "set output 123arr5", I get an error like: "line 1:17 mismatched
> character '5' expecting 'a'"
> 
> I think this is happening because other lexer rules like SET, ARRAY etc.
> are specified before CHAR, so they are given higher priority than CHAR.
> Whenever a character comes in, the lexer first tries to match it against
> the tokens with higher priority. But this is not the behavior I expected.
> 
> Please help me out. What should I do to make it work?
> 

-- 
Kevin J. Cummings
kjchome at verizon.net
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)