[antlr-interest] Antlr Token Issue

Wed Apr 4 00:26:17 PDT 2007

This was a non-determinism issue, then: 'hello' could match both the 'hello' literal and CHARS, and the lexer always chose the literal. It requires me to change my grammar
so that 'hello' can only match one rule.  One way to do this would be:

=====================================================
grammar expr;
options {
    k=2;
    backtrack=true;
    memoize=true;
}

@header {
    package tests;
}

@lexer::header {
    package tests;
}

aprog    :    (WS | anitem)+
         ;
// Every word is lexed as CHARS, so 'hello' can only ever be one token type;
// the keyword is then checked against the token's text in the action.
anitem   :    key=CHARS EQUALS QUOTE val=CHARS QUOTE
              {
                if ($key.text.equals("hello")) {
                    System.out.println("Have quoted text :  " + $val.text);
                }
                else {
                    emitErrorMessage("Invalid keyword \"" + $key.text + "\" expecting \"hello\"");
                }
              }
         ;
CHARS    :    ('a'..'z'|'A'..'Z')+
         ;
QUOTE    :    '"'
         ;
EQUALS   :    '='
         ;
WS       :    (' ' | '\t' | '\n') +
         ;
============================================================
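For anyone puzzling over why the second 'hello' failed, here is a rough Java model (plain JDK, no ANTLR runtime) of how the lexer settles the tie. It is a simplification and assumes the generated lexer takes the longest match and, on equal length, prefers the rule listed first, with the parser literal 'hello' listed ahead of CHARS. The class name LexerTieDemo and the token type name HELLO are made up for illustration; ANTLR generates its own name for the literal token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerTieDemo {

    // A lexer rule: token type name plus the pattern it matches.
    record Rule(String type, Pattern pattern) {}

    // A token produced by the model lexer.
    record Token(String type, String text) {
        @Override
        public String toString() {
            return type + "('" + text + "')";
        }
    }

    // Rules in priority order. With withHelloLiteral, the parser literal
    // 'hello' is modelled as a rule listed ahead of CHARS (hypothetical name HELLO).
    static List<Rule> rules(boolean withHelloLiteral) {
        List<Rule> rules = new ArrayList<>();
        if (withHelloLiteral) {
            rules.add(new Rule("HELLO", Pattern.compile("hello")));
        }
        rules.add(new Rule("CHARS", Pattern.compile("[a-zA-Z]+")));
        rules.add(new Rule("QUOTE", Pattern.compile("\"")));
        rules.add(new Rule("EQUALS", Pattern.compile("=")));
        rules.add(new Rule("WS", Pattern.compile("[ \t\n]+")));
        return rules;
    }

    // Simplified model, not ANTLR's generated code: at each offset try every
    // rule, keep the longest match, and on a tie keep the rule listed first.
    static List<Token> tokenize(String input, List<Rule> rules) {
        List<Token> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule r : rules) {
                Matcher m = r.pattern().matcher(input);
                m.region(pos, input.length());
                // Strict '>' means a later rule never displaces an earlier one on a tie.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = r;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            out.add(new Token(best.type(), input.substring(pos, bestEnd)));
            pos = bestEnd;
        }
        return out;
    }

    // Mirrors the anitem action: the keyword is checked on the token text
    // instead of being given its own token type.
    static String checkItem(List<Token> tokens) {
        List<String> types = new ArrayList<>();
        for (Token t : tokens) {
            types.add(t.type());
        }
        if (!types.equals(List.of("CHARS", "EQUALS", "QUOTE", "CHARS", "QUOTE"))) {
            return "syntax error: " + types;
        }
        String key = tokens.get(0).text();
        if (key.equals("hello")) {
            return "Have quoted text :  " + tokens.get(3).text();
        }
        return "Invalid keyword \"" + key + "\" expecting \"hello\"";
    }

    public static void main(String[] args) {
        String input = "hello=\"hello\"";
        // [HELLO('hello'), EQUALS('='), QUOTE('"'), HELLO('hello'), QUOTE('"')]
        System.out.println(tokenize(input, rules(true)));
        // [CHARS('hello'), EQUALS('='), QUOTE('"'), CHARS('hello'), QUOTE('"')]
        System.out.println(tokenize(input, rules(false)));
        // Have quoted text :  hello
        System.out.println(checkItem(tokenize(input, rules(false))));
    }
}
```

With the literal in play, the quoted value at offset 7 comes back as HELLO instead of CHARS, which lines up with the "line 1:7 mismatched input 'hello' expecting CHARS" error in the run quoted below. With a single CHARS rule the tie disappears and the keyword is checked on the text. Note that "helloX" still lexes as CHARS even with the literal present, because the longer match wins.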

On 4/3/07, James <jameselliot at gmail.com> wrote:
>
> Hi,
>
> I am having a problem with keywords being extracted to tokens and then
> matching against more general requirements.
>
> Is there a simple way to stop this in my grammar or do I need to
> reconsider my rules?
>
>
> An example grammar is:
>
> =====================================================
> grammar expr;
> options {
>     k=2;
>     backtrack=true;
>     memoize=true;
> }
>
> @header {
>     package tests;
> }
>
> @lexer::header {
>     package tests;
> }
>
> aprog    :    (WS | anitem)+
>     ;
> anitem    :     'hello' EQUALS QUOTE CHARS QUOTE
>         {
>             System.out.println("Have quoted text :  " + $CHARS.text);
>         }
>     ;
> CHARS     :     ('a'..'z'|'A'..'Z')+
>     ;
> QUOTE    :    '"'
>     ;
> EQUALS    :    '='
>     ;
> WS    :    (' ' | '\t' | '\n') +
>     ;
> =========================================================================
>
> A test class is:
> ========================================================================
> package tests;
>
> import org.antlr.runtime.ANTLRStringStream;
> import org.antlr.runtime.CommonTokenStream;
>
> public class DoTest {
>
>     public static void main(String[] args) throws Throwable {
>         if (args.length == 0) {
>             System.out.println("Please provide input on command line");
>         }
>         else {
>             exprLexer l = new exprLexer(new ANTLRStringStream(args[0]));
>             CommonTokenStream tokens = new CommonTokenStream();
>             tokens.setTokenSource(l);
>             exprParser p = new exprParser(tokens);
>
>
>             p.aprog();
>         }
>     }
> }
>
> ========================================================================
> Sample usage is:
> ========================================================================
>
> $ java tests.DoTest "hello=\"there\""
>
> Have quoted text :  there
>
> $ java tests.DoTest "hello=\"hello\""
>
> line 1:7 mismatched input 'hello' expecting CHARS
> line 1:12 mismatched input '"' expecting EQUALS
> line 0:-1 mismatched input '<EOF>' expecting CHARS
>
> ========================================================================
>
> I am guessing that the second "hello" is matched by the tokenizer as type
> HELLO.  Can I tell the tokenizer not to do this?
> Or is there a simple way to refactor this?
>
> Thank you,
>
> James.
>
> (All files attached).