[antlr-interest] Antlr Token Issue
James
jameselliot at gmail.com
Wed Apr 4 00:26:17 PDT 2007
This was a non-determinism issue, then. It requires me to change my grammar
so that 'hello' can match only one rule. One way to do this would be:
=====================================================
grammar expr;
options {
k=2;
backtrack=true;
memoize=true;
}
@header {
package tests;
}
@lexer::header {
package tests;
}
aprog : (WS | anitem)+
;
anitem : SM_CHARS EQUALS QUOTE CHARS QUOTE
    {
        if ($SM_CHARS.text.equals("hello")) {
            System.out.println("Have quoted text : " + $CHARS.text);
        }
        else {
            reportError("Invalid keyword \"" + $SM_CHARS.text + "\" expecting \"hello\"");
        }
    }
    ;
CHARS : (SM_CHARS|CP_CHARS)+
;
SM_CHARS : ('a'..'z')+
;
CP_CHARS : ('A'..'Z')+
;
QUOTE : '"'
;
EQUALS : '='
;
WS : (' ' | '\t' | '\n') +
;
============================================================
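
The idea behind the change (lex the keyword as ordinary text, then check its
spelling in the parser action) can be sketched without ANTLR at all. The class
below is a hypothetical stand-alone illustration, not generated ANTLR code:
the names KeywordAsText, tokenize and parseItem are made up for this example,
and it simplifies the grammar above by using a single CHARS token type for
both the keyword position and the quoted text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written sketch; it does not use the ANTLR runtime.
final class KeywordAsText {

    enum Type { CHARS, EQUALS, QUOTE }

    static final class Token {
        final Type type;
        final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Lexer: every run of letters becomes CHARS. "hello" has no token
    // type of its own, so it cannot be stolen from the quoted text.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n') {
                i++; // skip whitespace
            } else if (c == '=') {
                tokens.add(new Token(Type.EQUALS, "="));
                i++;
            } else if (c == '"') {
                tokens.add(new Token(Type.QUOTE, "\""));
                i++;
            } else if (isLetter(c)) {
                int start = i;
                while (i < input.length() && isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Type.CHARS, input.substring(start, i)));
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    // Parser: CHARS EQUALS QUOTE CHARS QUOTE, with the keyword checked
    // by its text rather than by its token type.
    static String parseItem(String input) {
        List<Token> tokens = tokenize(input);
        Type[] expected = {
            Type.CHARS, Type.EQUALS, Type.QUOTE, Type.CHARS, Type.QUOTE
        };
        if (tokens.size() != expected.length) {
            return "Syntax error";
        }
        for (int i = 0; i < expected.length; i++) {
            if (tokens.get(i).type != expected[i]) {
                return "Syntax error";
            }
        }
        String keyword = tokens.get(0).text;
        if (!keyword.equals("hello")) {
            return "Invalid keyword \"" + keyword + "\" expecting \"hello\"";
        }
        return "Have quoted text : " + tokens.get(3).text;
    }

    public static void main(String[] args) {
        System.out.println(parseItem("hello=\"there\""));
        System.out.println(parseItem("hello=\"hello\""));
        System.out.println(parseItem("howdy=\"hello\""));
    }
}
```

With this arrangement the input hello="hello" is accepted, because the second
"hello" is just CHARS like any other word. The cost is that a misspelled
keyword is reported by the action code rather than by the parser's own
mismatched-token error.
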
On 4/3/07, James <jameselliot at gmail.com> wrote:
>
> Hi,
>
> I am having a problem with keywords being extracted to tokens and then
> matching against more general requirements.
>
> Is there a simple way to stop this in my grammar or do I need to
> reconsider my rules?
>
>
> An example grammar is:
>
> =====================================================
> grammar expr;
> options {
> k=2;
> backtrack=true;
> memoize=true;
> }
>
> @header {
> package tests;
> }
>
> @lexer::header {
> package tests;
> }
>
> aprog : (WS | anitem)+
> ;
> anitem : 'hello' EQUALS QUOTE CHARS QUOTE
>     {
>         System.out.println("Have quoted text : " + $CHARS.text);
>     }
>     ;
> CHARS : ('a'..'z'|'A'..'Z')+
> ;
> QUOTE : '"'
> ;
> EQUALS : '='
> ;
> WS : (' ' | '\t' | '\n') +
> ;
> =========================================================================
>
> A test class is:
> ========================================================================
> package tests;
>
> import org.antlr.runtime.ANTLRStringStream;
> import org.antlr.runtime.CommonTokenStream;
>
> public class DoTest {
>
>     public static void main(String[] args) throws Throwable {
>         if (args.length == 0) {
>             System.out.println("Please provide input on command line");
>         }
>         else {
>             exprLexer l = new exprLexer(new ANTLRStringStream(args[0]));
>             CommonTokenStream tokens = new CommonTokenStream();
>             tokens.setTokenSource(l);
>             exprParser p = new exprParser(tokens);
>
>             p.aprog();
>         }
>     }
> }
>
> ========================================================================
> Sample usage is:
> ========================================================================
>
> $ java tests.DoTest "hello=\"there\""
>
> Have quoted text : there
>
> $ java tests.DoTest "hello=\"hello\""
>
> line 1:7 mismatched input 'hello' expecting CHARS
> line 1:12 mismatched input '"' expecting EQUALS
> line 0:-1 mismatched input '<EOF>' expecting CHARS
>
> ========================================================================
>
> I am guessing that the second "hello" is matched by the tokenizer as type
> HELLO. Can I tell the tokenizer not to do this?
> Or is there a simple way to refactor this?
>
> Thank you,
>
> James.
>
> (All files attached).