[antlr-interest] Advice with backtracking/ambiguity

Ken Williams ken.williams at thomsonreuters.com
Wed Jun 2 13:04:10 PDT 2010


Hi,

Here's a simple grammar demonstrating something I'm working with:

------------------------------
grammar testg;

options {
    backtrack=true;
    memoize=true;
    output=AST;
}

cite    :    token+ EOF ;

token    :    DATE | SLASH | DIGITS ;
    
DATE    :    DIGITS SLASH DIGITS SLASH DIGITS ;

WS    :    ( ' ' | '\t'| '\f' | '\n' | '\r' ) {skip();} ;

SLASH    :    '/' ;
DIGITS    :    ('0'..'9')+ ;
--------------------------------


As you can see, there's an ambiguity with DATE.  What I'm trying to do is to
use the DATE rule when it can succeed, and use DIGITS & SLASH otherwise.  So
for example, the input "10 30/2" should parse as "DIGITS DIGITS SLASH
DIGITS", but "10 30/2/24" should parse as "DIGITS DATE".
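
To make the intended behaviour concrete, here is a small sketch in Python
rather than ANTLR.  It is only an illustration of the desired result, not of
how ANTLR's lexer works: the names TOKEN_SPEC and token_types are made up for
this example, and it relies on Python's regex alternation trying DATE first
and falling back to the shorter tokens when the full date does not match.

```python
import re

# Hypothetical illustration: alternatives are tried in order, so DATE wins
# only when the whole DIGITS/DIGITS/DIGITS shape is present.
TOKEN_SPEC = [
    ("DATE", r"[0-9]+/[0-9]+/[0-9]+"),
    ("SLASH", r"/"),
    ("DIGITS", r"[0-9]+"),
    ("WS", r"[ \t\f\n\r]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def token_types(text):
    """Return the list of token type names for text, skipping whitespace."""
    types = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":  # mirrors the skip() action on WS
            types.append(m.lastgroup)
        pos = m.end()
    return types


print(token_types("10 30/2"))     # the incomplete date falls back to small tokens
print(token_types("10 30/2/24"))  # the complete date becomes a single DATE
```

The two calls correspond to the two inputs described above.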

Instead, what happens is that "10 30/2" fails to parse, saying "mismatched
character '<EOF>' expecting '/'".

I've tried adding syntactic predicates, first on the first alternative of
'token' and then on DATE itself:

token    :   (DATE)=> DATE | SLASH | DIGITS ;

and

DATE    :    (DIGITS SLASH DIGITS SLASH DIGITS)=> DIGITS SLASH DIGITS SLASH DIGITS ;

but neither seems to actually have any effect on the parse.

I've also tried changing DATE to a parser rule:

token    :    (DIGITS SLASH DIGITS SLASH DIGITS)=> date | SLASH | DIGITS ;
date    :     DIGITS SLASH DIGITS SLASH DIGITS ;

but now I can't get the 'date' rule to ever match - the input "10 30/2/24"
parses as "DIGITS DIGITS SLASH DIGITS SLASH DIGITS" instead of "DIGITS
date".
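
For the parser-rule variant, the grouping being asked for can be sketched at
the token level.  Again this is Python, not ANTLR, and group_dates is a
hypothetical helper: it looks ahead five token types and, when they are
DIGITS SLASH DIGITS SLASH DIGITS, replaces them with a single 'date' node.
Because whitespace is skipped before this stage, a spaced-out input such as
"30 / 2 / 24" would also be grouped as a date here.

```python
DATE_SHAPE = ["DIGITS", "SLASH", "DIGITS", "SLASH", "DIGITS"]


def group_dates(types):
    """Collapse each DIGITS SLASH DIGITS SLASH DIGITS run into 'date'."""
    out = []
    i = 0
    while i < len(types):
        # Five-token lookahead, playing the role of the syntactic predicate.
        if types[i:i + len(DATE_SHAPE)] == DATE_SHAPE:
            out.append("date")
            i += len(DATE_SHAPE)
        else:
            out.append(types[i])
            i += 1
    return out


# Token types for "10 30/2/24" and "10 30/2" respectively.
print(group_dates(["DIGITS", "DIGITS", "SLASH", "DIGITS", "SLASH", "DIGITS"]))
print(group_dates(["DIGITS", "DIGITS", "SLASH", "DIGITS"]))
```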


I'm sure this is a classic problem with a classic solution but so far it
eludes me, so I'd appreciate any advice.  Thanks.

-- 
Ken Williams
Sr. Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken.williams at thomsonreuters.com



