[antlr-interest] Advice with backtracking/ambiguity
Ken Williams
ken.williams at thomsonreuters.com
Wed Jun 2 13:04:10 PDT 2010
Hi,
Here's a simple grammar demonstrating something I'm working with:
------------------------------
grammar testg;
options {
    backtrack=true;
    memoize=true;
    output=AST;
}
cite : token+ EOF ;
token : DATE | SLASH | DIGITS ;
DATE : DIGITS SLASH DIGITS SLASH DIGITS ;
WS : ( ' ' | '\t' | '\f' | '\n' | '\r' ) {skip();} ;
SLASH : '/' ;
DIGITS : ('0'..'9')+ ;
--------------------------------
As you can see, DATE is ambiguous with the token sequence DIGITS SLASH DIGITS
SLASH DIGITS. What I'm trying to do is use the DATE rule when it can succeed,
and use DIGITS and SLASH otherwise. For example, the input "10 30/2" should
parse as "DIGITS DIGITS SLASH DIGITS", but "10 30/2/24" should parse as
"DIGITS DATE".
Instead, "10 30/2" fails to parse with the error "mismatched character
'<EOF>' expecting '/'".
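To pin down the behavior I'm after ("try DATE first; if it can't complete, rewind and fall back to DIGITS and SLASH"), here is a minimal sketch of the intended tokenization. It is a hypothetical hand-written Python tokenizer, not anything ANTLR generates; the names `tokenize`, `RULES` and `WHITESPACE` are my own illustrative assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical rule table, tried in order at each position. DATE is tried
# first, so it wins whenever DIGITS '/' DIGITS '/' DIGITS is fully present.
RULES = [
    ("DATE", re.compile(r"[0-9]+/[0-9]+/[0-9]+")),
    ("SLASH", re.compile(r"/")),
    ("DIGITS", re.compile(r"[0-9]+")),
]
WHITESPACE = " \t\f\n\r"  # mirrors the WS rule, whose text is skipped


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Return (token_type, text) pairs, preferring DATE and falling back
    to DIGITS/SLASH when a complete date does not follow."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        if text[pos] in WHITESPACE:
            pos += 1
            continue
        for kind, pattern in RULES:
            # A failed match consumes nothing, so the next rule starts from
            # the same position: this is the "rewind" I want on failure.
            m = pattern.match(text, pos)
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


print([kind for kind, _ in tokenize("10 30/2")])
# ['DIGITS', 'DIGITS', 'SLASH', 'DIGITS']
print([kind for kind, _ in tokenize("10 30/2/24")])
# ['DIGITS', 'DATE']
```

With this ordering, DATE is matched greedily from the left, so an input like "1/2/3/4" would come out as DATE SLASH DIGITS.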
I've tried adding syntactic predicates, both on the first alternative of
'token' and on DATE itself:
token : (DATE)=> DATE | SLASH | DIGITS ;
and
DATE : (DIGITS SLASH DIGITS SLASH DIGITS)=> DIGITS SLASH DIGITS SLASH DIGITS ;
but neither has any effect on the parse.
I've also tried changing DATE to a parser rule:
token : (DIGITS SLASH DIGITS SLASH DIGITS)=> date | SLASH | DIGITS ;
date : DIGITS SLASH DIGITS SLASH DIGITS ;
but now the 'date' rule never seems to match: the input "10 30/2/24" parses
as "DIGITS DIGITS SLASH DIGITS SLASH DIGITS" instead of "DIGITS date".
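To make the expected result of the parser-rule version concrete, here is a minimal sketch of what I'd expect it to do over the token types the lexer produces. It is a hypothetical hand-written Python parser, not ANTLR output; `parse_cite`, `DATE_SEQ` and the tuple-based tree shape are illustrative assumptions. Because a date has a fixed number of tokens, a slice comparison stands in here for the speculative match that a syntactic predicate performs.

```python
from typing import List, Tuple, Union

# Hypothetical tree shape: a bare token type, or ("date", [token types]).
Node = Union[str, Tuple[str, List[str]]]

# The sequence checked by the predicate (DIGITS SLASH DIGITS SLASH DIGITS)=>
DATE_SEQ = ["DIGITS", "SLASH", "DIGITS", "SLASH", "DIGITS"]


def parse_cite(tokens: List[str]) -> List[Node]:
    """cite  : token+ EOF ;
    token : (DATE_SEQ)=> date | SLASH | DIGITS ;"""
    if not tokens:
        raise ValueError("cite requires at least one token")
    result: List[Node] = []
    i = 0
    while i < len(tokens):
        # Predicate: look ahead without consuming, and commit to 'date'
        # only if the whole sequence is present.
        if tokens[i:i + len(DATE_SEQ)] == DATE_SEQ:
            result.append(("date", list(DATE_SEQ)))
            i += len(DATE_SEQ)
        elif tokens[i] in ("SLASH", "DIGITS"):
            result.append(tokens[i])
            i += 1
        else:
            raise ValueError(f"unexpected token {tokens[i]!r} at {i}")
    return result


print(parse_cite(["DIGITS", "DIGITS", "SLASH", "DIGITS", "SLASH", "DIGITS"]))
# ['DIGITS', ('date', ['DIGITS', 'SLASH', 'DIGITS', 'SLASH', 'DIGITS'])]
```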
I'm sure this is a classic problem with a classic solution, but so far it
eludes me, so I'd appreciate any advice. Thanks.
--
Ken Williams
Sr. Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken.williams at thomsonreuters.com