[antlr-interest] Advice with backtracking/ambiguity

Ken Williams ken.williams at thomsonreuters.com
Wed Jun 2 15:03:35 PDT 2010


Yeah, I probably should be using parser rules.  I was trying to keep things
"simple" by making everything a linear stream of tokens from the point of
view of the Java caller, while still having high-level constructs like DATE.

Perhaps what I really want is something like this:

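-------------------
grammar Cite;   // placeholder name (hypothetical); ANTLR 3 needs a grammar header

options {
    backtrack=true;   // try alternatives speculatively, rewinding on failure
    memoize=true;     // remember rule success/failure per input position while backtracking
    output=AST;
}

tokens {
    DATE;             // imaginary token: never produced by the lexer
}

// A citation is a flat sequence of tokens; a DIGITS/DIGITS/DIGITS run
// is collapsed into a single DATE node.
cite    :    token+ EOF ;
token   :    date -> DATE | SLASH | DIGITS;
date    :    DIGITS SLASH DIGITS SLASH DIGITS ;

SLASH   :    '/' ;
DIGITS  :    ('0'..'9')+ ;
WS      :    ( ' ' | '\t'| '\f' | '\n' | '\r' ) {skip();} ;
-------------------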


The only thing missing now is the character data for DATE: the rewrite builds
a DATE node, but the text matched by 'date' doesn't come along with it.  Is
there a way to change that 'token' rule to something like this?

token   :    date -> {new CommonToken(DATE, $text)} | SLASH | DIGITS;
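For illustration only, the following plain-Java sketch (not the ANTLR runtime; the `DateCollapse` class, `Tok` record and `collapseDates` method are hypothetical names) approximates what the `token` rule is meant to produce once the text is kept: at each position it checks speculatively whether the next five tokens form DIGITS SLASH DIGITS SLASH DIGITS, emits one DATE token carrying their concatenated text if so, and otherwise falls back to the single SLASH or DIGITS token. It assumes the input has already been lexed, with whitespace dropped as the WS rule does.

```java
import java.util.ArrayList;
import java.util.List;

public class DateCollapse {
    // Hypothetical token type (not ANTLR's CommonToken): a type name plus its text.
    record Tok(String type, String text) {}

    // The shape of the 'date' rule: DIGITS SLASH DIGITS SLASH DIGITS.
    static final String[] DATE_SHAPE = {"DIGITS", "SLASH", "DIGITS", "SLASH", "DIGITS"};

    // True if the tokens starting at index i match DATE_SHAPE.
    static boolean matchesDate(List<Tok> in, int i) {
        if (i + DATE_SHAPE.length > in.size()) return false;
        for (int k = 0; k < DATE_SHAPE.length; k++) {
            if (!in.get(i + k).type().equals(DATE_SHAPE[k])) return false;
        }
        return true;
    }

    // Roughly mirrors: token : date -> DATE | SLASH | DIGITS ;
    static List<Tok> collapseDates(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            if (matchesDate(in, i)) {
                // Keep the concatenated text of the five matched tokens on DATE.
                StringBuilder sb = new StringBuilder();
                for (int k = i; k < i + DATE_SHAPE.length; k++) sb.append(in.get(k).text());
                out.add(new Tok("DATE", sb.toString()));
                i += DATE_SHAPE.length;
            } else {
                // The date alternative failed here: fall back to the single token.
                out.add(in.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> in = List.of(
            new Tok("DIGITS", "6"), new Tok("SLASH", "/"),
            new Tok("DIGITS", "2"), new Tok("SLASH", "/"),
            new Tok("DIGITS", "10"), new Tok("SLASH", "/"),
            new Tok("DIGITS", "42"));
        for (Tok t : collapseDates(in)) {
            System.out.println(t.type() + " " + t.text());
        }
    }
}
```

Running `main` prints `DATE 6/2/10`, `SLASH /` and `DIGITS 42`; the trailing `/42` does not complete a second date, so it stays as individual tokens.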


I appreciate all the help.



On 6/2/10 4:41 PM, "Jim Idle" <jimi at temporal-wave.com> wrote:

> This isn't left-factored; it is doing the lookahead that you require to
> distinguish the two. In v4 this will be different, but for now, this is what
> you will need to do.
> 
> Or, don't try to do it in the lexer at all and construct parser rules for it.
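The lookahead Jim mentions can be made concrete with a hand-written scanner. The sketch below is plain Java with hypothetical names (`LookaheadLexer`, `scanDate`, `lex`), not generated ANTLR code: when it sees a digit, it must read ahead through the whole DIGITS '/' DIGITS '/' DIGITS pattern before it knows whether to emit DATE or DIGITS, and it falls back to DIGITS when the pattern is incomplete.

```java
import java.util.ArrayList;
import java.util.List;

public class LookaheadLexer {
    // ASCII digits only, matching DIGITS : ('0'..'9')+ ;
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // The same characters the WS rule skips.
    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\n' || c == '\r';
    }

    // Index just past a run of digits starting at i (returns i if there are none).
    static int scanDigits(String s, int i) {
        while (i < s.length() && isDigit(s.charAt(i))) i++;
        return i;
    }

    // End index of DIGITS '/' DIGITS '/' DIGITS starting at i, or -1 if absent.
    static int scanDate(String s, int i) {
        int j = i;
        for (int part = 0; part < 3; part++) {
            int end = scanDigits(s, j);
            if (end == j) return -1;          // need at least one digit
            j = end;
            if (part < 2) {
                if (j >= s.length() || s.charAt(j) != '/') return -1;
                j++;                          // consume the slash
            }
        }
        return j;
    }

    // Produces "TYPE:text" strings for the whole input.
    static List<String> lex(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isWs(c)) {
                i++;                          // dropped, like {skip();}
            } else if (c == '/') {
                out.add("SLASH:/");
                i++;
            } else if (isDigit(c)) {
                // Look ahead as far as needed before committing to a token type.
                int dateEnd = scanDate(s, i);
                boolean isDate = dateEnd >= 0;
                int end = isDate ? dateEnd : scanDigits(s, i);
                out.add((isDate ? "DATE:" : "DIGITS:") + s.substring(i, end));
                i = end;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("6/2/10 12/5"));
    }
}
```

Here `lex("6/2/10 12/5")` returns `[DATE:6/2/10, DIGITS:12, SLASH:/, DIGITS:5]`; the second group lacks a third number, so the scanner rewinds and emits separate tokens.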

-- 
Ken Williams
Sr. Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken.williams at thomsonreuters.com
