[antlr-interest] Advice with backtracking/ambiguity
Ken Williams
ken.williams at thomsonreuters.com
Wed Jun 2 15:03:35 PDT 2010
Yeah, probably I should be using parser rules. I was trying to keep things
"simple" by making everything a linear stream of tokens from the point of
view of the Java caller, while still having high-level constructs like DATE.
Perhaps what I really want is something like this:
-------------------
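options {
    backtrack=true;
    memoize=true;
    output=AST;
}
tokens {
    DATE;    // imaginary token type, used only in the rewrite below
}
// Parser rules: a date collapses to a single imaginary DATE node.
cite  : token+ EOF ;
token : date -> DATE | SLASH | DIGITS ;
date  : DIGITS SLASH DIGITS SLASH DIGITS ;
// Lexer rules
SLASH  : '/' ;
DIGITS : ('0'..'9')+ ;
WS     : ( ' ' | '\t' | '\f' | '\n' | '\r' ) {skip();} ;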
-------------------
The only thing missing now is the character data for DATE: the rewrite
produces a DATE node, but it does not carry the text that 'date' matched.
Is there a way to change that 'token' rule to something like this?
token : date -> {new CommonToken(DATE, $text)} | SLASH | DIGITS;
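One possibility, as far as I understand the ANTLR v3 rewrite syntax, is to
give the imaginary token its text directly in the rewrite with the DATE[...]
node constructor. A minimal sketch, untested against this grammar, assuming
the same options and tokens sections as above:
-------------------
// DATE[$date.text] should create a DATE node whose text is the input
// matched by 'date'; SLASH and DIGITS are passed through unchanged.
token : date -> DATE[$date.text]
      | SLASH
      | DIGITS
      ;
-------------------
The two-argument form DATE[$date.start, $date.text] should also copy the line
and column information from the first DIGITS token, in case the Java caller
needs positions.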
I appreciate all the help.
On 6/2/10 4:41 PM, "Jim Idle" <jimi at temporal-wave.com> wrote:
> This isn't left factored, it is doing the lookahead that you require to
> distinguish the two. In v4 this will be different, but for now, this is what
> you will need to do.
>
> Or, don't try to do it in the lexer at all and construct parser rules for it.