[antlr-interest] Advice with backtracking/ambiguity

Wed Jun 2 15:38:59 PDT 2010

On Wed, 2010-06-02 at 17:03 -0500, Ken Williams wrote:
> Yeah, probably I should be using parser rules.  I was trying to keep things
> "simple" by making everything a linear stream of tokens from the point of
> view of the Java caller, while still having high-level constructs like DATE.
> 
> Perhaps what I really want is something like this:

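Just be aware that when you make date a parser rule, WS will be silently
accepted between the DIGITS and SLASH tokens that make up the date
non-terminal. Whitespace that the lexer skip()s never reaches the parser,
so the parser has no way to notice it was there.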
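To make that concrete, here is a small hand-written tokenizer in Java.
It is a hypothetical stand-in for the quoted DIGITS, SLASH and skip()'d
WS rules, not ANTLR-generated code; the names SkippedWsDemo, Kind and
tokenKinds are made up for the illustration. It returns only the token
kinds, which is all the date parser rule gets to see.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the quoted lexer: DIGITS, SLASH, and a WS rule
// that calls skip(). Only token kinds are returned, which is all a parser
// rule such as  date : DIGITS SLASH DIGITS SLASH DIGITS ;  can observe.
final class SkippedWsDemo {
    enum Kind { DIGITS, SLASH }

    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\n' || c == '\r';
    }

    static List<Kind> tokenKinds(String input) {
        List<Kind> kinds = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isWs(c)) {
                i++; // like {skip();}: no token is emitted at all
            } else if (c == '/') {
                kinds.add(Kind.SLASH);
                i++;
            } else if (c >= '0' && c <= '9') {
                // greedy ('0'..'9')+
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                kinds.add(Kind.DIGITS);
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        // Both print [DIGITS, SLASH, DIGITS, SLASH, DIGITS], so the parser
        // cannot tell the spaced form from the compact one.
        System.out.println(tokenKinds("6/2/2010"));
        System.out.println(tokenKinds("6 / 2 / 2010"));
    }
}
```

Both inputs produce the same five token kinds, so a date parser rule
matches "6 / 2 / 2010" exactly as it matches "6/2/2010".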

If your language permits this, great! Otherwise you need to keep DATE
in the lexer.
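For comparison, here is a rough Java sketch of what keeping DATE in the
lexer buys you: at a digit it first tries to match DIGITS '/' DIGITS '/'
DIGITS with no gaps, and if that fails it rewinds and emits plain DIGITS.
The rewind is only an analogy for the backtracking/lookahead Jim mentions
below; the class and method names (LexerDateDemo, tryDate, tokens) are
hypothetical and this is not how ANTLR's generated lexer is structured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: recognise DATE in the lexer. At a digit, speculatively
// match DIGITS '/' DIGITS '/' DIGITS with no gaps; on failure, rewind and emit
// plain DIGITS instead. Whitespace is still skipped between tokens.
final class LexerDateDemo {
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\n' || c == '\r';
    }

    // End index of the digit run starting at i (equal to i if there is none).
    static int digitsEnd(String s, int i) {
        while (i < s.length() && isDigit(s.charAt(i))) i++;
        return i;
    }

    // Speculative match of a whole date starting at i; returns its end, or -1.
    static int tryDate(String s, int i) {
        int p = i;
        for (int part = 0; part < 3; part++) {
            int end = digitsEnd(s, p);
            if (end == p) return -1;              // each part needs a digit
            p = end;
            if (part < 2) {
                if (p >= s.length() || s.charAt(p) != '/') return -1;
                p++;                              // consume the '/'
            }
        }
        return p;
    }

    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isWs(c)) {
                i++;                              // WS is still skip()'d
            } else if (c == '/') {
                out.add("SLASH(/)");
                i++;
            } else if (isDigit(c)) {
                int dateEnd = tryDate(s, i);      // try the longer token first
                int end = dateEnd >= 0 ? dateEnd : digitsEnd(s, i);
                out.add((dateEnd >= 0 ? "DATE(" : "DIGITS(") + s.substring(i, end) + ")");
                i = end;
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("6/2/2010"));     // [DATE(6/2/2010)]
        System.out.println(tokens("6 / 2 / 2010")); // [DIGITS(6), SLASH(/), DIGITS(2), SLASH(/), DIGITS(2010)]
        System.out.println(tokens("6/2"));          // [DIGITS(6), SLASH(/), DIGITS(2)]
    }
}
```

Because the date is a single token here, embedded whitespace breaks it
apart instead of being silently accepted, and the token carries its own
text.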

I do not know what $text will be for a parser rule, nor whether any
skip()'d WS will be included in it.

(The same applies to COMMENT, if you have one, and to any other token
that the lexer skip()s or sends to the HIDDEN channel.)
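I have not checked what ANTLR actually does for $text, but the question
boils down to how the text is rebuilt. The Java sketch below is a
hypothetical model (Tok, lex, textFromTokens and textFromInput are made up
for it, and the channel numbers are arbitrary): gluing together the text
of the tokens still in the stream loses skip()'d WS but keeps HIDDEN WS,
while slicing the original input by the tokens' character offsets keeps
whatever lay between them either way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of two ways to rebuild a rule's text. Not ANTLR's code.
final class RuleTextDemo {
    static final int DEFAULT = 0, HIDDEN = 99; // channel numbers for this sketch

    // Minimal token: its text plus inclusive character offsets into the input.
    static final class Tok {
        final String text;
        final int start, stop, channel;
        Tok(String text, int start, int stop, int channel) {
            this.text = text; this.start = start; this.stop = stop; this.channel = channel;
        }
    }

    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\n' || c == '\r';
    }

    // hideWs == false drops whitespace (like skip()); true keeps it on HIDDEN.
    static List<Tok> lex(String s, boolean hideWs) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isWs(c)) {
                if (hideWs) out.add(new Tok(String.valueOf(c), i, i, HIDDEN));
                i++;
            } else if (c == '/') {
                out.add(new Tok("/", i, i, DEFAULT));
                i++;
            } else if (c >= '0' && c <= '9') {
                int j = i;
                while (j < s.length() && s.charAt(j) >= '0' && s.charAt(j) <= '9') j++;
                out.add(new Tok(s.substring(i, j), i, j - 1, DEFAULT));
                i = j;
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return out;
    }

    // Strategy A: concatenate every token still in the stream between the
    // first and last DEFAULT-channel tokens (HIDDEN ones included).
    static String textFromTokens(List<Tok> toks) {
        int first = -1, last = -1;
        for (int k = 0; k < toks.size(); k++) {
            if (toks.get(k).channel == DEFAULT) { if (first < 0) first = k; last = k; }
        }
        if (first < 0) return "";
        StringBuilder sb = new StringBuilder();
        for (int k = first; k <= last; k++) sb.append(toks.get(k).text);
        return sb.toString();
    }

    // Strategy B: slice the original input from the first DEFAULT token's
    // start offset to the last one's stop offset.
    static String textFromInput(String input, List<Tok> toks) {
        Tok first = null, last = null;
        for (Tok t : toks) {
            if (t.channel == DEFAULT) { if (first == null) first = t; last = t; }
        }
        return first == null ? "" : input.substring(first.start, last.stop + 1);
    }

    public static void main(String[] args) {
        String in = "6 / 2 / 2010";
        System.out.println(textFromTokens(lex(in, false)));    // 6/2/2010
        System.out.println(textFromTokens(lex(in, true)));     // 6 / 2 / 2010
        System.out.println(textFromInput(in, lex(in, false))); // 6 / 2 / 2010
    }
}
```

So whichever strategy $text uses decides whether you see the spaces, and
switching WS from skip() to HIDDEN changes the answer for the first one.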

> 
> -------------------
> options {
>     backtrack=true;
>     memoize=true;
>     output=AST;
> }
> 
> tokens {
>     DATE;
> }
> 
> cite    :    token+ EOF ;
> token   :    date -> DATE | SLASH | DIGITS;
> date    :    DIGITS SLASH DIGITS SLASH DIGITS ;
> 
> SLASH   :    '/' ;
> DIGITS  :    ('0'..'9')+ ;
> WS      :    ( ' ' | '\t'| '\f' | '\n' | '\r' ) {skip();} ;
> -------------------
> 
> 
> The only thing missing now is the character-data from DATE.  Is there a way
> to change that 'token' rule to something like this?
> 
> token   :    date -> {new CommonToken(DATE, $text)} | SLASH | DIGITS;
> 
> 
> I appreciate all the help.
> 
> 
> 
> On 6/2/10 4:41 PM, "Jim Idle" <jimi at temporal-wave.com> wrote:
> 
> > This isn't left factored; it is doing the lookahead that you require to
> > distinguish the two. In v4 this will be different, but for now, this is what
> > you will need to do.
> > 
> > Or, don't try to do it in the lexer at all and construct parser rules for it.
>