[antlr-interest] Lexer not backtracking properly?

Fri May 21 09:59:42 PDT 2010

On 05/21/2010 11:31 AM, Stevenson, Todd (GE Healthcare) wrote:
> I have been working on a lexer grammar that appears to try one rule
> but, when that rule fails, does not backtrack and find the correct rule.
>  
> Consider this grammar:
>  
> start
>    : LBRACE option RBRACE EOF
>    ;
>  
> 
> option
>    : DELETE
>    | IMPORT
>    | ALL
>    ;
>  
> DELETE  : 'delete';
> ALL     : 'all';
> IMPORT  : 'import';
>  
> LBRACE    : '{';

How about:

// the predicate takes the GUID path only if XDIGIT+ '}' really follows
LBRACE	: '{' ( (XDIGIT+ RBRACE)=> XDIGIT+ RBRACE { $type = GUID; } )?
	;

and delete your GUID production.  Since no rule defines GUID any more,
you'll also need to declare it in a tokens { GUID; } section.  [I use
_ttype instead of $type in ANTLR v2]

> RBRACE    : '}';
>  
> GUID    : LBRACE XDIGIT+ RBRACE;
>  
> HEXNUMBER : XDIGIT+;
>  
> WS        : ( ' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
>  
> fragment
> XDIGIT     : '0' .. '9' | 'a' .. 'f' | 'A' .. 'F';
>  
> ------------------------
>  
> When I use the input '{all}', it fails because it appears to try the
> 'GUID' rule and, when that fails, cannot backtrack and use the 'LBRACE'
> rule.  It works correctly when I process the input '{ all}'.  Is this
> correct behavior?

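Yes.  After '{' the lexer has to choose between your GUID and LBRACE
rules, and it is greedy whenever it can be: since 'a' is a valid XDIGIT,
it commits to GUID, and it does not backtrack when 'l' then fails to
match.  With '{ all}' the space cannot start an XDIGIT, so LBRACE is
chosen instead.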
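To make the commit-without-backtracking behavior concrete, here is a toy model of the grammar's lexer in Python.  It is a hypothetical hand-written sketch, not ANTLR-generated code; the names lex, LexError and the predicated flag are invented for illustration.  With predicated=False it commits to GUID as soon as '{' is followed by a hex digit; with predicated=True it first scans ahead for the whole XDIGIT+ '}' form and falls back to LBRACE otherwise, which is roughly what a syntactic predicate such as (XDIGIT+ RBRACE)=> buys you in ANTLR 3.

```python
# Hypothetical toy model of the grammar's lexer; NOT ANTLR-generated code.
XDIGITS = set("0123456789abcdefABCDEF")
KEYWORDS = {"delete": "DELETE", "all": "ALL", "import": "IMPORT"}


class LexError(Exception):
    """Raised when the rule the lexer committed to cannot finish matching."""


def lex(text, predicated=False):
    """Tokenize text into (type, lexeme) pairs.

    predicated=False: commit to GUID whenever '{' is followed by a hex digit,
    with no backtracking (the behavior reported in this thread).
    predicated=True: look ahead for XDIGIT+ '}' first and fall back to
    LBRACE if it is not there (the effect of a syntactic predicate).
    """
    tokens = []
    i = 0
    while i < len(text):
        c = text[i]
        if c in " \t\r\n":
            i += 1  # WS goes to the hidden channel, so just skip it here
        elif c == "{":
            # Find the end of the run of hex digits right after '{'.
            j = i + 1
            while j < len(text) and text[j] in XDIGITS:
                j += 1
            has_digits = j > i + 1
            closes = j < len(text) and text[j] == "}"
            if predicated:
                take_guid = has_digits and closes
            else:
                # Decision depends only on whether the next char is a hex digit.
                take_guid = has_digits
            if take_guid:
                if not closes:
                    # Committed to GUID but no '}' follows: no way back.
                    raise LexError(f"GUID: expected '}}' at offset {j}")
                tokens.append(("GUID", text[i:j + 1]))
                i = j + 1
            else:
                tokens.append(("LBRACE", c))
                i += 1
        elif c == "}":
            tokens.append(("RBRACE", c))
            i += 1
        else:
            # Keywords win over HEXNUMBER; anything else is an error.
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            word = text[i:j]
            if word in KEYWORDS:
                tokens.append((KEYWORDS[word], word))
            elif word and all(ch in XDIGITS for ch in word):
                tokens.append(("HEXNUMBER", word))
            else:
                raise LexError(f"no rule matches at offset {i}")
            i = j
    return tokens
```

With predicated=False, lex("{all}") raises LexError at offset 2 while lex("{ all}") yields LBRACE, ALL, RBRACE.  With predicated=True both inputs yield LBRACE, ALL, RBRACE, and "{1f}" still comes out as a single GUID token.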

> I know that the GUID rule could probably be a parser rule, but I did
> not want to allow embedded whitespace, so I left it as a lexer rule.

-- 
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)