[antlr-interest] lexer rule matching problem

Fri Jan 6 06:39:21 PST 2006

Tinker Tailor asked:
>  I am trying to parse a subset of the vbscript language, and have run
>into the following problem:
>   The '&' in VBS can be used in two ways -
>       1. As a concatenation operator
>              e.g.:  a = b & c    or   a=b&c
>       2. As part of the prefix ("&H") and optional suffix ('&') for
>hexadecimal numbers
>             e.g.:  a=&H9Abc    or  a=&H9Abc&
>
>So, here are the rules I made in my lexer (lookahead=3):
>
>CONCAT : '&';
>HEX : "&h" (HEX_DIGIT)+ (('&')?)! ;
>HEX_DIGIT : '0'..'9' | 'a'..'f' ;
>
>Now what I want the lexer to do is to first try and match a hex
>number, and only when that fails, to try and match for the CONCAT
>token. But I am not really sure how to tell antlr that. :(
> As things stand, the lexer first matches CONCAT, and as a result
>throws the 'unexpected token:' exception when I give it the following
>valid input:
>     a = &H345ad&
>
>Any suggestions?

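Untested, but perhaps this might do it: left-factor the two rules so
that '&' is matched once, and switch the token type to HEX only when a
hex literal actually follows: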

tokens { HEX; }
CONCAT : '&' (( 'h' (HEX_DIGIT)+ (('&')?)! ){ $setType(HEX); })? ;
protected HEX_DIGIT : '0'..'9' | 'a'..'f' ;
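
For context, here is roughly how those pieces might sit in a complete
ANTLR 2 lexer. This is also untested, and a sketch only: the class name
VbsLexer is made up for illustration, and caseSensitive=false is my
assumption about how your "&h" rule is meant to match the "&H" in your
sample input.

```antlr
class VbsLexer extends Lexer;   // hypothetical name, not from the original post

options {
    k = 3;                  // lookahead of 3, as in the question
    caseSensitive = false;  // assumption: lets 'h' in the rule match 'H' in the input
}

tokens {
    HEX;    // has no rule of its own; CONCAT switches to this type
}

// Match '&' once. If 'h' and at least one hex digit follow, consume them,
// drop the optional trailing '&' from the token text, and retype as HEX.
CONCAT
    : '&'
      ( 'h' (HEX_DIGIT)+ ('&'!)? { $setType(HEX); } )?
    ;

// protected: a helper for other lexer rules, never returned as a token itself
protected
HEX_DIGIT
    : '0'..'9' | 'a'..'f'
    ;
```

One caveat with this approach: the rule takes the hex branch whenever it
can, so unspaced input such as x&hd would come out as HEX rather than as
CONCAT followed by an identifier. If ANTLR reports nondeterminism on the
optional subrules, setting the subrule option greedy=true is probably the
way to quiet it, but I have not checked that against this grammar.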