[antlr-interest] lexer rule matching problem

Mon Jan 9 07:20:02 PST 2006

Nope, that didn't work! :( But I finally found the solution!! :D

The trouble with the rule:
  CONCAT : '&' (( 'h' (HEX_DIGIT)+ (('&')?)! ){ $setType(HEX); })? ;
is that the lexer can't backtrack when it gets input like a=a&height.
Since the main requirement is that the lexer first try to match a hex
number and, failing that, backtrack and match just the ampersand '&', I
decided to check out... yup, you guessed it... syntactic predicates!
So, after much tinkering (and some tailoring), I finally arrived at a
rule that processes all my input files correctly. And for your viewing
pleasure, here it is:
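==================
CONCAT
    :   ('&')=> (HEX_NUM)=> HEX_NUM { _ttype = HEX; }  // try a hex literal first
    |   (OCT_NUM)=> OCT_NUM { _ttype = OCT; }          // then an octal literal
    |   '&'                                            // else the plain operator
    ;

protected HEX_NUM
    :   '&' 'h' (HEX_DIGIT)+ (('&')?)!  // trailing '&' is matched but not kept
    ;

protected OCT_NUM
    :   '&' 'o' ('0'..'7')+
    ;
==================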
Note that this rule takes care of the string concatenation operator
'&', as well as HEX (&H1&, &H2) and OCTAL (&O7) numbers.
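To make the decision order of the rule concrete, here is a small Python model of it. It is not ANTLR code and not what ANTLR generates; it only mimics the idea that each `(X)=>` predicate means "try to match X without committing; if that fails, fall through to the next alternative". It assumes the lexer matches case-insensitively (the rule says 'h' but the sample inputs use &H), and `lex_ampersand` / `match_prefixed` are hypothetical names.

```python
# Python model of the CONCAT rule's alternatives; NOT generated ANTLR code.
# Assumption: case-insensitive matching, so 'h'/'o' also match 'H'/'O'.
HEX_DIGITS = set("0123456789abcdef")
OCT_DIGITS = set("01234567")


def match_prefixed(text, pos, letter, digits):
    """Try to match '&' letter digit+ at pos; return the end index or None."""
    if text[pos:pos + 2].lower() != "&" + letter:
        return None
    end = pos + 2
    while end < len(text) and text[end].lower() in digits:
        end += 1
    return end if end > pos + 2 else None  # at least one digit is required


def lex_ampersand(text, pos):
    """Return (token_type, token_text, next_pos) for input starting with '&'."""
    # Alternative 1: (HEX_NUM)=> HEX_NUM. The optional '&' suffix is
    # consumed but left out of the token text (the '!' in the grammar).
    end = match_prefixed(text, pos, "h", HEX_DIGITS)
    if end is not None:
        token_text = text[pos:end]
        if end < len(text) and text[end] == "&":
            end += 1
        return ("HEX", token_text, end)
    # Alternative 2: (OCT_NUM)=> OCT_NUM
    end = match_prefixed(text, pos, "o", OCT_DIGITS)
    if end is not None:
        return ("OCT", text[pos:end], end)
    # Alternative 3: nothing longer matched, so it is the concatenation operator.
    return ("CONCAT", "&", pos + 1)


print(lex_ampersand("a=&H9Abc&", 2))  # ('HEX', '&H9Abc', 9)
```

Because the alternatives are tried in order, "&hz" or "& c" fall through both number checks and come out as a plain CONCAT token.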

Now here is another question: valid hex numbers in VB can have at most
8 digits. Is there any way in ANTLR to specify how many times a rule
may match?
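As far as I know, ANTLR's subrule operators are only `?`, `*` and `+`, with no counted form like `{1,8}`. Two common workarounds are to spell out the bound with nested optional subrules (`HEX_DIGIT (HEX_DIGIT (...)?)?`), or to let `(HEX_DIGIT)+` match the whole run and count the digits in an action, rejecting the token when there are too many. Below is a Python sketch of the second idea only (not ANTLR code; `match_hex_body` and `max_digits` are hypothetical names, and the limit of 8 comes from the question above).

```python
# Sketch of "match the whole digit run, then check the count".
# Python, not ANTLR; names are hypothetical.
HEX_DIGITS = set("0123456789abcdef")


def match_hex_body(text, pos, max_digits=8):
    """Match 1..max_digits hex digits starting at pos; return the end index.

    Like (HEX_DIGIT)+, this consumes the whole run of digits first, then
    rejects the literal if the run is empty or longer than max_digits.
    """
    end = pos
    while end < len(text) and text[end].lower() in HEX_DIGITS:
        end += 1
    count = end - pos
    if count == 0:
        raise ValueError(f"expected a hex digit at index {pos}")
    if count > max_digits:
        raise ValueError(f"hex literal has {count} digits; at most {max_digits} allowed")
    return end


print(match_hex_body("&H7FFFFFFF", 2))  # 10
```

Counting after the fact gives a clear error for a 9-digit literal, whereas a bound spelled out with nested optionals would stop after 8 digits and leave the 9th to start the next token.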

Off for some well-deserved sleep.

- tinker
:)

On 1/9/06, tinker tailor <mail.tinker at gmail.com> wrote:
> Hi John,
>    Seems like this should do just what I want. I'll test it out and
> let you know.
> Thanks,
> Tinker
> :)
>
> On 1/6/06, John B. Brodie <jbb at acm.org> wrote:
> > Tinker Tailor asked:
> > >  I am trying to parse a subset of the VBScript language, and have run
> > >into the following problem:
> > >   The '&' in VBS can be used in two ways:
> > >       1. As a concatenation operator
> > >              e.g.:  a = b & c    or   a=b&c
> > >       2. As part of the prefix ("&H") and optional suffix ('&') of
> > >hexadecimal numbers
> > >             e.g.:  a=&H9Abc    or  a=&H9Abc&
> > >
> > >So, here are the rules I made in my lexer (lookahead=3):
> > >
> > >CONCAT : '&';
> > >HEX : "&h" (HEX_DIGIT)+ (('&')?)! ;
> > >HEX_DIGIT : '0'..'9' | 'a'..'f' ;
> > >
> > >Now what I want the lexer to do is to first try to match a hex
> > >number, and only when that fails, to try to match the CONCAT
> > >token. But I am not really sure how to tell ANTLR that. :(
> > > As things stand, the lexer first matches CONCAT, and as a result
> > >throws the 'unexpected token' exception when I give it the following
> > >valid input:
> > >     a = &H345ad&
> > >
> > >Any suggestions?
> >
> > untested, but perhaps this might do it:
> >
> > tokens { HEX; }
> > CONCAT : '&' (( 'h' (HEX_DIGIT)+ (('&')?)! ){ $setType(HEX); })? ;
> > protected HEX_DIGIT : '0'..'9' | 'a'..'f' ;
> >
>
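John's suggested rule in the quoted message takes a different route from syntactic predicates: it always matches the '&' and then optionally extends it into a hex literal, switching the token type with `$setType(HEX)`. A minimal Python model of that left-factored shape follows; it is not ANTLR code, `lex_concat_or_hex` is a hypothetical name, the lexer is assumed to be case-insensitive, and the entry test for the optional part (looking at the two characters after '&') is a simplification of how a generated lexer decides.

```python
# Python model of a left-factored CONCAT rule: match '&', then optionally
# extend it into a hex literal and change the token type. NOT ANTLR code.
# Assumption: case-insensitive matching.
HEX_DIGITS = set("0123456789abcdef")


def lex_concat_or_hex(text, pos):
    """Return (token_type, token_text, next_pos) for input starting with '&'."""
    assert text[pos] == "&"
    end = pos + 1
    # Enter the optional part only if 'h' is followed by a hex digit.
    if (end + 1 < len(text) and text[end].lower() == "h"
            and text[end + 1].lower() in HEX_DIGITS):
        end += 1  # consume 'h'
        while end < len(text) and text[end].lower() in HEX_DIGITS:
            end += 1
        token_text = text[pos:end]
        if end < len(text) and text[end] == "&":
            end += 1  # optional suffix: consumed but not kept in the text
        return ("HEX", token_text, end)  # the $setType(HEX) step
    return ("CONCAT", text[pos:end], end)


print(lex_concat_or_hex("&H345ad&", 0))  # ('HEX', '&H345ad', 8)
```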