[antlr-interest] lexer rule matching problem
tinker tailor
mail.tinker at gmail.com
Mon Jan 9 07:20:02 PST 2006
Nope, that didn't work! :( But I finally found the solution!! :D
The trouble with the rule:
CONCAT : '&' (( 'h' (HEX_DIGIT)+ (('&')?)! ){ $setType(HEX); })? ;
is that the lexer can't backtrack if it gets an input like a=a&height.
Since the main requirement is that the lexer first try to match a hex
number, and failing that backtrack and just match the ampersand '&', I
decided to check out ...yup, you guessed it...syntactic predicates!
So, after much tinkering (and some tailoring), I finally arrived at a
rule that is able to process all my input files correctly. And for
your viewing pleasure, here it is:
==================
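// Try a hex literal first, then an octal literal; otherwise a bare '&'.
CONCAT
    : ('&')=> (HEX_NUM)=> HEX_NUM { _ttype = HEX; }
    | (OCT_NUM)=> OCT_NUM { _ttype = OCT; }
    | '&'
    ;

// &h<hex digits> with an optional trailing '&' that is dropped from the text.
protected HEX_NUM
    : '&' 'h' (HEX_DIGIT)+ (('&')?)!
    ;

// &o<octal digits>
protected OCT_NUM
    : '&' 'o' ('0'..'7')+
    ;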
==================
Note that this rule takes care of the string concatenation operator
'&', as well as HEX (&H1&, &H2) and OCTAL (&O7) numbers.
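For anyone who wants to see the "try the number first, fall back to a bare '&'" decision outside the grammar, here is a minimal Python sketch of the same logic. It is not ANTLR output. The function name match_ampersand, the regular expressions, and the case-insensitive matching are assumptions made for illustration, and the optional trailing '&' is consumed but left out of the returned text, as the '!' in HEX_NUM does.

```python
import re

# Assumed token shapes, mirroring the HEX_NUM and OCT_NUM rules above.
# Case-insensitive matching is an assumption (the examples use &H1&, &O7).
HEX_NUM = re.compile(r"(&h[0-9a-f]+)&?", re.IGNORECASE)
OCT_NUM = re.compile(r"(&o[0-7]+)", re.IGNORECASE)


def match_ampersand(text, pos):
    """Return (token_type, token_text, chars_consumed) for the '&' at text[pos]."""
    if text[pos] != "&":
        raise ValueError("expected '&' at position %d" % pos)
    # Speculatively try each numeric form. A failed attempt consumes nothing,
    # which is the effect the syntactic predicates have in the grammar.
    for token_type, pattern in (("HEX", HEX_NUM), ("OCT", OCT_NUM)):
        m = pattern.match(text, pos)
        if m:
            return token_type, m.group(1), m.end() - pos
    # Neither number matched, so this is the concatenation operator.
    return "CONCAT", "&", 1
```

Calling match_ampersand("a=b&c", 3) takes the fallback branch and reports a CONCAT token, while match_ampersand("a=&H9Abc&", 2) reports a HEX token that also consumes the trailing '&'.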
Now here is another question: valid hex numbers in VB can have at most
8 digits. Is there any way in ANTLR to specify how many times a rule
should match?
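One possible direction is to count the digits while matching and stop accepting once the limit is exceeded. The Python sketch below shows only that counting logic and is not ANTLR syntax. The name match_bounded_hex is hypothetical, and rejecting an over-long literal (rather than truncating it) is an assumed design choice.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
MAX_HEX_DIGITS = 8  # limit stated in the post for VB hex numbers


def match_bounded_hex(text, pos, max_digits=MAX_HEX_DIGITS):
    """Match '&h', then 1..max_digits hex digits, then an optional '&'.

    Returns the index just past the literal, or None if there is no match.
    """
    if text[pos:pos + 2].lower() != "&h":
        return None
    i = pos + 2
    count = 0
    while i < len(text) and text[i] in HEX_DIGITS:
        count += 1
        if count > max_digits:
            return None  # too many digits: reject rather than truncate
        i += 1
    if count == 0:
        return None  # '&h' with no digits is not a hex literal
    if i < len(text) and text[i] == "&":
        i += 1  # consume the optional suffix
    return i
```

In a grammar the same idea would need a counter kept in an action, with the loop guarded by a check on that counter. Whether that fits a particular lexer is something to test rather than assume.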
Off for some well-deserved sleep.
- tinker
:)
On 1/9/06, tinker tailor <mail.tinker at gmail.com> wrote:
> Hi John,
> Seems like this should do just what I want. I'll test it out and
> let you know.
> Thanks,
> Tinker
> :)
>
> On 1/6/06, John B. Brodie <jbb at acm.org> wrote:
> > Tinker Tailor asked:
> > > I am trying to parse a subset of the VBScript language, and have run
> > >into the following problem:
> > > The '&' in VBS can be used in two ways -
> > > 1. As a concatenation operator
> > > e.g.: a = b & c or a=b&c
> > > 2. As part of the prefix ("&H") and optional suffix ('&') for
> > >hexadecimal numbers
> > > e.g.: a=&H9Abc or a=&H9Abc&
> > >
> > >So, here are the rules I made in my lexer (lookahead=3):
> > >
> > >CONCAT : '&';
> > >HEX : "&h" (HEX_DIGIT)+ (('&')?)! ;
> > >HEX_DIGIT : '0'..'9' | 'a'..'f' ;
> > >
> > >Now what I want the lexer to do is to first try and match a hex
> > >number, and only when that fails, to try and match for the CONCAT
> > >token. But I am not really sure how to tell antlr that. :(
> > > As things stand, the lexer first matches CONCAT, and as a result
> > >throws the 'unexpected token' exception when I give it the following
> > >valid input:
> > > a = &H345ad&
> > >
> > >Any suggestions?
> >
> > untested, but perhaps this might do it:
> >
> > tokens { HEX; }
> > CONCAT : '&' (( 'h' (HEX_DIGIT)+ (('&')?)! ){ $setType(HEX); })? ;
> > protected HEX_DIGIT : '0'..'9' | 'a'..'f' ;
> >
>