[antlr-interest] lexer rule matching problem

Sun Jan 8 20:59:41 PST 2006

Hi Martin,
  As per the Microsoft VBScript interpreter, for a VBS statement like
    b=a&h3
  the correct token stream to expect would be
    IDENTIFIER EQUALS IDENTIFIER HEX
  which is exactly what we are getting. The official interpreter first
tries to parse the text "&h3" as a hex literal, and only when that fails
does it try to interpret "h3" as an identifier. So, the above VBS
statement actually causes the interpreter to throw an error.
   A statement like
      b=b&&h3
  is valid, and produces the following token stream
      IDENTIFIER EQUALS IDENTIFIER CONCAT HEX
So, while disambiguating in the parser might make the language more
logical, I want to stay faithful to the official version, so I have
to implement the quirks as well.
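
To make the hex-first behaviour concrete, here is a rough sketch in
Python (not ANTLR, and not my actual grammar). It only knows the four
token types mentioned in this thread. The names TOKEN_SPEC and
token_types are made up for the illustration, and the case-insensitive
matching is an assumption on my part. The one point it demonstrates is
that the HEX alternative is tried before CONCAT, so "&h3" is never
split into "&" and "h3".

```python
import re

# Order matters: HEX is listed before CONCAT, so "&h<digits>" is always
# consumed as a hex literal before "&" can match as concatenation.
# The optional trailing "&" mirrors the (('&')?)! in the quoted lexer rule.
TOKEN_SPEC = [
    ("HEX", r"&h[0-9a-f]+&?"),
    ("CONCAT", r"&"),
    ("EQUALS", r"="),
    ("IDENTIFIER", r"[a-z][a-z0-9_]*"),
    ("WS", r"[ \t]+"),
]

# Assumption: matching is case-insensitive.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def token_types(text):
    """Return the list of token type names for one statement."""
    types = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WS":  # whitespace is skipped, not emitted
            types.append(match.lastgroup)
        pos = match.end()
    return types


print(token_types("b=a&h3"))
print(token_types("b=b&&h3"))
```

The first call gives IDENTIFIER EQUALS IDENTIFIER HEX and the second
gives IDENTIFIER EQUALS IDENTIFIER CONCAT HEX, matching the two token
streams above.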

- tinker
:)


On 1/6/06, Martin Probst <mail at martin-probst.com> wrote:
>
> > tokens { HEX; }
> > CONCAT : '&' (( 'h' (HEX_DIGIT)+ (('&')?)! ){ $setType(HEX); })? ;
> > protected HEX_DIGIT : '0'..'9' | 'a'..'f' ;
>
> What happens if someone wants to do this:
>
> a = "foo"
> h3 = "bar"
> b = a&h3
>
> You'll end up with a token stream of IDENTIFIER EQUALS IDENTIFIER HEX.
> The lexer needs to know that it's in a non-operator state (where a
> concat cannot occur) as the language is ambiguous otherwise. Maybe you
> can also get around it by disambiguating in the parser, e.g. lex the '&'
> simply as an AMPERSAND and let the parser figure out what it is.
>
> Martin
>
>