[antlr-interest] Why won't this match...

Sun Feb 24 15:23:45 PST 2008

On Sun, Feb 24, 2008 at 5:12 PM, Mark Volkmann
<r.mark.volkmann at gmail.com> wrote:
>
> On Sun, Feb 24, 2008 at 9:40 AM, alan brown <listbrownie at gmail.com> wrote:
>  > It must be something obvious but why won't this language parse the word
>  > 'wibble'?  I would expect the lexer to be unable to match the input to
>  > BIG_TOKEN but successfully match to LITTLE_TOKEN followed by SEMI_TOKEN.  If
>  > I change the BIG_TOKEN definition to 'wobble' then all is well but I don't
>  > know why this is failing.
>  >
>  > Any help is appreciated
>  >
>  > root                       : tokenizer2 | tokenizer1 ;
>  >
>  > tokenizer1              : BIG_TOKEN ;
>  > tokenizer2             : LITTLE_TOKEN SEMI_TOKEN ;
>  >
>  > BIG_TOKEN           : 'wibbled' ;
>  > LITTLE_TOKEN        : 'wi' ;
>  > SEMI_TOKEN          : 'bble' ;
>
>  This is "bang my head on the wall" frustrating!
>  It looks so simple, but I can't get it to work either!
>  Just when I thought I was getting the hang of it ...
>  I hope someone else has an answer.

Here's the grammar I used to test this.

grammar Wibble;
root: (PREFIX SUFFIX | WHOLE) { System.out.println("got it!"); };
WHOLE: 'wibbled';
PREFIX: 'wi';
SUFFIX: 'bble';
WHITESPACE: ('\r' | '\n') { skip(); };

I stepped through the generated lexer code with the input "wibble". In
effect, it does this:
- got a 'w'
- got an 'i'
- got a 'b'
- okay, stop looking because the next token must be 'wibbled'
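
The steps above can be modelled roughly like this. It is only a sketch of
the behavior I saw while stepping through, not the code ANTLR generates:
the class name, predict() and tokenize() are made up for illustration, and
it assumes the lexer picks a rule from a few characters of lookahead and
never reconsiders that choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a lexer that commits to a rule after limited lookahead.
// Not ANTLR's generated code; all names here are assumptions for illustration.
class CommittedLexerSketch {
    static final String WHOLE = "wibbled";
    static final String PREFIX = "wi";
    static final String SUFFIX = "bble";

    // Choose a rule by looking at just enough characters to tell the literals apart.
    static String predict(String in, int pos) {
        if (in.startsWith("b", pos)) return "SUFFIX";
        if (in.startsWith("wib", pos)) return "WHOLE";  // 'w', 'i', 'b' => assume 'wibbled'
        if (in.startsWith("wi", pos)) return "PREFIX";
        return null;
    }

    // Match the predicted literal in full; there is no fallback if it does not fit.
    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            String rule = predict(in, pos);
            if (rule == null) {
                throw new IllegalStateException("no viable rule at position " + pos);
            }
            String literal = rule.equals("WHOLE") ? WHOLE
                           : rule.equals("PREFIX") ? PREFIX : SUFFIX;
            if (!in.startsWith(literal, pos)) {
                throw new IllegalStateException(
                    "predicted " + rule + " but could not match '" + literal
                    + "' at position " + pos);
            }
            tokens.add(rule);
            pos += literal.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("wibbled"));
        try {
            System.out.println(tokenize("wibble"));
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

On "wibble" the sketch commits to WHOLE as soon as it has seen "wib", and
then fails because the 'd' never arrives.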

But that prediction is wrong for this input. The next token needs to be
'wi' (PREFIX), since the lexer is never going to find the 'd' that
'wibbled' requires. Doesn't this seem like incorrect behavior?
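
For comparison, here is a small sketch of the behavior I expected: try
each literal in full, longest first, and fall back to a shorter one when
the longer one does not fit. This is only an illustration of the
expectation, not a description of how ANTLR works or can be configured;
the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the expected behavior: full-literal matching with fallback.
class FallbackLexerSketch {
    // Rule name and literal, ordered longest literal first.
    static final String[][] RULES = {
        {"WHOLE", "wibbled"},
        {"SUFFIX", "bble"},
        {"PREFIX", "wi"},
    };

    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            String matched = null;
            for (String[] rule : RULES) {
                // Only accept a rule if its whole literal is present at pos.
                if (in.startsWith(rule[1], pos)) {
                    matched = rule[0];
                    pos += rule[1].length();
                    break;
                }
            }
            if (matched == null) {
                throw new IllegalStateException("no token matches at position " + pos);
            }
            tokens.add(matched);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("wibble"));   // [PREFIX, SUFFIX]
        System.out.println(tokenize("wibbled"));  // [WHOLE]
    }
}
```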

-- 
R. Mark Volkmann
Object Computing, Inc.