[antlr-interest] Understanding priorities in lexing (newbie)

Daniel Brosseau daniel at lba.ca
Thu Jul 12 09:01:20 PDT 2007


I am also new to ANTLR. I have read the book, want to thank you, and am quite excited. But this thread has me a little perplexed.

If I have a simple grammar:

grammar lex;

KEYWORD : 'a' 'b' 'c';
OTHER : 'a' | 'b' | 'c';
program : ( KEYWORD | OTHER )* ;

and feed it "abab", it chokes at the second 'a'. Now, I think I understand what was said earlier in this thread, and I have gone through the code and can see why it chokes, but I do not yet understand why this is proper behaviour. Coming from a LEX background, I see no problem converting this into a DFA that works; it's done all the time.
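One way to picture the choke is a simplified "predict, then commit" model: the lexer looks a fixed number of characters ahead, picks a rule, and never falls back to a shorter rule that had already matched. This is a hypothetical Python sketch, not ANTLR's generated code; the two-character lookahead (k=2) is an assumption chosen to mirror the behaviour described above, and GRAMMAR_1, predict, lex_commit and LexError are illustrative names.

```python
# Hypothetical "predict, then commit" lexer model (NOT ANTLR's generated code).
# Assumption: prediction inspects a fixed k characters of lookahead.

EOF = "\0"  # padding past end of input; assumes the input never contains "\0"


class LexError(Exception):
    pass


# Hypothetical encoding of the first grammar: rule name -> alternatives.
GRAMMAR_1 = [("KEYWORD", ["abc"]), ("OTHER", ["a", "b", "c"])]


def predict(rules, text, pos, k):
    """Pick the alternative agreeing with the most lookahead characters;
    the first declared alternative wins ties."""
    window = text[pos:pos + k].ljust(k, EOF)
    best = None  # (chars_checked, rule_name, alt)
    for name, alts in rules:
        for alt in alts:
            n = min(len(alt), k)
            if alt[:n] == window[:n] and (best is None or n > best[0]):
                best = (n, name, alt)
    if best is None:
        raise LexError(f"no viable alternative at {pos}")
    return best[1], best[2]


def lex_commit(rules, text, k=2):
    tokens, pos = [], 0
    while pos < len(text):
        name, alt = predict(rules, text, pos, k)
        # Commit to the predicted alternative: match all of it or fail.
        # There is no fallback to a shorter rule that already matched.
        for i, ch in enumerate(alt):
            if pos + i >= len(text) or text[pos + i] != ch:
                raise LexError(f"mismatched character at {pos + i}, expecting {ch!r}")
        tokens.append(name)
        pos += len(alt)
    return tokens
```

In this model lex_commit(GRAMMAR_1, "aabc") returns ['OTHER', 'KEYWORD'], while lex_commit(GRAMMAR_1, "abab") raises LexError at position 2, the second 'a': once 'a' 'b' has predicted KEYWORD, nothing remembers that "a" alone was already a valid OTHER.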

To illustrate, if I change my grammar to the following:

grammar lex;

kEYWORD : 'a' 'b' 'c';
oTHER : 'a' | 'b' | 'c';
program : ( kEYWORD | oTHER )* ;

and feed it "abab", it parses the input as I would expect, properly identifying a sequence of four oTHER tokens. Isn't that what I should get by default? It looks much more natural and expected. Using filter=true cannot be the right answer for general cases like this.

The lexer should be keeping track of the longest token it has matched to date and return that token if it fails to match a longer one. Here, it does not do that. If I further change my grammar to:

grammar lex;

KEYWORD : 'a' 'b';
OTHER : 'a' | 'b';
program : ( KEYWORD | OTHER )* ;

and feed it "aa", it correctly splits it into two OTHER tokens. The only difference between the first grammar and this one is that the distance between the length of the last acceptable matched token and the point where further matching fails goes from 1 ("a" vs "ab") to 2 ("a" vs "abc") characters. But that should not make a difference, although I know why it does in your case.
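The behaviour asked for here is the classic lex-style rule: keep extending the match while some rule could still match, remember the last position where a rule did match, and fall back to it when the longer attempt dies, however many characters back that is. A minimal Python sketch, assuming rules are plain strings (GRAMMAR_1 and GRAMMAR_2 are hypothetical encodings of the first and third grammars in this post) and that ties on equal length go to the first declared rule, as in lex. Checking which alternatives are still live stands in for the DFA a tool like lex would build.

```python
# A minimal lex-style scanner: longest match, with fallback to the last
# accepting position. Checking which alternatives are still "live" stands in
# for the DFA a tool like lex would build; rules are plain strings here.


class LexError(Exception):
    pass


# Hypothetical encodings of the first and third grammars in this post.
GRAMMAR_1 = [("KEYWORD", ["abc"]), ("OTHER", ["a", "b", "c"])]
GRAMMAR_2 = [("KEYWORD", ["ab"]), ("OTHER", ["a", "b"])]


def lex_longest(rules, text):
    tokens, pos = [], 0
    while pos < len(text):
        last_accept = None  # (end, rule_name) of the longest complete match seen
        end = pos
        while end < len(text):
            end += 1
            lexeme = text[pos:end]
            # Stop once no alternative can be extended to match this lexeme.
            if not any(alt.startswith(lexeme) for _, alts in rules for alt in alts):
                break
            # Record an accepting position; on equal length the first rule wins.
            for name, alts in rules:
                if lexeme in alts:
                    last_accept = (end, name)
                    break
        if last_accept is None:
            raise LexError(f"no token matches at position {pos}")
        # Fall back to the last accepting position, however far back it is.
        end, name = last_accept
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens
```

With this scanner, lex_longest(GRAMMAR_1, "abab") yields four OTHER tokens, lex_longest(GRAMMAR_1, "abcab") yields KEYWORD "abc" followed by OTHER "a" and OTHER "b", and lex_longest(GRAMMAR_2, "aa") yields two OTHER tokens: the one- versus two-character distance no longer matters.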

Regards, with confusion,
