[antlr-interest] Understanding priorities in lexing (newbie)

Thu Jul 12 17:05:31 PDT 2007

On 13/7/2007, at 0:18, Daniel Brosseau wrote:

> Hi,
>
> Love this:
>
>> Well, it does what I expected so it's "correct", just not what you
>> want ;)
>>
>
> Case 1:
> grammar lex;
> KEYWORD : 'a' 'b' 'c';
> OTHER : 'a' | 'b' | 'c';
> program : ( KEYWORD | OTHER )* ;
>
> Input: "aba" chokes on second a
>
> Case 2:
> grammar lex;
> kEYWORD : 'a' 'b' 'c';
> oTHER : 'a' | 'b' | 'c';
> program : ( kEYWORD | oTHER )* ;
>
> Input: "aba" outputs oTHER oTHER oTHER
>
> Same grammar, two different state machines.
>
> As I tried to say earlier, although the rules language used for the  
> lexer and parser seems to be describing things in the same manner,  
> they in fact describe very different state machines. So at the  
> least this is an inconsistency which leads to confusion.

One thing to bear in mind is that lexing and parsing are completely
separate phases in ANTLR. Sure, the parser and lexer run at the same
time, because the parser just keeps saying "give me a token, give me
another token" until all the tokens have been produced. But because
there is no communication from the parser back to the lexer, you can
think of them conceptually as two completely separate phases.
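
To make that one-way flow concrete, here is a minimal Python sketch. It
is only an illustration, not what ANTLR generates: the names lex, parse
and Token are made up for this example, and the "grammar" is reduced to
program : OTHER* with OTHER matching a single 'a', 'b' or 'c'.

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str]  # (token type, matched text); hypothetical shape


def lex(text: str) -> Iterator[Token]:
    """Hypothetical lexer: it sees only characters, never the parser's state."""
    for ch in text:
        if ch in "abc":
            yield ("OTHER", ch)
        else:
            raise ValueError(f"no lexer rule matches {ch!r}")


def parse(tokens: Iterator[Token]) -> List[str]:
    """Hypothetical parser for program : OTHER* ; it only pulls tokens."""
    seen = []
    for token_type, _text in tokens:  # "give me a token, give me another token"
        if token_type != "OTHER":
            raise ValueError(f"unexpected token {token_type}")
        seen.append(token_type)
    return seen


# The two run interleaved, but information only flows lexer -> parser.
result = parse(lex("aba"))
```

Nothing in parse can influence which token lex produces next, which is
the sense in which the two phases are separate.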

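So when you take your first lexer, which has two rules (KEYWORD and
OTHER), and morph it into the second one, which has no named rules at
all (kEYWORD and oTHER start with a lowercase letter, so ANTLR treats
them as parser rules and the lexer is left with only the implicit
tokens for the literals 'a', 'b' and 'c'), you are changing it in a
fundamental way which completely changes the way it operates.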
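
The difference between your two cases can be modelled with a small
Python sketch. This is a simplified simulation under stated
assumptions, not ANTLR's real generated DFA: I am assuming that in
Case 1 the lexer picks KEYWORD as soon as it sees 'a' 'b' and then
commits without backing up (which matches the "chokes on second a"
result you report), and that in Case 2 the lexer just hands the parser
one token per literal character. The function names lex_case1 and
parse_case2 are invented for the example.

```python
from typing import List


def lex_case1(text: str) -> List[str]:
    """Case 1 model: KEYWORD and OTHER are both lexer rules."""
    tokens, i = [], 0
    while i < len(text):
        # Prediction (assumed): two characters of lookahead select the rule.
        if text[i:i + 2] == "ab":
            # Committed to KEYWORD now, so the third character must be 'c'.
            if text[i + 2:i + 3] != "c":
                raise ValueError(f"mismatched character at index {i + 2}")
            tokens.append("KEYWORD")
            i += 3
        elif text[i] in "abc":
            tokens.append("OTHER")
            i += 1
        else:
            raise ValueError(f"no lexer rule matches {text[i]!r}")
    return tokens


def parse_case2(text: str) -> List[str]:
    """Case 2 model: one token per literal; the parser picks the rule."""
    literals = list(text)  # stand-in for the implicit 'a', 'b', 'c' tokens
    for ch in literals:
        if ch not in "abc":
            raise ValueError(f"no lexer rule matches {ch!r}")
    rules, i = [], 0
    while i < len(literals):
        # The parser can look at three whole tokens before choosing a rule.
        if literals[i:i + 3] == ["a", "b", "c"]:
            rules.append("kEYWORD")
            i += 3
        else:
            rules.append("oTHER")
            i += 1
    return rules
```

With this model, lex_case1("aba") raises an error at the second 'a',
while parse_case2("aba") returns three oTHER matches, the same split
you saw between Case 1 and Case 2.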

Cheers,
Wincent