[antlr-interest] Understanding priorities in lexing (newbie)

Thu Jul 12 17:49:51 PDT 2007

Hi,

Yes, I understand somewhat ... exactly how much, I'm not yet 100% sure.

Where I have been coming from is that one grammar file (what follows 
'grammar' in the file) would define one language grammar (not two). How the 
work gets divided up between lexer and parser is a matter of convenience and 
efficiency but should not change the meaning of the overall grammar and the 
character streams that get accepted or rejected by the resulting overall 
state machine. It seems that in ANTLR's case, the one grammar file defines 
two grammars (one for the lexer and one for the parser) and how the work 
gets divided up between the two can have a considerable impact on the 
character streams accepted or rejected. I get further tripped up because the 
rules of these two grammars can be interspersed in the file and look as if 
they mesh seamlessly. Ooof!

Confused but thinking about it,

Daniel

----- Original Message ----- 
From: "Wincent Colaiuta" <win at wincent.com>
To: "Daniel Brosseau" <daniel at lba.ca>
Cc: "ANTLR-Interest" <antlr-interest at antlr.org>
Sent: Thursday, July 12, 2007 8:05 PM
Subject: Re: [antlr-interest] Understanding priorities in lexing (newbie)

On 13/7/2007, at 0:18, Daniel Brosseau wrote:

> Hi,
>
> Love this:
>
>> Well, it does what I expected so it's "correct", just not what you want ;)
>>
>
> Case 1:
> grammar lex;
> KEYWORD : 'a' 'b' 'c';
> OTHER : 'a' | 'b' | 'c';
> program : ( KEYWORD | OTHER )* ;
>
> Input: "aba" chokes on second a
>
> Case 2:
> grammar lex;
> kEYWORD : 'a' 'b' 'c';
> oTHER : 'a' | 'b' | 'c';
> program : ( kEYWORD | oTHER )* ;
>
> Input: "aba" outputs oTHER oTHER oTHER
>
> Same grammar, two different state machines.
>
> As I tried to say earlier, although the rules language used for the lexer
> and parser seems to be describing things in the same manner, they in fact
> describe very different state machines. So at the least this is an
> inconsistency which leads to confusion.

One thing to bear in mind is that lexing and parsing are completely
separate phases in ANTLR. Sure, the parser and lexer run at the same
time, because the parser is just saying "give me a token, give me
another token" etc. until all tokens are produced. But conceptually,
because there is no communication from the parser to the lexer, you
can think of them as two completely separate phases.

So when you take your first lexer, which has two rules (KEYWORD and
OTHER), and then morph it into the second one, where kEYWORD and oTHER
are parser rules and the lexer is left with only the implicit tokens
for the literals 'a', 'b' and 'c', you are changing it in a fundamental
way which completely changes the way it operates.
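
To make the difference concrete, here is a rough Python sketch of the
two cases. It is a toy model of the behaviour reported in this thread,
not ANTLR's actual algorithm or generated code. The function names
(lex_case1, lex_case2, parse_case2) are made up for illustration, and
two things are assumed: that the Case 1 lexer picks KEYWORD as soon as
it sees 'a' followed by 'b' and never falls back to OTHER, and that the
Case 2 lexer emits one token per literal character.

```python
class LexError(Exception):
    """Raised when the toy lexer cannot match the input."""


def lex_case1(text):
    """Case 1: KEYWORD : 'a' 'b' 'c';  OTHER : 'a' | 'b' | 'c';"""
    tokens, i = [], 0
    while i < len(text):
        # Assumption: seeing 'a' then 'b' is enough to predict KEYWORD.
        if text.startswith("ab", i):
            if not text.startswith("abc", i):
                # The lexer has committed to KEYWORD; it does not go
                # back and retry the input as OTHER tokens.
                raise LexError(f"position {i + 2}: expected 'c'")
            tokens.append("KEYWORD")
            i += 3
        elif text[i] in "abc":
            tokens.append("OTHER")
            i += 1
        else:
            raise LexError(f"position {i}: unexpected {text[i]!r}")
    return tokens


def lex_case2(text):
    """Case 2: no explicit lexer rules, one token per literal (assumed)."""
    for i, ch in enumerate(text):
        if ch not in "abc":
            raise LexError(f"position {i}: unexpected {ch!r}")
    return list(text)


def parse_case2(tokens):
    """Case 2 parser: program : ( kEYWORD | oTHER )* ;"""
    matched, i = [], 0
    while i < len(tokens):
        # The parser looks at up to three tokens before choosing a rule.
        if tokens[i:i + 3] == ["a", "b", "c"]:
            matched.append("kEYWORD")
            i += 3
        else:
            matched.append("oTHER")
            i += 1
    return matched
```

With this model, lex_case1("aba") raises LexError at the second 'a',
while parse_case2(lex_case2("aba")) gives oTHER three times. The
parser in Case 2 never talks to the lexer; it only decides what to do
with tokens that have already been produced.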

Cheers,
Wincent