[antlr-interest] Lexer rule question

Mon Feb 11 06:08:32 PST 2008

Jim Idle wrote:
>> -----Original Message-----
>> From: Johannes Luber [mailto:jaluber at gmx.de]
>> Sent: Friday, February 08, 2008 8:10 AM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] Lexer rule question
>>
>> Hi!
>>
>> I have never needed to know the answer before now, but what is the
>> actual difference between:
>>
>> A : B ;
>>
>> B : 'B' ;
>>
>> and
>>
>> A : B ;
>>
>> fragment B : 'B' ;
> 
> In the first instance, you will get an error that B is unreachable
> because it sees a non-fragment rule A first and that calls B. Because B
> is not a fragment, ANTLR tries to produce a token match for that as well
> as A and finds that the spec for both A and B is exactly the same.
> 
> In the second instance, B is a fragment and so ANTLR knows not to try to 
> produce a real token B, as it is just a rule that is called by other 
> lexer token definitions. Hence there is only a spec for the token A, 
> which just calls the rule B.
> 
> All rules produce a single token only, but may call other rules, whether 
> fragment rules or not, as part of the spec. However, if you don't use
> the fragment modifier, then the lexer will try to produce a token for 
> that rule on its own, as well as the other rules that use it in 
> combination.

Wait a minute. If I have the following situation:

A : B C;

B : 'B';

C : 'C';

Does ANTLR still give unreachable warnings? Does ANTLR produce three
tokens (A, B and C)?
> 
> So, basically, if your rule is just something for another rule to match
> with, such as DIGIT etc., then use fragment and the lexer will not try to
> produce code that matches and produces the token DIGIT. Always use
> fragment if the parser is not expecting a token named after the lexer
> rule.

OK.
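
If I read that correctly, a minimal sketch of the DIGIT case would look
like the following. The grammar name Numbers and the INT rule are my own
hypothetical names for illustration; only DIGIT comes from the text above.

```
lexer grammar Numbers;

// Helper rule: marked fragment, so the lexer never emits a DIGIT token
// on its own. It only exists to be called from other lexer rules.
fragment DIGIT : '0'..'9' ;

// The only token the parser would see from this grammar.
INT : DIGIT+ ;
```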

> To produce multiple tokens from one production you have to start 
> deriving the token stream and storing the tokens produced in a List that 
> you can consume/add to the token list (see source code comments here). 
> That would be an overhead that most lexers don't need, so it isn't the
> default. There are few occasions where the only solution is to produce
> two tokens from one lexer rule; it does happen but I have always managed 
> to find another way.

So my hypothetical situation above would only create the token A, if it
works at all?

Johannes