[antlr-interest] Lexer rule question

Jim Idle jimi at temporal-wave.com
Fri Feb 8 10:34:44 PST 2008


> -----Original Message-----
> From: Johannes Luber [mailto:jaluber at gmx.de]
> Sent: Friday, February 08, 2008 8:10 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Lexer rule question
> 
> Hi!
> 
> I have never needed to know the answer before now, but what is the
> actual difference between:
> 
> A : B ;
> 
> B : 'B' ;
> 
> and
> 
> A : B ;
> 
> fragment B : 'B' ;

In the first instance, you will get an error saying that B is
unreachable. ANTLR sees the non-fragment rule A first, and A calls B.
Because B is not a fragment, ANTLR also tries to produce a token match
for B, and finds that the specs for A and B are exactly the same.

In the second instance, B is a fragment and so ANTLR knows not to try to 
produce a real token B, as it is just a rule that is called by other 
lexer token definitions. Hence there is only a spec for the token A, 
which just calls the rule B.

Every rule produces a single token only, but may call other rules,
whether fragment rules or not, as part of its spec. However, if you
don't use the fragment modifier, then the lexer will try to produce a
token for that rule on its own, as well as for the other rules that use
it in combination.

So, basically, if your rule is just something for other rules to match
with, such as DIGIT, then use fragment and the lexer will not generate
code that matches and produces a DIGIT token. Always use fragment if the
parser is not expecting a token with the same name as the lexer rule.
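
As a minimal sketch of that advice, assuming ANTLR 3 syntax (the grammar
name Numbers and the rules INT and FLOAT are made up for illustration):

```antlr
lexer grammar Numbers;   // hypothetical grammar name

// Real tokens: the parser can refer to INT and FLOAT.
INT   : DIGIT+ ;
FLOAT : DIGIT+ '.' DIGIT+ ;

// Helper only: callable from other lexer rules, never emitted as a token.
fragment DIGIT : '0'..'9' ;
```

Without the fragment modifier, DIGIT would also become a token in its own
right and would compete with INT for a single digit of input.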

To produce multiple tokens from one production, you have to start
deriving the token stream and storing the tokens produced in a List that
you can add to and consume from (see source code comments here). That
would be an overhead that most lexers don't need, so it isn't the
default. There are a few occasions where the only solution is to produce
two tokens from one lexer rule; it does happen, but I have always
managed to find another way.
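
A rough sketch of the List approach, assuming the Java target and the
ANTLR 3.0.x runtime, where Lexer exposes emit(Token), nextToken() and a
protected token field. The grammar name MultiEmit, the pending list and
the ID rule are made up for illustration:

```antlr
lexer grammar MultiEmit;   // hypothetical grammar name

@header {
import java.util.ArrayList;
import java.util.List;
}

@members {
    // Tokens emitted by lexer rules, waiting to be handed out.
    private final List<Token> pending = new ArrayList<Token>();

    // Queue every emitted token instead of keeping only the last one.
    public void emit(Token token) {
        this.token = token;   // assumption: 3.0.x field; later releases use state.token
        pending.add(token);
    }

    public Token nextToken() {
        super.nextToken();    // runs one lexer rule, which may queue several tokens
        if (pending.isEmpty()) {
            return Token.EOF_TOKEN;   // nothing was emitted, so input is exhausted
        }
        return pending.remove(0);
    }
}

// Ordinary rule; a rule that needs two tokens would call emit(...) twice in an action.
ID : ('a'..'z')+ ;
```

Every token now goes through the list, which is the overhead mentioned
above.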

Jim



