[antlr-interest] Lexer rule question
Johannes Luber
jaluber at gmx.de
Mon Feb 11 06:08:32 PST 2008
Jim Idle wrote:
>> -----Original Message-----
>> From: Johannes Luber [mailto:jaluber at gmx.de]
>> Sent: Friday, February 08, 2008 8:10 AM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] Lexer rule question
>>
>> Hi!
>>
>> I have never needed to know the answer before now, but what is the
>> actual difference between:
>>
>> A : B ;
>>
>> B : 'B' ;
>>
>> and
>>
>> A : B ;
>>
>> fragment B : 'B' ;
>
> In the first instance, you will get an error that B is unreachable,
> because ANTLR sees the non-fragment rule A first, and A calls B. Because B
> is not a fragment, ANTLR tries to produce a token match for it as well
> as for A, and finds that the spec for both A and B is exactly the same.
>
> In the second instance, B is a fragment and so ANTLR knows not to try to
> produce a real token B, as it is just a rule that is called by other
> lexer token definitions. Hence there is only a spec for the token A,
> which just calls the rule B.
>
> All rules produce a single token only, but may call other rules, whether
> fragment rules or not, as part of the spec. However, if you don't use
> the fragment modifier, then the lexer will try to produce a token for
> that rule on its own, as well as the other rules that use it in
> combination.
Wait a minute. If I have the following situation:
A : B C;
B : 'B';
C : 'C';
Does ANTLR still give unreachable warnings? Does ANTLR produce three
tokens (A, B and C)?
>
> So, basically, if your rule is just something for another rule to match
> with, such as DIGIT, then use fragment and the lexer will not try to
> produce code that matches and produces the token DIGIT. Always use
> fragment if the parser is not expecting a token named after the lexer
> rule.
OK.
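The fragment advice quoted above can be sketched as a pair of lexer rules.
This is only a minimal illustration in ANTLR 3 syntax; the rule name INT is
an assumption made for the example (DIGIT is the name used in the quoted
text), and neither rule comes from a real grammar in this thread.

```antlr
// Hypothetical lexer rules, for illustration only.
INT : DIGIT+ ;          // a real token type that the parser can reference

fragment
DIGIT : '0'..'9' ;      // helper rule: only matched when called from INT,
                        // so the lexer does not emit a DIGIT token itself
```

With DIGIT marked as a fragment, the parser should only ever see INT tokens.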
> To produce multiple tokens from one production you have to start
> deriving the token stream and storing the tokens produced in a List that
> you can consume/add to the token list (see source code comments here).
> That would be an overhead that most lexers don't need, so it isn't the
> default. There are few occasions where the only solution is to produce
> two tokens from one lexer rule; it does happen but I have always managed
> to find another way.
So my hypothetical situation above would create only the token A, if it
works at all?
Johannes