[antlr-interest] Lexer rules and unreachable alternatives (trying to understand lexer)
Wincent Colaiuta
win at wincent.com
Thu Apr 19 04:32:06 PDT 2007
On 19/4/2007, at 12:15, Johannes Luber wrote:
> Wincent Colaiuta wrote:
>> Given a lexer with a single rule:
>>
>> OTHER : .+ ;
>
> The problem is that your first OTHER rule is ambiguous: it can match
> everything, even the keywords etc. defined in the other rules. With .*
> the OTHER rule becomes optional. I suggest either changing your
> grammar so that it doesn't need the OTHER rule, or using syntactic
> predicates, which prevent OTHER from matching anything that another
> rule could match.
OK, the funny thing is that there are no other rules at all. I made a
lexer with that single rule in it because I was trying to figure out
what it did under the covers... Given that no ambiguity is possible
with only one rule, I wonder whether ANTLR has a hard-coded response to
lexer rules like ".+"...
What motivated me to start exploring this was a set of questions
about lexer precedence (by which I mean how the lexer chooses which
rules to try). I had a set of rules which looked something like this:
WS : ' '+ ;
FOO : ~('x' | 'y' | 'z')+ ;
At first I mistakenly thought that the lexer would try the rules in
order (WS first, then FOO), but it doesn't. It calls a predict
method, and the prediction always goes for FOO without fail. My
understanding now is that the prediction method favors a greedy
(longest) match, so even typing " \n" into the test rig is enough to
make it prefer FOO over WS: FOO can consume the trailing newline and
WS cannot. I played around with greedy=false, but that yielded single
characters rather than a string of non-whitespace characters. In any case,
exploring the issue I eventually got down to a minimal lexer
containing that lone OTHER rule...
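The longest-match behaviour described above can be modelled roughly in Python. This is a simplified sketch that assumes a "maximal munch" lexer: every rule is tried at the current position, the longest match wins, and ties go to the rule declared first. ANTLR's generated lexer actually uses a prediction DFA, so this is only an approximation. The names RULES, next_token and tokenize are hypothetical.

```python
import re

# Hypothetical rule table mirroring the two rules above, in declaration order.
# FOO's ~('x'|'y'|'z')+ becomes the regex character class [^xyz]+,
# which (like the ANTLR rule) also matches spaces and newlines.
RULES = [
    ("WS", re.compile(r" +")),
    ("FOO", re.compile(r"[^xyz]+")),
]

def next_token(text, pos):
    """Return (rule_name, lexeme) for the longest match starting at pos.

    Ties are broken in favor of the rule declared first. This is a
    simplified longest-match model, not ANTLR's generated prediction code.
    """
    best_name, best_lexeme = None, ""
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Only a strictly longer match replaces the current best,
        # so an equal-length match keeps the earlier rule.
        if m and len(m.group()) > len(best_lexeme):
            best_name, best_lexeme = name, m.group()
    if best_name is None:
        raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
    return best_name, best_lexeme

def tokenize(text):
    """Split text into (rule_name, lexeme) pairs using next_token."""
    pos, tokens = 0, []
    while pos < len(text):
        name, lexeme = next_token(text, pos)
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens

print(tokenize(" \n"))   # [('FOO', ' \n')]: FOO matches 2 chars, WS only 1
print(tokenize("   "))   # [('WS', '   ')]: tie at 3 chars, WS is declared first
print(tokenize("ab cd")) # [('FOO', 'ab cd')]: FOO swallows the embedded space
```

Under this model, WS only wins when the input is nothing but spaces up to a character FOO cannot match (x, y or z) or the end of input.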
I still have a lot to learn about ANTLR lexers!
Cheers,
Wincent