[antlr-interest] Lexer rules and unreachable alternatives (trying to understand lexer)

Wincent Colaiuta win at wincent.com
Thu Apr 19 04:32:06 PDT 2007


On 19/4/2007, at 12:15, Johannes Luber wrote:

> Wincent Colaiuta wrote:
>> Given a lexer with a single rule:
>>
>>     OTHER : .+ ;
>
> The problem is that your first OTHER rule is ambiguous: it can match
> everything, even the keywords, etc. defined in the other rules. With .*
> the OTHER rule becomes optional. I suggest either changing your
> grammar so that it doesn't need the OTHER rule, or using syntactic
> predicates, which prevent OTHER from matching anything that another
> rule could match.

Ok, the funny thing is that there are no other rules at all. I made a  
lexer with that single rule in it because I was trying to figure out  
what it did under the covers... Given that no ambiguity is possible  
with only one rule, I wonder if ANTLR has a hard-coded response to  
lexer rules like ".+"...

The thing which motivated me to start exploring this was a set of  
questions about lexer precedence (by which I mean, how the lexer  
chooses which rules to try) and I had a set of rules which looked  
something like this:

WS : ' '+ ;
FOO : ~('x' | 'y' | 'z')+ ;

At first I mistakenly thought that the lexer would try lexer rules in
order (WS first and then FOO), but it doesn't. It calls a predict
method, and the prediction always goes for FOO without fail. My
understanding is now that the prediction method favors a greedy
match, and so even typing "     \n" into the test rig is enough to
make it prefer FOO over WS (because of the trailing newline). I
played around with greedy=false, but that yielded single characters
rather than a string of non-whitespace characters. In any case,
while exploring the issue I eventually got down to a minimal lexer
containing that lone OTHER rule...
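
To make the behaviour I'm describing concrete, here is a minimal Python sketch of a "longest match wins" lexer. It is only a model of my current understanding, not ANTLR's generated prediction code: the names RULES, predict and tokenize are made up for illustration, the regular expressions are stand-ins for the WS and FOO rules, and breaking ties in favor of the rule listed first is an assumption of the sketch.

```python
import re

# Hypothetical stand-ins for the two lexer rules, kept in grammar order.
RULES = [
    ("WS", re.compile(r" +")),         # WS  : ' '+ ;
    ("FOO", re.compile(r"[^xyz]+")),   # FOO : ~('x' | 'y' | 'z')+ ;
]

# Rough analogue of a non-greedy FOO: stops after the first character.
NON_GREEDY_FOO = re.compile(r"[^xyz]+?")


def predict(text, pos=0):
    """Return (rule_name, lexeme) for the longest match at pos, or None."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # A strictly longer match wins; on a tie the earlier rule is kept
        # (an assumption of this sketch).
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


def tokenize(text):
    """Split text into (rule_name, lexeme) pairs using predict()."""
    tokens = []
    pos = 0
    while pos < len(text):
        hit = predict(text, pos)
        if hit is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(hit)
        pos += len(hit[1])
    return tokens
```

In this model tokenize("     \n") produces a single FOO token, because FOO can consume the trailing newline and WS cannot, so FOO's match is one character longer. With the non-greedy pattern, NON_GREEDY_FOO.match("abc") stops after "a", which is at least consistent with the single characters I saw with greedy=false.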

I still have a lot to learn about ANTLR lexers!

Cheers,
Wincent


