[antlr-interest] Lexer rules and unreachable alternatives (trying to understand lexer)
Wincent Colaiuta
win at wincent.com
Thu Apr 19 07:25:04 PDT 2007
On 19/4/2007, at 14:38, Johannes Luber wrote:
> With this rule WS should get all consecutive spaces. But I haven't
> tested if FOO is still chosen over WS. Maybe
>
> start
> : (WS)=> WS
> | FOO
> ;
>
> is still needed.
I think the problem is that by the time the start rule is run in the
parser, lexing has already taken place, so by then it is too late for
the predicate to influence the outcome (you already have either a WS
or a FOO token).
I did some more testing, and these are the results; for start rules
like this:
start : WS | FOO ; // order of WS and FOO in parser rule irrelevant
start : (WS | FOO)+ ;
start : .+ ;
If your input is *only* spaces then, all else being equal, the first-
listed lexer rule wins.
But if your input contains more than just spaces (like "foo bar",
"foo ", " bar"), FOO is always going to win, regardless of the
order of the lexer rules.
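These observations are consistent with a lexer that takes the longest
match and breaks ties by rule order. The Python sketch below is only a
simplified model of that behaviour, not ANTLR's actual prediction
algorithm. The rule patterns (in particular a FOO that accepts spaces)
and the names tokenize, WS_FIRST and FOO_FIRST are assumptions made for
illustration, since the real FOO definition is not shown in this message.

```python
import re

def tokenize(rules, text):
    """Longest match wins; on a tie, the earliest-listed rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly-greater comparison keeps the earlier rule on a tie.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical rules: FOO is assumed to accept spaces as well as letters.
WS_FIRST = [("WS", r" +"), ("FOO", r"[a-z ]+")]
FOO_FIRST = [("FOO", r"[a-z ]+"), ("WS", r" +")]

if __name__ == "__main__":
    for text in ["   ", "foo bar", "foo ", " bar"]:
        print(repr(text), tokenize(WS_FIRST, text), tokenize(FOO_FIRST, text))
```

In this model an input of three spaces becomes one WS token under
WS_FIRST and one FOO token under FOO_FIRST, while "foo bar", "foo " and
" bar" each come out as a single FOO token under either ordering.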
As you commented, the only way to overcome this greedy matching
behaviour seems to be to explicitly disallow spaces in FOO. No big
deal, but my natural inclination was to specify my lexer rules like
this:
SPECIFIC_RULE : ....
LESS_SPECIFIC_RULE : ...
GENERAL_RULE : ...
And let "lexer precedence" sort out which one matches. This doesn't
work, though, because if a more general rule subsumes a more
specific one, then the general rule will always win (a single greedy
match) instead of yielding two smaller matches. In the end it looks
like predicates in the lexer rules or some other workaround will have
to step in.
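To illustrate the workaround of explicitly disallowing spaces in FOO,
here is a small Python sketch using a simplified longest-match model
(again, not ANTLR itself). The patterns and the names lex, GREEDY and
FIXED are hypothetical. Once the general rule no longer subsumes the
specific one, the same input splits into smaller tokens.

```python
import re

# Hypothetical patterns: in GREEDY, FOO accepts spaces and so subsumes WS;
# in FIXED, spaces are excluded from FOO.
GREEDY = [("WS", re.compile(r" +")), ("FOO", re.compile(r"[a-z ]+"))]
FIXED = [("WS", re.compile(r" +")), ("FOO", re.compile(r"[a-z]+"))]

def lex(rules, text):
    """Simplified model: longest match wins, earlier rule wins a tie."""
    pos, out = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in rules:
            m = rx.match(text, pos)
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        out.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return out

if __name__ == "__main__":
    print(lex(GREEDY, "foo bar"))  # one token covering the whole input
    print(lex(FIXED, "foo bar"))   # FOO, WS, FOO
```

With GREEDY the whole of "foo bar" is one FOO token; with FIXED it
becomes FOO "foo", WS " ", FOO "bar".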
> And as you are new with ANTLR I can recommend the following tutorial
> (which I incidentally wrote):
>
> http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No+Past+Experience+Required
Yes, I had already read it, actually. It is a nice introduction to
the topic! The main thing which I'm having trouble coming to grips
with is achieving total separation between the lexer and the parser;
my previous experience was with integrated lexer/parsers, so the
lexer always knew exactly where it was and what kinds of symbols to
look for in the current context. But in ANTLR the lexer has to do its
scanning from start to finish without any help from the parser; I
understand that you can get it to do what you want using predicates,
but it's probably going to take me a while to get the hang of it.
Cheers,
Wincent