[antlr-interest] Why don't parsers support character ranges?
Gavin Lambert
antlr at mirality.co.nz
Wed Apr 23 01:16:49 PDT 2008
At 14:16 23/04/2008, Hannes Schmidt wrote:
>Why can't I use ranges in parsers?
You can, they just don't mean what you think they mean.
>Why doesn't ANTLR emit a warning when it ignores ranges in
>grammar rules?
Because it's not ignoring them. When you say "'a'..'z'" in a
parser rule, the parser first automatically creates tokens for the
quoted terms (because it's a combined grammar; if it were a
standalone grammar you'd just get an error). Now the rule says
something like "T16..T17", which is a range of tokens. If you're
lucky, this will be the first time it's seen those tokens and
they'll have contiguous values, so your range basically just means
those two tokens and nothing else. If you're not lucky, there may
be other tokens in between those two, so you'll be referring to
those as well. Either way, it's probably not what you thought you
were saying.
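To make that concrete, here's a hypothetical combined grammar illustrating the pitfall (the actual token numbers are whatever ANTLR happens to assign, so T16/T17 are just for illustration):

```antlr
grammar RangePitfall;

// In a parser rule, 'a'..'z' is NOT a character range. ANTLR
// invents implicit tokens for the two literals, so this rule
// effectively becomes something like "T16..T17" -- a range over
// token *ids*, which may include unrelated tokens in between.
letter : 'a'..'z' ;
```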
Given that there currently isn't any official way of controlling
the token ids generated by rules, ranges in the parser probably
*should* generate warnings, since (while valid) they're not
especially useful.
(I also dislike the way that quoted constants are permitted in
parser rules in the first place [since I think it leads to just
this sort of confusion], but that's a different issue.)
>How can I emulate the missing range feature without obfuscating
>my grammar too much? Semantic predicates?
Probably. Or move whatever construct you're trying to match that
includes ranges into the lexer as a single token.
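For instance, a sketch of both workarounds (the rule names here are made up; the validating semantic predicate assumes your digits arrive as single-character DIGIT tokens):

```antlr
// Preferred: put the range in the lexer, where ranges really do
// mean character sets.
IDENT : ('a'..'z' | 'A'..'Z')+ ;
DIGIT : '0'..'9' ;

// Alternative: keep a generic token and gate the parser rule with
// a validating semantic predicate on the token's text.
digitTwoToSeven
    : d=DIGIT { $d.text.compareTo("2") >= 0
             && $d.text.compareTo("7") <= 0 }?
    ;
```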
>Now let me put my tinfoil hat on and theorize a little bit: I
>think that the root cause of my confusion is ANTLR's distinction
>between lexer and parser. I think this distinction is purely
>historical and ANTLR might be better off without it. When writing
>grammars, I often find myself in situations where I know that
>certain lexer rules make sense in a certain parser context only
>but that context is not available to the lexer because the state
>that defines it is maintained in the parser.
At times I agree with you, but it's usually not all that hard to
get a decent set of lexer rules. The tactic I usually follow is
to write the lexer rules *first*, and unit test them by themselves
to ensure the token stream is being generated as I expect. *Then*
I start writing parser rules to either transform the token stream
into an AST or to directly do something more interesting. If you
think of it in layers then it's not hard to keep it all straight.
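As a rough sketch of that layering (rule names are invented for illustration), the lexer stays small and context-free while the context lives entirely in the parser:

```antlr
// Lexer layer: simple, context-free tokens that can be unit
// tested on their own before any parser rules exist.
NUMBER : ('0'..'9')+ ;
ID     : ('a'..'z' | 'A'..'Z')+ ;
WS     : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;

// Parser layer: the same ID token plays different roles depending
// on where it appears -- the context the lexer never needs to know.
assignment : ID '=' expr ';' ;
expr       : ID | NUMBER ;
```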