[antlr-interest] Failure on OpenJDK on Debian
Sam Barnett-Cormack
s.barnett-cormack at lancaster.ac.uk
Wed Apr 1 04:02:31 PDT 2009
Gavin Lambert wrote:
> At 09:45 1/04/2009, Ola Bini wrote:
> >> Looks like your lexer rules are ambiguous (which probably
> isn't
> >> helped by the fact that you're also using literals in your
> >> parser rules).
> >> You should definitely consider correcting your rules so that
> >> the ambiguity warnings go away; doing that should make the
> >> error go away as well.
> >>
> >I don't have any ambiguity warnings on any other Java
> >implementation.
>
> Perhaps not, but it's still suggestive. If your rules aren't
> fully left-factored then ANTLR often can work out the right thing
> to do, but it'll be slower than if they were factored properly --
> and every once in a while you'll get a "surprise" when you change
> some other part of the grammar :)
>
> So if that particular runtime executes slower than other variants,
> it might be hitting the timeout on that version but just barely
> squeaking through on others.
In a language with keywords, there'll always be ambiguities at k=1. If
you have a typical identifier rule ID, and they keywords like
THIS : 'this';
IF: 'if';
and so on, then at k=1 they'll always be ambiguous with ID. However,
k=*, it'll do whatever lookahead is needed, so there isn't actually an
ambiguity with LL(*). It would be silly to left-factor, say:
EVERY : 'every';
EACH : 'each';
EVENT : 'event';
Because it just makes it unreadable. ANTLR knows what to do with this,
so why left-factor? You'll end up with equivalent decision making, even.
I'm assuming that when LL(*) DFA generation times out, it falls back
to LL(1), and then spits out all the warnings. I've actually fallen back
on a lookup table as my keywords got very numerous, and once the grammar
was basically done I moved things like '::=' into the lexer properly,
but that makes no difference to whether things are ambiguous or not. The
simple fact is that it confuses people when ANTLR *sometimes* tells them
that:
COLON: ':' ;
ASS: '::=' ;
are ambiguous, without actually saying clearly that it's fallen back on
k=1 or something. I think that there's a problem if this is happening as
much as it sounds like. I know about the Stopping Problem, so ANTLR has
to have some arbitrary way to try and detect an inifinite loop in DFA
creation, but it sounds like the timeout is actually a bit short, as
it's causing problems in lots of valid cases. I'm assuming, of course,
that it *is* a timeout issue, as when this has happened to me (pretty
much as others have described it) I haven't seen a timeout warning, just
all the misleading lexer ambiguity warnings, usually followed by errors
that tokens are unreachable.
For the issue of keywords, is it really not worth building in a
keyword-lookup feature for ANTLR lexers? I'd've thought it'd be easy in
all the target languages, just introduce (or is it re-introduce) a
keywords{} section with name-value pairs, that creates the tokens (no
need for empty fragment rules) and provides a standard hook interface so
that rules that *can* lead to keywords look them up. It's a fairly
standard coding pattern, AFAICT, so providing an ANTLRish way to do it
would make sense. I think it's more efficient than a rule-per-keyword
(or the equivalent, using the tokens section), with more than a handful
of keywords anyway. It's not like keywords are a rare language concept,
so direct support makes sense.
--
Sam Barnett-Cormack
More information about the antlr-interest
mailing list