[antlr-interest] Failure on OpenJDK on Debian

Wed Apr 1 04:02:31 PDT 2009

Gavin Lambert wrote:
> At 09:45 1/04/2009, Ola Bini wrote:
>  >> Looks like your lexer rules are ambiguous (which probably 
> isn't
>  >> helped by the fact that you're also using literals in your
>  >> parser rules).
>  >> You should definitely consider correcting your rules so that
>  >> the ambiguity warnings go away; doing that should make the
>  >> error go away as well.
>  >>
>  >I don't have any ambiguity warnings on any other Java
>  >implementation.
> 
> Perhaps not, but it's still suggestive.  If your rules aren't 
> fully left-factored then ANTLR often can work out the right thing 
> to do, but it'll be slower than if they were factored properly -- 
> and every once in a while you'll get a "surprise" when you change 
> some other part of the grammar :)
> 
> So if that particular runtime executes slower than other variants, 
> it might be hitting the timeout on that version but just barely 
> squeaking through on others.

In a language with keywords, there'll always be ambiguities at k=1. If 
you have a typical identifier rule ID, and they keywords like

THIS : 'this';
IF: 'if';

and so on, then at k=1 they'll always be ambiguous with ID. However, 
k=*, it'll do whatever lookahead is needed, so there isn't actually an 
ambiguity with LL(*). It would be silly to left-factor, say:

EVERY : 'every';
EACH : 'each';
EVENT : 'event';

Because it just makes it unreadable. ANTLR knows what to do with this, 
so why left-factor? You'll end up with equivalent decision making, even. 
  I'm assuming that when LL(*) DFA generation times out, it falls back 
to LL(1), and then spits out all the warnings. I've actually fallen back 
on a lookup table as my keywords got very numerous, and once the grammar 
was basically done I moved things like '::=' into the lexer properly, 
but that makes no difference to whether things are ambiguous or not. The 
simple fact is that it confuses people when ANTLR *sometimes* tells them 
that:

COLON: ':' ;
ASS: '::=' ;

are ambiguous, without actually saying clearly that it's fallen back on 
k=1 or something. I think that there's a problem if this is happening as 
much as it sounds like. I know about the Stopping Problem, so ANTLR has 
to have some arbitrary way to try and detect an inifinite loop in DFA 
creation, but it sounds like the timeout is actually a bit short, as 
it's causing problems in lots of valid cases. I'm assuming, of course, 
that it *is* a timeout issue, as when this has happened to me (pretty 
much as others have described it) I haven't seen a timeout warning, just 
all the misleading lexer ambiguity warnings, usually followed by errors 
that tokens are unreachable.

For the issue of keywords, is it really not worth building in a 
keyword-lookup feature for ANTLR lexers? I'd've thought it'd be easy in 
all the target languages, just introduce (or is it re-introduce) a 
keywords{} section with name-value pairs, that creates the tokens (no 
need for empty fragment rules) and provides a standard hook interface so 
that rules that *can* lead to keywords look them up. It's a fairly 
standard coding pattern, AFAICT, so providing an ANTLRish way to do it 
would make sense. I think it's more efficient than a rule-per-keyword 
(or the equivalent, using the tokens section), with more than a handful 
of keywords anyway. It's not like keywords are a rare language concept, 
so direct support makes sense.

-- 
Sam Barnett-Cormack