[antlr-interest] Failure on OpenJDK on Debian

Sam Barnett-Cormack s.barnett-cormack at lancaster.ac.uk
Wed Apr 1 05:58:28 PDT 2009


Gavin Lambert wrote:
> At 00:02 2/04/2009, Sam Barnett-Cormack wrote:
>  >However, k=*, it'll do whatever lookahead is needed, so there
>  >isn't actually an ambiguity with LL(*). It would be silly to
>  >left-factor, say:
>  >
>  >EVERY : 'every';
>  >EACH : 'each';
>  >EVENT : 'event';
>  >
>  >Because it just makes it unreadable. ANTLR knows what to do with
>  >this, so why left-factor? You'll end up with equivalent decision
>  >making, even.
> 
> Right, which is why those aren't the problem -- they can always be 
> resolved with static lookahead, so they shouldn't take long to figure out.
> 
> Where you can get into trouble is when there's a common left prefix 
> involving a loop -- such as the INT vs FLOAT vs RANGE case.

But by the sound of it, in Ola's case, at least some of the collisions 
are of the sort I describe:

>      [java] warning(200): ioke.g:269:5: Decision can match input such as "'#'"
> using multiple alternatives: 1, 2
>      [java] As a result, alternative(s) 2 were disabled for that input

Okay, that sounds like it probably ought to be factored, from what 
little info we have.

>      [java] warning(209): ioke.g:323:1: Multiple token rules can match input
> such as "'#'": T__38, Identifier, StringLiteral, RegexpLiteral, LineComment
>      [java] 
>      [java] As a result, token(s)
> Identifier,StringLiteral,RegexpLiteral,LineComment were disabled for that input

At least T__38 is presumably finite-length and shouldn't be included. 
Something sounds odd in the language if identifiers, string literals, 
regex literals (and they are separate?) and one-line comments can all 
start with a hash...

>      [java] warning(209): ioke.g:202:1: Multiple token rules can match input
> such as "'['": T__34, Identifier
>      [java] 
>      [java] As a result, token(s) Identifier were disabled for that input

Ditto above

>      [java] warning(209): ioke.g:202:1: Multiple token rules can match input
> such as "'{'": T__36, Identifier
>      [java] 
>      [java] As a result, token(s) Identifier were disabled for that input

And again...

Sounds like something may well be a bit wrong with the grammar (I'd 
have to look at it to judge better), but it also sounds like something 
is wrong with the ambiguity detection (or it's falling back to k=1 
without saying so).
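
For what it's worth, here's a rough sketch of the k=1 point, using the 
EVERY/EACH/EVENT literals from earlier in the thread. It's plain Python, 
not ANTLR's actual analysis, and the helper names (min_fixed_lookahead, 
collisions) are made up purely for illustration:

```python
from itertools import combinations


def min_fixed_lookahead(literals):
    """Smallest k such that the first k characters tell every pair apart.

    Hypothetical helper: a toy model of fixed lookahead over literal
    tokens only, not how ANTLR computes its decisions.
    """
    k = 1
    for a, b in combinations(literals, 2):
        common = 0
        for x, y in zip(a, b):
            if x != y:
                break
            common += 1
        # One character past the shared prefix is needed to separate a and b.
        k = max(k, common + 1)
    return k


def collisions(literals, k):
    """Group literals that look identical when only k characters are examined."""
    groups = {}
    for lit in literals:
        groups.setdefault(lit[:k], []).append(lit)
    return {prefix: lits for prefix, lits in groups.items() if len(lits) > 1}


keywords = ["every", "each", "event"]
print(min_fixed_lookahead(keywords))  # lookahead depth needed
print(collisions(keywords, 1))        # what collides at k=1
print(collisions(keywords, 4))        # what collides at k=4
```

With only one character of lookahead all three literals collide on 'e'; 
with four characters none of them do, which is why I'd expect warnings 
like the ones above only if the analysis were effectively working at k=1.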

Sam


