[antlr-interest] Grammar Perplexity in v3.0

Sun Nov 12 08:10:16 PST 2006

Hi,

I'm having a very perplexing problem with a grammar I'm developing for 
ANTLR v3.0. Since this is my first foray into ANTLR and since version 3.0 
is still in beta, I'm uncertain whether the problem lies with my 
understanding or with the software.

Before I try to create an excerpt of the grammar to post here, I thought 
I'd describe the problem roughly and ask for general suggestions on 
what might be the problem. If that doesn't yield any progress, I'll 
make a longer, more detailed post.

So here's the thing. My grammar, as usual, has both lexical rules and 
grammar productions (sorry if I'm using the wrong terminology).

When I compile the grammar, I get this diagnostic:

ANTLR Parser Generator   Early Access Version 3.0b4 (??, 2006)  1989-2006
TSTPv3209.g:1099:1: The following token definitions are unreachable: AtomicWord,FileName,SingleQuoted

These are lines 1098-1100:

SingleQuoted
    :   '\'' ( ~( '\'' | '\\' ) | '\\' '\'' | '\\' '\\' )* '\''
    ;
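
As a rough sanity check of this rule in isolation, here is a minimal sketch that restates it as a Python regular expression. The regex and the helper name is_single_quoted are my own stand-ins, not anything ANTLR generates, and this only checks the shape of the rule, not how it interacts with the other lexer rules.

```python
import re

# Hypothetical Python stand-in for the SingleQuoted lexer rule:
# an opening quote, then any run of characters that are neither a quote
# nor a backslash, escaped quotes (\') or escaped backslashes (\\),
# then a closing quote.
SINGLE_QUOTED = re.compile(r"'(?:[^'\\]|\\'|\\\\)*'")


def is_single_quoted(text):
    """Return True if the whole of text has the SingleQuoted shape."""
    return SINGLE_QUOTED.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["'gt'", r"'a\'b'", "'a'b'", "gt"]:
        print(repr(sample), is_single_quoted(sample))
```

Taken on its own, the rule appears to accept what I intend it to accept, so the "unreachable" diagnostic presumably has to do with how it relates to other rules rather than with its body.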

Now, I've looked very closely at my grammar and there most certainly are 
chains of derivation in the production rules from the start symbol to 
AtomicWord. Furthermore, when I submit input to the parser, it fails at 
precisely the point where it should recognize an AtomicWord, issuing 
this diagnostic (line breaks added by me):

[main,
 tptpUnit,
 tptpInput,
 annotatedFormula,
 fofAnnotated,
 firstOrderFormula,
 unitaryFormula,
 firstOrderFormula,
 unitaryFormula,
 quantifiedFormula,
 unitaryFormula,
 unaryFormula,
 unitaryFormula]:
line 29:14 state 0 (decision=11) no viable alt;
 token=[@49,1289:1290='gt',<4>,29:14]

The input does include the characters 'g' and 't' at columns 14 and 15 
of line 29. The rule unitaryFormula includes, among its four 
alternatives, one which derives, through a fairly long chain, an 
instance of AtomicWord. None of the rules in that chain appears in the 
rule stack at the point where the parse failure is reported.

The pertinent portion of the grammar, lines 1288-1291:

fragment
Alphanumeric
    :   ( LowerAlpha | UpperAlpha | Numeric | '_' )
    ;

This chain of derivations is correct up to the point that the input 
token 'gt' is rejected. That sequence, 'gt', should be matched by 
AtomicWord.
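
One generic way a token rule can become unreachable is for an earlier rule to match exactly the same text. That is offered purely as an assumption for illustration, not as a diagnosis of this grammar. The sketch below is a toy model only: the rule names and patterns are hypothetical, they are not taken from TSTPv3209.g, and the loop is a simple longest-match scanner, not how ANTLR's generated lexer works internally.

```python
import re

# Hypothetical rules, NOT taken from TSTPv3209.g. Order matters: when two
# rules match the same length of text, the rule listed first wins.
RULES = [
    ("LowerWord", re.compile(r"[a-z][A-Za-z0-9_]*")),
    ("AtomicWord", re.compile(r"[a-z][A-Za-z0-9_]*")),
    ("Whitespace", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Split text into (rule_name, lexeme) pairs using longest match,
    breaking ties in favour of the earliest rule. Whitespace is dropped."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "Whitespace":
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


if __name__ == "__main__":
    print(tokenize("gt p_1 x"))
```

In this toy model no input can ever produce an AtomicWord token, because LowerWord always claims the text first. A parser expecting AtomicWord would then reject 'gt' even though the characters are plainly there.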

I have scrutinized my grammar both in its source form and in ANTLRworks, 
and I can see no reason for this parse failure.

I'd very much welcome any suggestions on what might be going wrong or 
how I might diagnose the problem. If the suggestion is to send a more 
comprehensive report, please give me some idea of what's the best way 
to construct and submit it.

Thanks.

Randall Schulz