[antlr-interest] Grammar Perplexity in v3.0
Randall R Schulz
rschulz at sonic.net
Sun Nov 12 08:10:16 PST 2006
Hi,
I'm having a very perplexing problem with a grammar I'm developing for
ANTLR v3.0. Since this is my first foray in ANTLR and since version 3.0
is still in beta, I'm uncertain whether the problem is with my
understanding or the software.
Before I try to create an excerpt of the grammar to post here, I thought
I'd describe the problem roughly and ask for general suggestions on
what might be the problem. If that doesn't yield any progress, I'll
make a longer, more detailed post.
So here's the thing. My grammar, as usual, has both lexical rules and
grammar productions (sorry if I'm using the wrong terminology).
When I compile the grammar, I get this diagnostic:
ANTLR Parser Generator Early Access Version 3.0b4 (??, 2006)
1989-2006
TSTPv3209.g:1099:1: The following token definitions are unreachable:
AtomicWord,FileName,SingleQuoted
These are lines 1098-1100:
SingleQuoted
: '\'' ( ~( '\'' | '\\' ) | '\\' '\'' | '\\' '\\' )* '\''
;
Now, I've looked very closely at my grammar and there most certainly are
chains of derivation in the production rules from the start symbol to
AtomicWord. Furthermore, when I sumbit input to the parser, it fails at
precisely the point where it should recognize an AtomicWord, issuing
this diagnostic (line breaks added by me):
[main,
tptpUnit,
tptpInput,
annotatedFormula,
fofAnnotated,
firstOrderFormula,
unitaryFormula,
firstOrderFormula,
unitaryFormula,
quantifiedFormula,
unitaryFormula,
unaryFormula,
unitaryFormula]:
line 29:14 state 0 (decision=11) no viable alt;
token=[@49,1289:1290='gt',<4>,29:14]
The input does include the characters 'g' and 't' at columns 14 and 15
of line 29. The rule unitaryFormula includes in its four alternatives
one which derives, through a fairly long chain, an instance of
AtomicWord. None of those derivations appear at the point the parse
failure is reported.
The pertinent portion of the grammar, lines 1288-1291:
fragment
Alphanumeric
: ( LowerAlpha | UpperAlpha | Numeric | '_' )
;
This chain of derivations is correct up to the point that the input
token 'gt' is rejected. That sequence, 'gt', should be matched by
AtomicWord.
I have scrutinized my grammar both in its source form and in ANTLRworks,
and I can see no reason for this parse failure.
I'd very much welcome any suggestions on what might be going wrong or
how I might diagnose the problem. If the suggestion is to send a more
comprehensive report, please give me some idea of what's the best way
to construct and submit it.
Thanks.
Randall Schulz
More information about the antlr-interest
mailing list