[antlr-interest] A very simple grammar problem.

Fri Mar 7 12:37:44 PST 2008

At 07:45 8/03/2008, Kenneth Domino wrote:
 >In fact, I should think that because one will need to define a
 >grammar with a production "S' -> S EOF" or some equivalent,
 >ANTLR would do the construction for me.  Maybe this is part of
 >a feature that I am unaware of.

I think (and this would seem to be supported by the comments from 
the source you quoted later on) that ANTLR parsers permit you to 
have many separate entrypoints.  This means that you could write 
one parser that has two entirely separate rule trees (or even 
overlap at some point lower down), then use one of them to parse 
half the input stream and the other to parse the rest.  So it can 
be an advantage to be able to say "parse everything that you can 
recognise, but don't worry if you can't consume the whole input, 
that's for a different parser".

But for the most common case (single entrypoint for "normal" 
parsing, maybe one or two others for unit tests) the main 
entrypoint should usually have an EOF.  In fact normally I 
explicitly write a 'start' rule; not just for that reason but also 
because it reads a bit more naturally to execute 
'parser.start()'.  It normally looks like one of these:

   start: program EOF;
   start: definition* EOF;

 >The second problem I was having was the error "no rule can
 >obviously be followed by EOF".  Maybe it is because of the
 >common prefix in two alternatives as you suggested.  But, I
 >think there is a problem in the ANTLR tool that searches for
 >start symbols and adds end states to the NFA constructed for
 >the grammar.  This check occurs in the following ANTLR code:
[...]
 >   /** add an EOF transition to any rule end NFAState that 
points
 >to nothing
 >     *  (i.e., for all those rules not invoked by another 
rule).
 >These
 >     *  are start symbols then.
 >     *
 >     *  Return the number of grammar entry points; i.e., how 
many
 >rules are
 >     *  not invoked by another rule (they can only be invoked 
from
 >outside).
 >     *  These are the start rules.
 >     */
[...]
 >Unfortunately, with the grammar "grammar test; a : 'A' a | 'A' 
;",
 >the NFA constructed contains cycles so endNFAState.transition(0) 

 >is never null, numberUnInvokedRules is never incremented in
 >build_EOFStates(), and the error message is printed.

Well, first I must admit that I've never studied the internals of 
parsing, so I only have a vague idea of what an NFA is and how it 
relates.  But from the comment it sounds like the routines are 
actually operating as expected.  The rule "a" in your grammar *is* 
invoked by another rule (namely, itself), so by that definition it 
is not a candidate start symbol, and your grammar therefore indeed 
contains no start rules.

If you had added a rule "b: a;", then this would have been a start 
rule (since nothing within the grammar calls 'b') and the warning 
would go away.

Similarly, if you had removed the recursion (which as I explained 
earlier is generally a bad idea anyway) then the rule 'a' would no 
longer have called itself, and would have become a start symbol.

(Any kind of recursion involving the root rule can have this 
effect, not just direct recursion.  For example, if your root rule 
is 'expression' and you supported subexpressions in parentheses 
[which of course also invoke 'expression'] then you'll hit the 
same problem.  Using a separate root rule as I mentioned above 
will always resolve things neatly though.)