[antlr-interest] Where is the EOF documentation?
Loring Craymer
lgcraymer at yahoo.com
Mon Jan 5 10:29:28 PST 2009
EOF is a real token generated when the lexer reaches the end of file. The EOF symbol appears in a grammar to force processing of all tokens in a file or to generate an error condition if that is not possible. If all files fed to a grammar-derived parser are well-formed (do not have missing or extra tokens at the end of file and conform to the grammar), then a grammar without EOF will appear to be working.
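To illustrate the point, here is a minimal hand-written sketch in Python. It is not ANTLR-generated code and does not use the ANTLR runtime. The names `tokenize` and `parse_list`, and the toy rule `list : ID (',' ID)*`, are made up for this example. It assumes a lexer that appends one EOF token at the end of the input, and compares the same rule with and without a trailing EOF.

```python
import re

EOF = ("EOF", "")


def tokenize(text):
    """Split text into ID and COMMA tokens, then append a single EOF token."""
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|,", text):
        kind = "COMMA" if lexeme == "," else "ID"
        tokens.append((kind, lexeme))
    tokens.append(EOF)  # assumption: the lexer always ends the stream with EOF
    return tokens


def parse_list(tokens, require_eof):
    """Match  list : ID (',' ID)*  with an optional trailing EOF.

    Returns the number of tokens consumed, or raises SyntaxError.
    """
    pos = 0

    def expect(kind):
        nonlocal pos
        if tokens[pos][0] != kind:
            raise SyntaxError(
                f"expected {kind}, found {tokens[pos][0]} at token {pos}"
            )
        pos += 1

    expect("ID")
    while tokens[pos][0] == "COMMA":
        expect("COMMA")
        expect("ID")
    if require_eof:
        expect("EOF")  # forces the rule to account for every token
    return pos


if __name__ == "__main__":
    malformed = tokenize("a, b c")  # stray ID after the list
    print(parse_list(malformed, require_eof=False))  # stops early, no error
    try:
        parse_list(malformed, require_eof=True)
    except SyntaxError as err:
        print("error:", err)
```

On well-formed input such as `a, b` both variants succeed, which is why a grammar without EOF can look correct. On `a, b c` the variant without EOF stops after `b` and silently ignores the rest, while the variant with EOF reports an error.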
--Loring
________________________________
From: George S. Cowan <cowang at comcast.net>
To: antlr-interest at antlr.org
Sent: Monday, January 5, 2009 9:48:34 AM
Subject: [antlr-interest] Where is the EOF documentation?
Is there any documentation for the special EOF token?
When the parser reaches the end of file, special things seem to happen, but I can't find anywhere that this is documented. I'm in the middle of my second pass through the Definitive ANTLR Reference, and I don't recall EOF occurring in any of the grammars there or, in fact, being mentioned at all. It isn't in the index either.
I have seen it mentioned as a magic fix for certain grammars on the website, e.g., http://www.antlr.org/wiki/pages/viewpage.action?pageId=4554943. In a search from the ANTLR home page, I found several email list entries that mentioned EOF frustrations and solutions, but I am still not clear about when it is needed and when it is not. (When trying to find emails from a search on the home page, watch out for the "" port in the URLs; it must be manually removed in order to get to the emails' new location.)
Here are the facts that I have gleaned from reading and experiments:
1. EOF is not usually required at the end of your top-level rule in a grammar, but sometimes it is. When is not clear.
2. Some rules cannot be unit tested because directly calling them against a valid input stream puts an EOF at the end of the input stream, and some rules trip over the EOF. When is not clear.
3. EOF is only available in the parser, not the lexer. So, for instance, we can't make it part of whitespace, or make it OK for a line comment to end with an EOF as well as a newline.
I begin to suspect that EOF is not truly a token on the token stream, but just something to use in the parser when certain special undocumented handling is wanted. I also suspect that the effect of an end of file is different depending on DFAs, hoisting, and maybe other things not seen in the grammar but going on in the implementation.
A satisfactory response would be, "How true, but a future release will clean up some of this, and better to wait and document that."
Frustrated, but hoping for clarity,
George