[antlr-interest] Where is the EOF documentation?

Loring Craymer lgcraymer at yahoo.com
Mon Jan 5 10:29:28 PST 2009


EOF is a real token generated when the lexer reaches the end of file.  The EOF symbol appears in a grammar to force processing of all tokens in a file or to generate an error condition if that is not possible.  If all files fed to a grammar-derived parser are well-formed (do not have missing or extra tokens at the end of file and conform to the grammar), then a grammar without EOF will appear to be working.
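
As a rough illustration of that last point, here is a minimal sketch in Python. It is a hand-written stand-in for a generated parser, not ANTLR output, and the names (tokenize, parse_list, require_eof) are made up for the example. It models the rule  list : INT (',' INT)* ;  with and without a trailing EOF.

```python
# Simplified model of a parser rule with and without a trailing EOF.
# Hypothetical helper names; this is not code generated by ANTLR.
EOF = "<EOF>"


def tokenize(text):
    """Split on whitespace and append an explicit EOF token, as a lexer would."""
    return text.split() + [EOF]


def parse_list(tokens, require_eof):
    """Parse  list : INT (',' INT)* ;  optionally followed by EOF.

    Returns the number of tokens consumed. Raises SyntaxError on a mismatch.
    """
    pos = 0

    def match_int():
        nonlocal pos
        if not tokens[pos].isdigit():
            raise SyntaxError(f"expected INT at token {pos}, found {tokens[pos]!r}")
        pos += 1

    match_int()
    # The loop exits at the first token that is not a comma.
    while tokens[pos] == ",":
        pos += 1
        match_int()

    if require_eof:
        # With EOF in the rule, leftover tokens become an error.
        if tokens[pos] != EOF:
            raise SyntaxError(f"expected EOF at token {pos}, found {tokens[pos]!r}")
        pos += 1
    return pos


if __name__ == "__main__":
    good = tokenize("1 , 2 , 3")
    bad = tokenize("1 , 2 3 4")  # extra tokens after a valid prefix

    print(parse_list(good, require_eof=False))  # whole list consumed
    print(parse_list(bad, require_eof=False))   # stops early, no error reported
    try:
        parse_list(bad, require_eof=True)
    except SyntaxError as err:
        print("error:", err)
```

On the well-formed input both variants succeed, so the missing EOF goes unnoticed. On the malformed input the variant without EOF returns after three tokens and reports nothing, while the variant with EOF raises an error at the first leftover token.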

--Loring




________________________________
From: George S. Cowan <cowang at comcast.net>
To: antlr-interest at antlr.org
Sent: Monday, January 5, 2009 9:48:34 AM
Subject: [antlr-interest] Where is the EOF documentation?

Is there any documentation for the special EOF token?
 
When the parser reaches the end of file, special things seem to happen. But I can't find anywhere this is documented. I'm in the middle of my second pass through the Definitive ANTLR Reference, and I don't recall it occurring in any of the grammars there, or in fact, being mentioned at all. It isn't in the index either.
 
I have seen it mentioned as a magic fix for certain grammars on the website, e.g., http://www.antlr.org/wiki/pages/viewpage.action?pageId=4554943.
 
In a search from the ANTLR home page, I found several email list entries that mentioned EOF frustrations and solutions, but I am still not clear about when it is needed and when it is not. (When trying to find emails from a search on the home page, watch out for the "" port in the URLs; it must be manually removed in order to get to the emails' new location.)
 
Here are the facts that I have gleaned from reading and experiments:
 
1. EOF is not usually required at the end of your top-level rule in a grammar, but sometimes it is. When is not clear.
 
2. Some rules cannot be unit tested because directly calling them against a valid input stream puts an EOF at the end of the input stream and some rules trip over the EOF. When is not clear.
 
3. EOF is only available in the parser, not the lexer. So, for instance, we can't make it part of whitespace, or make it ok for a line comment to end with an EOF as well as a newline.
 
 
I begin to suspect that EOF is not truly a token on the token stream, but just something to use in the parser when certain special undocumented handling is wanted. I also suspect that the effect of an end of file is different depending on DFAs, hoisting, and maybe other things not seen in the grammar, but going on in the implementation.
 
A satisfactory response would be, "How true, but a future release will clean up some of this, and better to wait and document that."
 
Frustrated, but hoping for clarity, 
George


      

