[antlr-interest] Is there an ANTLR trick/hack to specify "NEWLINE or EOF" in Lexer

Ric Klaren klaren at cs.utwente.nl
Tue Feb 4 08:31:35 PST 2003


Hi,

On Tue, Feb 04, 2003 at 11:31:26AM -0000, micheal_jor <open.zone at virgin.net> wrote:
> In an extension of the "single line comment can be followed by 
> NEWLINE or end-of-file" scenario, I need to return a NEWLINE token 
> when either an actual end-of-line or the end-of-file condition is 
> encountered.
> 
> [How] can this be specified in ANTLR?
> 
> 1. Subclassing uponEOF() to somehow persuade nextToken() to fudge and 
> return one final NEWLINE token seems to be the "wrong" approach. Will 
> it work?

You might get something going with an extra TokenStream filter that does the
extra token fudging. Another option might be subclassing the lexer and
overriding uponEOF and nextToken (the latter just proxies the original
nextToken until the special case arrives).

Note that uponEOF might be called more than once. I'm not 100% sure whether
this is related to me having tracing enabled by default; it might also be a
result of guessing mode.

The trouble with this will probably be making it play nice when the parser
is in guessing mode.
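
A minimal sketch of the filter idea, in Java. It is written against
stand-in types (Tok, TokSource) rather than the real antlr.Token and
antlr.TokenStream so that it compiles without antlr.jar; the NEWLINE and
WORD type codes and all class names are assumptions made for illustration.
The filter synthesizes one NEWLINE just before EOF when the input did not
already end with one.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for antlr.Token / antlr.TokenStream (assumptions, not the real API).
final class Tok {
    static final int EOF_TYPE = 1;  // mirrors the value of antlr.Token.EOF_TYPE
    static final int NEWLINE = 4;   // hypothetical token type
    static final int WORD = 5;      // hypothetical token type
    final int type;
    Tok(int type) { this.type = type; }
}

interface TokSource {
    Tok nextToken();
}

// Wraps another token source and guarantees that the token right before
// EOF is a NEWLINE, inserting a synthetic one if necessary.
final class NewlineAtEofFilter implements TokSource {
    private final TokSource input;
    // Assumption: an empty input needs no synthetic NEWLINE.
    private int lastType = Tok.NEWLINE;
    private Tok eof = null;

    NewlineAtEofFilter(TokSource input) { this.input = input; }

    public Tok nextToken() {
        if (eof != null) {
            return eof;                       // keep answering EOF once seen
        }
        Tok t = input.nextToken();
        if (t.type == Tok.EOF_TYPE) {
            eof = t;                          // remember EOF for later calls
            if (lastType != Tok.NEWLINE) {
                return new Tok(Tok.NEWLINE);  // synthetic end-of-line
            }
        }
        lastType = t.type;
        return t;
    }
}

final class FilterDemo {
    // Feeds the given token types through the filter and records what comes out.
    static List<Integer> run(final int... types) {
        TokSource lexer = new TokSource() {
            private int i = 0;
            public Tok nextToken() {
                return new Tok(i < types.length ? types[i++] : Tok.EOF_TYPE);
            }
        };
        TokSource filtered = new NewlineAtEofFilter(lexer);
        List<Integer> seen = new ArrayList<Integer>();
        Tok t;
        do {
            t = filtered.nextToken();
            seen.add(t.type);
        } while (t.type != Tok.EOF_TYPE);
        return seen;
    }
}
```

With these stand-ins, FilterDemo.run(Tok.WORD) and
FilterDemo.run(Tok.WORD, Tok.NEWLINE) both produce WORD, NEWLINE, EOF, so a
parser rule can simply require NEWLINE. Because the filter sits outside the
lexer, it does not depend on how often uponEOF gets called.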

> 2. Perhaps ANTLR should support a "virtual EOF char" that can be 
> matched like any other char in Lexer rules in addition to the current 
> [upon]EOF end-of-file condition mechanism?

I'd go for that one anytime; uponEOF is very awkward to use. It would also
allow nicer error messages for unexpected EOFs. uponEOF is (at least in my
attempts to make use of it) useless.

> 3. Leaving the decision to the Parser generated errors at end-of-file
> (expecting "NEWLINE", found ''):
> 
> (a) Matching against the EOF "token"
> newlineOrEOF
> :   NEWLINE
> |!  EOF
> ;
> 
> (b) Checking LA(1) against EOF_TYPE
> newlineOrEOF
> { bool atEndOfFile = true; }
> :   ( NEWLINE { atEndOfFile = false; } )?
>     { if ( atEndOfFile && (LA(1) != Token.EOF_TYPE) )
>          throw new RecognitionException(...);
>     }
> ;

Dunno about these. Conceptually I'd be more inclined to try and stuff it in
the lexer. Which approach is easiest to accomplish also depends on the
grammar.

> Related Question:
> -----------------
> Is there a standard inbuilt mechanism for stuffing arbitrary Tokens
> into the Lexer's output TokenStream?

TokenStream filters would be closest to that. I think you might be able to
find something in the archives related to stuffing extra tokens into the
stream, I recall seeing stuff in the past.
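
As a rough illustration of what such a filter could look like, here is a
small Java sketch of a stream wrapper with a queue of pending tokens. It
uses stand-in types (SimpleToken, SimpleTokenStream) instead of the real
antlr classes so it runs on its own; every name and token type code in it
is hypothetical. Queued tokens are handed out before the next token from
the wrapped stream.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stand-ins for antlr.Token / antlr.TokenStream (assumptions, not the real API).
final class SimpleToken {
    static final int EOF_TYPE = 1;
    final int type;
    final String text;
    SimpleToken(int type, String text) { this.type = type; this.text = text; }
}

interface SimpleTokenStream {
    SimpleToken nextToken();
}

// Sits between lexer and parser. Tokens queued with inject() are returned
// first; only when the queue is empty is the wrapped stream consulted.
final class InjectingStream implements SimpleTokenStream {
    private final SimpleTokenStream input;
    private final Deque<SimpleToken> pending = new ArrayDeque<SimpleToken>();

    InjectingStream(SimpleTokenStream input) { this.input = input; }

    void inject(SimpleToken t) { pending.addLast(t); }

    public SimpleToken nextToken() {
        SimpleToken queued = pending.pollFirst();
        return queued != null ? queued : input.nextToken();
    }
}

final class InjectDemo {
    static List<String> run() {
        final String[] words = { "a", "b" };
        SimpleTokenStream lexer = new SimpleTokenStream() {
            private int i = 0;
            public SimpleToken nextToken() {
                return i < words.length
                    ? new SimpleToken(5, words[i++])               // 5: hypothetical WORD type
                    : new SimpleToken(SimpleToken.EOF_TYPE, "<EOF>");
            }
        };
        InjectingStream stream = new InjectingStream(lexer);
        List<String> seen = new ArrayList<String>();
        seen.add(stream.nextToken().text);            // first token from the lexer
        stream.inject(new SimpleToken(6, "extra"));   // 6: hypothetical type
        seen.add(stream.nextToken().text);            // the injected token
        seen.add(stream.nextToken().text);            // back to the lexer
        seen.add(stream.nextToken().text);            // end of input
        return seen;
    }
}
```

InjectDemo.run() returns the texts "a", "extra", "b", "<EOF>" in that
order. In a real grammar the code calling inject() would have to live
somewhere with access to the filter, for example a lexer action holding a
reference to it; that wiring is left out here.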

Cheers,

Ric
-- 
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893722  ----
-----+++++*****************************************************+++++++++-------
 "Don't call me stupid." "Oh, right. To call you stupid would be an insult
    to stupid people. I've known sheep that could outwit you! I've worn
              dresses with higher IQs!" --- A Fish Called Wanda


 
