[antlr-interest] Re: Is there an ANTLR trick/hack to specify "NEWLINE or EOF" in Lexer

micheal_jor <open.zone at virgin.net>
Tue Feb 4 16:29:51 PST 2003


> > 1. Overriding uponEOF() to somehow persuade nextToken() to fudge and
> > return one final NEWLINE token seems to be the "wrong" approach. Will
> > it work?
> 
> You might get something going with an extra TokenStream filter to do your
> extra token fudging... Another option might be subclassing the lexer and
> overriding uponEOF and nextToken (the latter would only proxy the original
> nextToken until the special case arrives)

Thanks. I was thinking along those lines but I guess it just felt 
wrong somehow.

> Notice that uponEOF might be called more than once. I'm not 100% sure if
> this is related to me having tracing enabled by default; it might also be
> a result of guessing mode.

I wasn't expecting that...

> The trouble with this will probably be to make it play nice when the
> parser is in guessing mode.

...all this is herding me in the "sort it all out in a hand-crafted 
TokenStreamFilter" direction.
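
For what it's worth, a hand-crafted filter along these lines might be sketched as follows. It is only an illustration under stated assumptions: Tok and TokSource are hypothetical stand-ins for antlr.Token and antlr.TokenStream so that the snippet compiles without ANTLR, and the token type numbers are made up (only EOF = 1 is meant to mirror antlr.Token.EOF_TYPE). Since it sits between lexer and parser and never touches uponEOF(), repeated requests for EOF should simply get the same EOF token back.

```java
// Hypothetical stand-in for antlr.Token (not the real class).
final class Tok {
    static final int EOF = 1;      // assumption: mirrors antlr.Token.EOF_TYPE
    static final int NEWLINE = 4;  // assumption: whatever type the grammar gives NEWLINE
    static final int WORD = 5;     // assumption: any other token type
    final int type;
    final String text;
    Tok(int type, String text) { this.type = type; this.text = text; }
}

// Hypothetical stand-in for antlr.TokenStream.
interface TokSource {
    Tok nextToken();
}

// Guarantees that the token delivered just before EOF is a NEWLINE.
final class NewlineBeforeEofFilter implements TokSource {
    private final TokSource input;
    private Tok pendingEof;              // set once the underlying stream has hit EOF
    private int lastType = Tok.NEWLINE;  // so an empty input gets no synthetic NEWLINE

    NewlineBeforeEofFilter(TokSource input) { this.input = input; }

    public Tok nextToken() {
        if (pendingEof != null) {
            return pendingEof;           // EOF already seen: never ask the lexer again
        }
        Tok t = input.nextToken();
        if (t.type != Tok.EOF) {
            lastType = t.type;           // remember what preceded a possible EOF
            return t;
        }
        pendingEof = t;                  // hold EOF for this and all later calls
        if (lastType == Tok.NEWLINE) {
            return t;                    // input already ended with NEWLINE
        }
        return new Tok(Tok.NEWLINE, "\n"); // synthetic NEWLINE; EOF follows on the next call
    }
}
```

With the real library the filter would implement antlr.TokenStream instead and be handed to the parser in place of the lexer; the parser grammar can then simply require NEWLINE at the end of every line.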

> > 2. Perhaps ANTLR should support a "virtual EOF char" that can be
> > matched like any other char in Lexer rules in addition to the current
> > [upon]EOF end-of-file condition mechanism?
> 
> I'd go for that one anytime, uponEOF is very awkward to use. It would
> also allow nicer error messages for unexpected EOFs; uponEOF is (at
> least in my attempts to make use of it) useless.

Well then, I'll consider it added to the cookie jar for ANTLR v3 ;-)

> > Related Question:
> > -----------------
> > Is there a standard inbuilt mechanism for stuffing arbitrary Tokens
> > into the Lexer's output TokenStream?
> 
> TokenStream filters would be closest to that. I think you might be able
> to find something in the archives related to stuffing extra tokens into
> the stream, I recall seeing stuff in the past.

Cool. Thanks, I might think about on-demand code-gen for this. Sort 
of like:
$insertToken(new CustomToken(...))/$appendToken(...)

The codegen would then generate code that uses a Vector/ArrayList in 
nextToken() as appropriate. 

Oh forget it...it's trivial to write. I'll just check that it's 
covered in the FAQ  ;-)
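
For the record, the hand-written version might look roughly like the sketch below. The names are hypothetical: insertToken/appendToken are borrowed from the pseudo-syntax above, the type parameter T stands in for antlr.Token, and the Supplier stands in for the lexer's own nextToken(). It uses an ArrayDeque rather than a Vector/ArrayList only because removing from the front is cheaper.

```java
// Hypothetical wrapper, not part of ANTLR: queues extra tokens in front of a lexer.
final class QueueingTokenSource<T> {
    private final java.util.function.Supplier<T> lexer;  // stand-in for lexer.nextToken()
    private final java.util.ArrayDeque<T> pending = new java.util.ArrayDeque<T>();

    QueueingTokenSource(java.util.function.Supplier<T> lexer) {
        this.lexer = lexer;
    }

    // Deliver 'token' before anything that is already queued.
    void insertToken(T token) {
        pending.addFirst(token);
    }

    // Deliver 'token' after everything that is already queued.
    void appendToken(T token) {
        pending.addLast(token);
    }

    // Queued tokens take priority; the lexer is asked only when the queue is empty.
    T nextToken() {
        T queued = pending.pollFirst();
        return queued != null ? queued : lexer.get();
    }
}
```

A lexer action would call insertToken()/appendToken() whenever it wants to emit more than one token for a single match, and the parser would read from the wrapper instead of the lexer.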

Micheal



 
