[antlr-interest] passing stuff from lexer to parser

Sun Jan 6 21:01:44 PST 2008

On Jan 7, 2008 1:25 PM,  <siemsen at ucar.edu> wrote:
> Gavin,
>
> My comments inline...
>
> On Jan 2, 2008, at 3:59 PM, Gavin Lambert wrote:
>
> >> Would it be possible to inject a token into the token stream just
> >> before I switch to the include file and call reset?  In the
> >> PragmaInclude lexer rule, can I call "emit" to do it, and make the
> >> token contain the include file name?  I haven't done anything like
> >> this before, I just wonder if it's reasonable.
> >
> > Lexer operation is basically just calling nextToken to retrieve one
> > token at a time.  Calling emit sets the data for that token; not
> > calling it will lead to generating a default token based on all the
> > characters matched by the rule.
> >
> > I'm not really familiar with the Java runtime, so I'm not sure what
> > the reset call affects.  It might destroy an emit as well (and you
> > probably can't emit afterwards successfully either).  Still, it
> > could be worth a try.
> >
> > The rule must currently be returning *something*, though, since
> > every top-level lexer rule called must return a token.  Trace it
> > through with a debugger and see what's going on.
>
> I tried adding a call to emit right before the calls to setCharStream
> and reset.  As Thomas Brandon predicted, nothing happened, probably
> because the calls to setCharStream and reset destroy the token(s)
> created by the lexer rule.  I tried putting the call to emit right
> after the call to reset, even though that's not of much value to me -
> I want the parser to know the include file name before it sees tokens
> from the include file.
Putting the emit after the reset will still result in the token coming
out before the included tokens.

> That generated this:
>
> Exception in thread "main" java.lang.ClassCastException:
> org.antlr.runtime.ClassicToken
>          at cimmof2javaLexer.nextToken(cimmof2javaLexer.java:111)
>          at org.antlr.runtime.CommonTokenStream.fillBuffer
> (CommonTokenStream.java:119)
>          at org.antlr.runtime.CommonTokenStream.LT
> (CommonTokenStream.java:238)
>          at cimmof2javaParser.mofSpecification(cimmof2javaParser.java:
> 141)
>          at cimmof2java.main(cimmof2java.java:24)
>
> Line 111 in cimmof2javaLexer.java is
>
>                 if (((CommonToken)token).getStartIndex() < 0)
>
> So when the token is cast to a CommonToken, boom.  I confess that I'm
> not sure how to handle this.  If you're still interested, it may help
> to see a current version of the grammar, which I've attached.
>
Yeah, the reset call wipes all the token variables, so calling emit
beforehand won't help. It looks like you should be able to call emit
after the reset call. It looks like the overridden nextToken in the
include example skips the empty token that results when you switch
lexers, but if a token is created after the reset it should return that
token instead. You should be creating a CommonToken, not a ClassicToken,
since line 111 of your nextToken casts the token to CommonToken. Looks
like it is working fine otherwise.
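
To make the ordering issue concrete, here is a toy model of it. This is
only a sketch: ToyToken, ToyLexer and EmitOrderDemo are hypothetical
stand-ins written for illustration, not the ANTLR runtime classes, and
the only behaviour modelled is the assumption stated above that reset
clears the pending token.

```java
// Toy model only: hypothetical stand-ins, NOT the ANTLR runtime classes.
class ToyToken {
    final String text;

    ToyToken(String text) {
        this.text = text;
    }
}

class ToyLexer {
    // The pending token for the current rule, as set by emit().
    private ToyToken token;

    void emit(ToyToken t) {
        token = t;
    }

    // Assumption being modelled: reset wipes the pending token.
    void reset() {
        token = null;
    }

    ToyToken pending() {
        return token;
    }
}

class EmitOrderDemo {
    // emit first, then reset: the emitted token is lost.
    static ToyToken emitThenReset(String fileName) {
        ToyLexer lexer = new ToyLexer();
        lexer.emit(new ToyToken(fileName));
        lexer.reset();
        return lexer.pending();
    }

    // reset first, then emit: the emitted token survives.
    static ToyToken resetThenEmit(String fileName) {
        ToyLexer lexer = new ToyLexer();
        lexer.reset();
        lexer.emit(new ToyToken(fileName));
        return lexer.pending();
    }

    public static void main(String[] args) {
        System.out.println(emitThenReset("inc.mof") == null);
        System.out.println(resetThenEmit("inc.mof").text);
    }
}
```

In the real lexer the same ordering would apply inside the include rule:
switch the char stream, reset, and only then emit the token carrying the
file name.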

> I'll start a new antlr-interest thread that focuses on the mechanism
> for handling include files.  I think the parser should see the tokens
> in the include statement, and that the tokens from the included file
> should appear after the tokens that represent the include statement
> itself.
>
Generally I don't think you would really want the include statement to
remain in the source file. The more typical method would be to use a
custom token subclass that stores the original file name. This might
be a better method for you as well. It saves you having to track the
filename of the last include file in the parser, and it means that the
original file name is always available for error messages and the
like.
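
A rough sketch of the token-subclass idea follows. BasicToken and
FileToken are hypothetical names made up for this illustration;
BasicToken stands in for the runtime's token class so the snippet is
self-contained. In a real grammar the subclass would presumably extend
CommonToken, and the lexer would have to create it when emitting
tokens.

```java
// Hypothetical stand-in for the runtime's token class (illustration only).
class BasicToken {
    final int type;
    final String text;
    final int line;

    BasicToken(int type, String text, int line) {
        this.type = type;
        this.text = text;
        this.line = line;
    }
}

// Token that remembers which file it was lexed from.
class FileToken extends BasicToken {
    final String fileName;

    FileToken(int type, String text, int line, String fileName) {
        super(type, text, line);
        this.fileName = fileName;
    }

    // Location string in "file:line" form, suitable for error messages.
    String location() {
        return fileName + ":" + line;
    }
}

class FileTokenDemo {
    public static void main(String[] args) {
        // The token type value 1 and the file name are arbitrary examples.
        FileToken t = new FileToken(1, "class", 12, "inc.mof");
        System.out.println("error at " + t.location() + " near '" + t.text + "'");
    }
}
```

With this approach every token carries its own file name, so the parser
never needs to know when the lexer switched to or from an include file.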

Tom.

> Thanks for all your help!
>
> -- Pete