[antlr-interest] passing stuff from lexer to parser

Mon Jan 7 08:45:03 PST 2008

On Jan 6, 2008, at 10:01 PM, Thomas Brandon wrote:

> On Jan 7, 2008 1:25 PM,  <siemsen at ucar.edu> wrote:
>> Gavin,
>>
>> My comments inline...
>>
>> On Jan 2, 2008, at 3:59 PM, Gavin Lambert wrote:
>>
>>>> Would it be possible to inject a token into the token stream just
>>>> before I switch to the include file and call reset?  In the
>>>> PragmaInclude lexer rule, can I call "emit" to do it, and make the
>>>> token contain the include file name?  I haven't done anything like
>>>> this before, I just wonder if it's reasonable.
>>>
>>> Lexer operation is basically just calling nextToken to retrieve one
>>> token at a time.  Calling emit sets the data for that token; not
>>> calling it will lead to generating a default token based on all the
>>> characters matched by the rule.
>>>
>>> I'm not really familiar with the Java runtime, so I'm not sure what
>>> the reset call affects.  It might destroy an emit as well (and you
>>> probably can't emit afterwards successfully either).  Still, it
>>> could be worth a try.
>>>
>>> The rule must currently be returning *something*, though, since
>>> every top-level lexer rule called must return a token.  Trace it
>>> through with a debugger and see what's going on.
>>
>> I tried adding a call to emit right before the calls to setCharStream
>> and reset.  As Thomas Brandon predicted, nothing happened, probably
>> because the calls to setCharStream and reset destroy the token(s)
>> created by the lexer rule.  I tried putting the call to emit right
>> after the call to reset, even though that's not of much value to me -
>> I want the parser to know the include file name before it sees tokens
>> from the include file.
> Putting it after the reset will still result in it coming out before
> the included tokens.
>
>> That generated this:
>>
>> Exception in thread "main" java.lang.ClassCastException: org.antlr.runtime.ClassicToken
>>          at cimmof2javaLexer.nextToken(cimmof2javaLexer.java:111)
>>          at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:119)
>>          at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
>>          at cimmof2javaParser.mofSpecification(cimmof2javaParser.java:141)
>>          at cimmof2java.main(cimmof2java.java:24)
>>
>> Line 111 in cimmof2javaLexer.java is
>>
>>                 if (((CommonToken)token).getStartIndex() < 0)
>>
>> So when the token is cast to a CommonToken, boom.  I confess that I'm
>> not sure how to handle this.  If you're still interested, it may help
>> to see a current version of the grammar, which I've attached.
>>
> Yeah, the reset call wipes all the token variables, so calling emit
> beforehand won't help. It looks like you should be able to call emit
> after the reset call. It looks like the overridden nextToken in the
> include example skips the empty token that results when you switch
> lexers, but if a token is created after the reset it should return that
> one. You should be creating a CommonToken, not a ClassicToken. Looks
> like it is working fine otherwise.

Yep, creating a CommonToken instead of a ClassicToken fixed it.
THANKS!
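
For the archives, here is a rough, self-contained sketch of the ordering
Tom describes: the token emitted after the input switch still reaches the
consumer before any token from the included file. This is a toy model and
not the ANTLR runtime or my grammar. Tok, IncludeLexerSketch, lexAll, and
the "#include" syntax are all made-up names for illustration, and the
"files" are just strings held in a map.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a lexer token; NOT the ANTLR Token interface.
final class Tok {
    final String type;  // "INCLUDE" or "WORD"
    final String text;
    Tok(String type, String text) { this.type = type; this.text = text; }
    @Override public String toString() { return type + "(" + text + ")"; }
}

// Toy lexer over whitespace-separated words that keeps a stack of inputs.
// Assumption: includes are not recursive; no cycle detection is attempted.
final class IncludeLexerSketch {
    private final Map<String, String> files;  // file name -> contents
    private final Deque<Deque<String>> stack = new ArrayDeque<>();

    IncludeLexerSketch(Map<String, String> files, String mainFile) {
        this.files = files;
        push(mainFile);
    }

    // Make the named file the current input.
    private void push(String name) {
        String content = files.get(name);
        if (content == null) throw new IllegalArgumentException("no such file: " + name);
        Deque<String> words = new ArrayDeque<>();
        for (String w : content.trim().split("\\s+")) {
            if (!w.isEmpty()) words.add(w);
        }
        stack.push(words);
    }

    // Returns the next token, or null once every input is exhausted.
    Tok nextToken() {
        // Drop finished inputs so lexing resumes in the including file.
        while (!stack.isEmpty() && stack.peek().isEmpty()) stack.pop();
        if (stack.isEmpty()) return null;
        String w = stack.peek().poll();
        if (w.equals("#include")) {
            String name = stack.peek().poll();
            if (name == null) throw new IllegalStateException("#include without a file name");
            push(name);                       // switch input first (the "reset" step)
            return new Tok("INCLUDE", name);  // created after the switch, delivered first
        }
        return new Tok("WORD", w);
    }

    // Convenience helper: lex everything and return the printed tokens.
    static List<String> lexAll(Map<String, String> files, String mainFile) {
        IncludeLexerSketch lexer = new IncludeLexerSketch(files, mainFile);
        List<String> out = new ArrayList<>();
        for (Tok t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            out.add(t.toString());
        }
        return out;
    }
}
```

With a main file containing "a #include inc.mof b" and inc.mof containing
"x y", lexAll yields WORD(a), INCLUDE(inc.mof), WORD(x), WORD(y), WORD(b),
so a parser sees the include file name before the included tokens.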

>> I'll start a new antlr-interest thread that focuses on the mechanism
>> for handling include files.  I think the parser should see the tokens
>> in the include statement, and that the tokens from the included file
>> should appear after the tokens that represent the include statement
>> itself.
>>
> Generally I don't think you would really want the include statement to
> remain in the source file. The more typical method would be to use a
> custom token subclass that stored the original file name. This might
> be a better method for you as well. This saves you having to track the
> filename of the last include file in the parser and means that the
> original file name is always available for error messages and the
> like.
>
> Tom.

Perhaps this would be better, but for now I'll just track the include
filename in the parser.
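
For anyone who wants to try Tom's suggestion, my understanding of it is
roughly the following. This is only a sketch under assumptions: BasicToken
is a hypothetical stand-in so the example compiles on its own (in a real
ANTLR 3 Java lexer the subclass would presumably extend
org.antlr.runtime.CommonToken instead), and FileAwareToken, location, and
describeError are names I made up.

```java
// Hypothetical stand-in for the runtime's token class, included only so
// this sketch is self-contained.
class BasicToken {
    private final int type;
    private final String text;
    private final int line;

    BasicToken(int type, String text, int line) {
        this.type = type;
        this.text = text;
        this.line = line;
    }

    int getType() { return type; }
    String getText() { return text; }
    int getLine() { return line; }
}

// Token subclass that remembers which file it was read from, so the
// parser does not have to track the current include file itself.
final class FileAwareToken extends BasicToken {
    private final String fileName;

    FileAwareToken(int type, String text, int line, String fileName) {
        super(type, text, line);
        this.fileName = fileName;
    }

    String getFileName() { return fileName; }

    // Location prefix for messages, in the form "file:line".
    String location() { return fileName + ":" + getLine(); }
}

final class FileAwareTokenDemo {
    // Build an error message that names the originating file.
    static String describeError(FileAwareToken t, String message) {
        return t.location() + ": " + message + " near '" + t.getText() + "'";
    }

    public static void main(String[] args) {
        FileAwareToken t = new FileAwareToken(1, "class", 3, "inc.mof");
        System.out.println(describeError(t, "unexpected token"));
    }
}
```

The lexer would create these tokens with the name of whatever file it is
currently reading, so every token carries its own origin and error
messages stay accurate even after returning from an include.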

Thanks again!

-- Pete