[antlr-interest] Fragment tokens are not generated by emit() -- breaks setting TokenLabelType

Wed Jul 8 14:28:36 PDT 2009

David-Sarah Hopwood wrote:
> In order to change the token type (so that I can associate another
> field with each token), I have overridden emit() in my lexer as shown
> below:
>
> grammar Jacaranda;
>
> options {
>   TokenLabelType = JacarandaToken;
>   language = Java;
> }
>
> @lexer::members {
>   private String SV = null;
>
>   // See <http://www.antlr.org/wiki/pages/viewpage.action?pageId=1844>.
>   public Token emit() {
>     JacarandaToken token = new JacarandaToken(input, state.type,
>       state.channel, state.tokenStartCharIndex, getCharIndex()-1);
>     token.setLine(state.tokenStartLine);
>     token.setText(state.text);
>     token.setCharPositionInLine(state.tokenStartCharPositionInLine);
>
>     // Transfer the last SV computed by the lexer to the token object.
>     token.SV = SV; SV = null;
>
>     emit(token);
>     return token;
>   }
> }
>
> The problem is that the generated code has compilation errors because
> not all creation of token objects goes through emit(); there are some
> direct uses of 'new CommonToken(...)':
>
> org\jacaranda\verifier\JacarandaLexer.java:4006: incompatible types
> found   : org.antlr.runtime.CommonToken
> required: org.jacaranda.verifier.JacarandaToken
>   d = new CommonToken(input, Token.INVALID_TOKEN_TYPE,
>           Token.DEFAULT_CHANNEL, dStart1982, getCharIndex()-1);
>       ^
>
> The errors seem to occur in lexer rules where a child rule that is
> a fragment is given a name (whether or not that name is used in an
> action), for example:

Yes, I think you are correct.

The solution is either that the lexerRuleRef() template should know that
labeled fragment references have type CommonToken, or that the hard-coded
new CommonToken should use the template's labelType instead. Probably the
latter, but fragment labels are really meant for using the token to find
the start and end of a fragment's span and so on, rather than for
emit()'ing it, so perhaps not.

However, if you separate the lexer and parser, and use TokenLabelType 
only in the parser grammar, then I think it would work as you require, 
even with labeled fragment rules.
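
As a rough, untested sketch of that split (the file names and the INT,
DIGIT and number rules are made up for illustration; only the grammar
name, the JacarandaToken class and the emit() override come from the
original post), the two grammars might look something like this:

```
// JacarandaLexer.g (hypothetical file name).
// No TokenLabelType here, so labeled fragment references should keep
// the default token type in the generated lexer.
lexer grammar JacarandaLexer;

options {
  language = Java;
}

@members {
  // The emit() override from the original post would go here unchanged,
  // so every token handed to the parser is still a JacarandaToken.
}

// Hypothetical rules: a labeled reference to a fragment rule.
INT : d=DIGIT DIGIT* ;
fragment DIGIT : '0'..'9' ;

// JacarandaParser.g (hypothetical file name).
// TokenLabelType is set only on the parser side.
parser grammar JacarandaParser;

options {
  language = Java;
  tokenVocab = JacarandaLexer;
  TokenLabelType = JacarandaToken;
}

// Hypothetical rule: n should be declared as a JacarandaToken.
number : n=INT ;
```

The assumption here is that the option then only affects how token
labels are typed in the generated parser, while the lexer still builds
its fragment tokens as CommonToken; I have not checked this against the
generated code.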

Jim