[antlr-interest] Tokens vs. Characters in Lexer/MismatchedTokenException

Rick Mann rmann at latencyzero.com
Sun Jan 18 13:10:19 PST 2009


On Jan 18, 2009, at 12:55:46, Terence Parr wrote:

>
> On Jan 18, 2009, at 12:38 PM, Rick Mann wrote:
>
>> As I'm working on my language target, I see that Lexer.match(int c) in
>> the Java target can create a MismatchedTokenException(), passing c in
>> to its constructor.
>
> weird. in the Java version, it treats it as a token type.

Sorry I wasn't clear: I'm referring to the Java version. As I examine  
this further, I realize it's combining Characters and Token *Types*,  
not Tokens. This is still a little apples-and-oranges to me.

Lexer has two match() methods:

Lexer.match(String s)
Lexer.match(int c)

When the ANTLR tool builds the Java recognizer for the example grammar  
in the codegen wiki page, it creates a rule mZERO() in the Lexer  
subclass that calls match('0'). At this point we're passing a char as  
an int parameter, which should be legal without warnings. Examining  
Lexer.match(int c) reveals this:

public void match(int c) throws MismatchedTokenException {
	if ( input.LA(1)!=c ) {
		if ( state.backtracking>0 ) {
			state.failed = true;
			return;
		}
		MismatchedTokenException mte =
			new MismatchedTokenException(c, input);
		recover(mte);  // don't really recover; just consume in lexer
		throw mte;
	}
	input.consume();
	state.failed = false;
}

The only constructor of MismatchedTokenException that takes arguments is:

public MismatchedTokenException(int expecting, IntStream input) {
	super(input);
	this.expecting = expecting;
}

And this.expecting is declared like this:

public int expecting = Token.INVALID_TOKEN_TYPE;

Implying that we've now converted a character to a token type  
(semantically, that is).
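
To make the concern concrete, here is a minimal standalone sketch of the same pattern. It does not use the ANTLR runtime: SketchMismatchException, CharVsTokenTypeSketch, NO_MISMATCH, and the placeholder value of INVALID_TOKEN_TYPE are all hypothetical names invented for illustration. It only shows that a char literal widens silently to int and then lands in a field whose default suggests it holds a token type.

```java
// Hypothetical stand-ins for illustration; these are not the ANTLR runtime classes.
final class SketchMismatchException extends Exception {
    // Placeholder "no token type" value, chosen for this sketch.
    static final int INVALID_TOKEN_TYPE = 0;

    // Declared as if it held a token type, like the field quoted above.
    int expecting = INVALID_TOKEN_TYPE;

    SketchMismatchException(int expecting) {
        this.expecting = expecting;
    }
}

final class CharVsTokenTypeSketch {
    // Sentinel returned when the input matched and nothing was thrown.
    static final int NO_MISMATCH = Integer.MIN_VALUE;

    // Lexer-style match: compares the character at pos with c.
    static void match(String input, int pos, int c) throws SketchMismatchException {
        int la = pos < input.length() ? input.charAt(pos) : -1; // -1 stands for EOF
        if (la != c) {
            // c is a character code here, yet it is stored in "expecting".
            throw new SketchMismatchException(c);
        }
    }

    // Returns the value stored in the exception, or NO_MISMATCH on success.
    static int expectingAfterMismatch(String input, int c) {
        try {
            match(input, 0, c);
            return NO_MISMATCH;
        } catch (SketchMismatchException e) {
            return e.expecting;
        }
    }

    public static void main(String[] args) {
        // '0' widens from char to int implicitly, with no cast and no warning.
        int expecting = expectingAfterMismatch("1", '0');
        System.out.println(expecting);        // prints 48
        System.out.println((char) expecting); // prints 0
    }
}
```

Nothing in the type system distinguishes the two meanings: the field holds 48, which is the character '0' if read as a char but would be some unrelated token if read as a token type.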



>
>
>> The exception class seems to treat that int as a token. I wouldn't
>> have thought Tokens and Characters to be interchangeable. What am I
>> missing?
>
> if that were the case, it wouldn't compile. Are you sure that it is  
> treating it as a token?
>
> Ter

-- 
Rick


