[antlr-interest] Tokens vs. Characters in Lexer/MismatchedTokenException

Terence Parr parrt at cs.usfca.edu
Sun Jan 18 13:12:30 PST 2009


ah.  Yes, I get it now. Think of a lexer as a parser that parses  
characters instead of tokens. In this way I have generalized the  
notion of a recognizer so that we represent any element in the stream  
as an integer vocabulary symbol "type".
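
A minimal sketch of that idea in Java, for illustration only. Every name in it (RecognizerSketch, SymbolStream, CharSymbols, TokenTypeSymbols, MismatchedSymbolException) is hypothetical and not part of the ANTLR runtime, and the exception is unchecked here purely to keep the example short. It shows one match(int) routine serving both a character stream and a token-type stream, because both hand back plain int symbols.

```java
// Hypothetical sketch: these names are illustrative, not ANTLR runtime classes.
class RecognizerSketch {
    static final int EOF = -1;

    /** A stream of integer vocabulary symbols (characters or token types). */
    interface SymbolStream {
        int LA(int i);   // look ahead i symbols (1 = current) without consuming
        void consume();  // advance past the current symbol
    }

    /** Lexer-side stream: each symbol is a character code. */
    static final class CharSymbols implements SymbolStream {
        private final String text;
        private int p = 0;
        CharSymbols(String text) { this.text = text; }
        public int LA(int i) {
            int idx = p + i - 1;
            return idx < text.length() ? text.charAt(idx) : EOF;
        }
        public void consume() { if (p < text.length()) p++; }
    }

    /** Parser-side stream: each symbol is a token type. */
    static final class TokenTypeSymbols implements SymbolStream {
        private final int[] types;
        private int p = 0;
        TokenTypeSymbols(int... types) { this.types = types.clone(); }
        public int LA(int i) {
            int idx = p + i - 1;
            return idx < types.length ? types[idx] : EOF;
        }
        public void consume() { if (p < types.length) p++; }
    }

    /** Records the expected and actual symbol, whatever kind of symbol it is. */
    static final class MismatchedSymbolException extends RuntimeException {
        final int expecting;
        final int found;
        MismatchedSymbolException(int expecting, int found) {
            super("expecting " + expecting + ", found " + found);
            this.expecting = expecting;
            this.found = found;
        }
    }

    /** One matcher for both cases: compare the next int symbol, then consume it. */
    static void match(SymbolStream input, int expected) {
        int actual = input.LA(1);
        if (actual != expected) {
            throw new MismatchedSymbolException(expected, actual);
        }
        input.consume();
    }

    public static void main(String[] args) {
        // Lexer view: the char literal '0' widens to int without a cast.
        match(new CharSymbols("0"), '0');

        // Parser view: ZERO is an assumed token type number.
        final int ZERO = 4;
        match(new TokenTypeSymbols(ZERO), ZERO);

        // A mismatch reports the expected symbol as an int in either case.
        try {
            match(new CharSymbols("1"), '0');
        } catch (MismatchedSymbolException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the "expecting" field holds a character code when the caller is a lexer and a token type when the caller is a parser; only the caller knows which vocabulary the int belongs to.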

sorry for the confusion.

Ter
On Jan 18, 2009, at 1:10 PM, Rick Mann wrote:

>
> On Jan 18, 2009, at 12:55:46, Terence Parr wrote:
>
>>
>> On Jan 18, 2009, at 12:38 PM, Rick Mann wrote:
>>
>>> As I'm working on my language target, I see that Lexer.match(int  
>>> c) in
>>> the Java target can create a MismatchedTokenException(), passing c  
>>> in
>>> to its constructor.
>>
>> weird. in the Java version, it treats it as a token type.
>
> Sorry I wasn't clear: I'm referring to the Java version. As I  
> examine this further, I realize it's combining Characters and Token  
> *Types*, not Tokens. This is still a little apples-and-oranges to me.
>
> Lexer has two match() methods:
>
> Lexer.match(String s)
> Lexer.match(int c)
>
> When the ANTLR tool builds the Java recognizer for the example  
> grammar in the codegen wiki page, it creates a rule mZERO() in the  
> Lexer subclass that calls match('0'). At this point we're passing a  
> char as an int parameter, which should be legal without warnings.  
> Examining Lexer.match(int c) reveals this:
>
> public void match(int c) throws MismatchedTokenException {
> 	if ( input.LA(1)!=c ) {
> 		if ( state.backtracking>0 ) {
> 			state.failed = true;
> 			return;
> 		}
> 		MismatchedTokenException mte =
> 			new MismatchedTokenException(c, input);
> 		recover(mte);  // don't really recover; just consume in lexer
> 		throw mte;
> 	}
> 	input.consume();
> 	state.failed = false;
> }
>
> The only constructor of MismatchedTokenException that takes arguments is:
>
> public MismatchedTokenException(int expecting, IntStream input) {
> 	super(input);
> 	this.expecting = expecting;
> }
>
> And this.expecting is declared like this:
>
> public int expecting = Token.INVALID_TOKEN_TYPE;
>
> Implying that we've now converted a character to a token type  
> (semantically, that is).
>
>
>
>>
>>
>>> The exception class seems to treat that int as a token. I wouldn't
>>> have thought Tokens and Characters to be interchangeable. What am I
>>> missing?
>>
>> if that were the case, it wouldn't compile. Are you sure that it is  
>> treating it as a token?
>>
>> Ter
>
> -- 
> Rick
>


