[antlr-interest] Tokens vs. Characters in Lexer/MismatchedTokenException
Terence Parr
parrt at cs.usfca.edu
Sun Jan 18 13:12:30 PST 2009
ah. Yes, I get it now. Think of a lexer as a parser that parses
characters instead of tokens. In this way I have generalized the
notion of a recognizer so that we represent any element in the stream
as an integer vocabulary symbol "type".
sorry for the confusion.
Ter
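
A minimal sketch of the idea in the reply above, that characters and token types can both be treated as integer vocabulary symbols. Every name here (IntSymbolDemo, SymbolStream, CharSymbols, TokenTypeSymbols, Mismatch, and the token type number 4) is hypothetical and chosen for illustration. None of it is the ANTLR runtime code; it only assumes that a stream exposes LA(1) and consume() the way the quoted Lexer.match(int c) uses them.

```java
// Illustrative sketch only: these types are hypothetical stand-ins,
// not the ANTLR runtime classes.
final class IntSymbolDemo {
    static final int EOF = -1;

    /** Minimal stream of integer symbols. */
    interface SymbolStream {
        int LA(int i);   // symbol i positions ahead (1-based), or EOF
        void consume();  // advance by one symbol
    }

    /** Symbols are character codes, as a lexer would see them. */
    static final class CharSymbols implements SymbolStream {
        private final String text;
        private int pos = 0;
        CharSymbols(String text) { this.text = text; }
        public int LA(int i) {
            int idx = pos + i - 1;
            return idx < text.length() ? text.charAt(idx) : EOF;
        }
        public void consume() { pos++; }
    }

    /** Symbols are token types, as a parser would see them. */
    static final class TokenTypeSymbols implements SymbolStream {
        private final int[] types;
        private int pos = 0;
        TokenTypeSymbols(int... types) { this.types = types; }
        public int LA(int i) {
            int idx = pos + i - 1;
            return idx < types.length ? types[idx] : EOF;
        }
        public void consume() { pos++; }
    }

    /** One exception type serves both cases because "expecting" is just an int. */
    static final class Mismatch extends RuntimeException {
        final int expecting;
        final int found;
        Mismatch(int expecting, int found) {
            super("expecting symbol " + expecting + " but found " + found);
            this.expecting = expecting;
            this.found = found;
        }
    }

    /** Shared match logic: compare the next symbol to the expected one. */
    static void match(SymbolStream input, int expected) {
        int actual = input.LA(1);
        if (actual != expected) {
            throw new Mismatch(expected, actual);
        }
        input.consume();
    }

    public static void main(String[] args) {
        final int ZERO = 4; // hypothetical token type number
        match(new CharSymbols("0"), '0');        // lexer-style: char widened to int
        match(new TokenTypeSymbols(ZERO), ZERO); // parser-style: token type
        try {
            match(new CharSymbols("1"), '0');    // mismatch on a character
        } catch (Mismatch e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the same match() and the same exception work for either stream. The cost is the one raised in the quoted message: the exception alone cannot tell whether "expecting" holds a character code or a token type, so whoever reports the error has to know which kind of recognizer threw it.
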
On Jan 18, 2009, at 1:10 PM, Rick Mann wrote:
>
> On Jan 18, 2009, at 12:55:46, Terence Parr wrote:
>
>>
>> On Jan 18, 2009, at 12:38 PM, Rick Mann wrote:
>>
>>> As I'm working on my language target, I see that Lexer.match(int c)
>>> in the Java target can create a MismatchedTokenException(), passing
>>> c in to its constructor.
>>
>> weird. in the Java version, it treats it as a token type.
>
> Sorry I wasn't clear: I'm referring to the Java version. As I
> examine this further, I realize it's combining Characters and Token
> *Types*, not Tokens. This is still a little apples-and-oranges to me.
>
> Lexer has two match() methods:
>
> Lexer.match(String s)
> Lexer.match(int c)
>
> When the ANTLR tool builds the Java recognizer for the example
> grammar in the codegen wiki page, it creates a rule mZERO() in the
> Lexer subclass that calls match('0'). At this point we're passing a
> char as an int parameter, which should be legal without warnings.
> Examining Lexer.match(int c) reveals this:
>
> public void match(int c) throws MismatchedTokenException {
>     if ( input.LA(1)!=c ) {
>         if ( state.backtracking>0 ) {
>             state.failed = true;
>             return;
>         }
>         MismatchedTokenException mte =
>             new MismatchedTokenException(c, input);
>         recover(mte); // don't really recover; just consume in lexer
>         throw mte;
>     }
>     input.consume();
>     state.failed = false;
> }
>
> The only constructor of MismatchedTokenException that takes arguments is:
>
> public MismatchedTokenException(int expecting, IntStream input) {
>     super(input);
>     this.expecting = expecting;
> }
>
> And this.expecting is declared like this:
>
> public int expecting = Token.INVALID_TOKEN_TYPE;
>
> Implying that we've now converted a character to a token type
> (semantically, that is).
>
>
>
>>
>>
>>> The exception class seems to treat that int as a token. I wouldn't
>>> have thought Tokens and Characters to be interchangeable. What am I
>>> missing?
>>
>> if that were the case, it wouldn't compile. Are you sure that it is
>> treating it as a token?
>>
>> Ter
>
> --
> Rick
>