[antlr-interest] Re: Overlapping tokens

David Maxwell david at crlf.net
Tue Oct 11 14:45:03 PDT 2005


On Wed, 05 Oct 2005, David Maxwell wrote:
> In a lex/yacc example, I could do something like this:
> 
> "FooBar"                { printf ("Found a FOOBAR lex token\n");
>                           strcpy(yylval.stval,yytext);
>                           return FOOBAR; }
> 
> [a-zA-Z_]+              { printf("Found an ID lex token\n");
>                           strcpy(yylval.stval,yytext);
>                           return ID; }

Okay - so it was a bit of an RTFM (though no one even said that...)

The testLiterals option does most of what I described above, but not
quite all of it. The generated code runs the {} action attached to the
ID rule before it looks the text up in the literals table. As a result,
the action can't see the final token type - it isn't known yet.
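
To make the ordering concrete, here is a minimal stand-alone sketch of
the sequence described above. It is not ANTLR code: the TokenType
values, kLiterals, lookupLiteral() and matchIdent() are hypothetical
stand-ins, and the only thing it models is that the action runs between
the match and the literals lookup.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical token types standing in for the generated constants.
enum TokenType { ID = 1, FOOBAR = 2 };

// Stand-in for the lexer's literals table (keyword text -> token type).
static const std::map<std::string, int> kLiterals = { {"FooBar", FOOBAR} };

// Models the literals lookup: return the keyword type if the text is a
// literal, otherwise leave the type unchanged.
int lookupLiteral(const std::string& text, int ttype) {
    auto it = kLiterals.find(text);
    return it == kLiterals.end() ? ttype : it->second;
}

struct Result {
    int typeSeenByAction;  // what the user action observed
    int finalType;         // what the token ends up with
};

// Models the order of operations for an identifier rule:
// 1. match, 2. user action, 3. literals lookup.
Result matchIdent(const std::string& text) {
    int ttype = ID;                      // type right after the match code
    int seen = ttype;                    // the user action runs here
    ttype = lookupLiteral(text, ttype);  // the lookup only happens afterwards
    return Result{seen, ttype};
}
```

For the input "FooBar", the action in this model still observes ID even
though the token finally comes out as FOOBAR.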

The generated code looks like what is shown below. Is there any
construct that allows insertion of code _after_ the token type is set?
(Other than hand-editing the Lexer.cpp after every rebuild.)

void Lexer::mID(bool _createToken) {
    ... // match code

    { Your code here }
#line 442 "Lexer.cpp"
    _ttype = testLiteralsTable(_ttype);
    if ( _createToken && _token==ANTLR_USE_NAMESPACE(antlr)nullToken && _ttype!=ANTLR_USE_NAMESPACE(antlr)Token::SKIP ) {
        _token = makeToken(_ttype);
        _token->setText(text.substr(_begin, text.length()-_begin));
    }
    _returnToken = _token;
    _saveIndex=0;
}
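
One possible way around this without touching Lexer.cpp, offered as an
untested assumption rather than a confirmed answer: switch the automatic
check off for the rule and have the action perform the lookup itself as
its first statement (the testLiteralsTable(_ttype) call is visible in
the generated code above). The stand-alone sketch below only models that
ordering. The TokenType values, kLiterals, lookupLiteral() and
matchIdentManualLookup() are hypothetical names, not ANTLR classes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical token types standing in for the generated constants.
enum TokenType { ID = 1, FOOBAR = 2 };

// Stand-in for the lexer's literals table (keyword text -> token type).
static const std::map<std::string, int> kLiterals = { {"FooBar", FOOBAR} };

// Models the literals lookup: keyword type if found, else unchanged.
int lookupLiteral(const std::string& text, int ttype) {
    auto it = kLiterals.find(text);
    return it == kLiterals.end() ? ttype : it->second;
}

struct Outcome {
    int typeSeenByAction;  // what the rest of the user action observed
    int finalType;         // what the token ends up with
};

// Models a rule whose automatic literals check is assumed to be disabled,
// so the only lookup is the one the action performs itself.
Outcome matchIdentManualLookup(const std::string& text) {
    int ttype = ID;                      // type right after the match code

    // --- user action begins ---
    ttype = lookupLiteral(text, ttype);  // resolve the keyword first
    int seen = ttype;                    // remaining action code sees the final type
    // --- user action ends ---

    // No automatic lookup follows in this model.
    return Outcome{seen, ttype};
}
```

In this model the code after the manual lookup sees FOOBAR for "FooBar"
and ID for any other identifier. Whether the real generator allows this
arrangement should be checked against the ANTLR documentation.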

-- 
David Maxwell, david at vex.net|david at maxwell.net --> Unless you have a solution
when you tell them things like that, most people collapse into a gibbering, 
unthinking mass.  This is the same reason why you probably don't tell your 
boss about everything you read on BugTraq!    - Signal 11

