[antlr-interest] Lexer problem

Monty Zukowski monty at codetransform.com
Mon May 24 19:30:51 PDT 2004


Yeah, semantic predicates are used to decide which alternatives to 
choose.  ANTLR "knows" that the most restrictive case is the one where 
you match . and '\'', so it tests for that first.  You were using 
semantic predicates to decide which action to run.  The lexer isn't 
smart enough to see beyond the current token and know what is coming 
in the next token.
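
To make the ordering concrete, here is a minimal Java sketch that mimics 
the if/else chain in the generated mQUOTE quoted below, after the 
opening quote has been matched. It is a standalone simulation, not 
ANTLR code: the class QuoteDecisionOrder, the helper la() and the string 
return values are made up for illustration, and EOF_CHAR is assumed to 
be a sentinel outside the 0x00..0xff range.

```java
// Standalone simulation of the decision order; none of these names come from ANTLR.
class QuoteDecisionOrder {
    // Assumed end-of-input sentinel, chosen to lie outside the 0x00..0xff range.
    static final char EOF_CHAR = '\uffff';

    // Hypothetical stand-in for LA(i): the i-th character after the opening
    // quote (1-based), or EOF_CHAR when the input is exhausted.
    static char la(String rest, int i) {
        return i <= rest.length() ? rest.charAt(i - 1) : EOF_CHAR;
    }

    // Same tests, in the same order, as the generated if/else chain.
    static String generatedOrder(String rest) {
        if (la(rest, 1) <= '\u00ff' && la(rest, 2) == '\'') {
            return "CHARACTER_LITERAL";
        } else if (la(rest, 1) == '(' && la(rest, 2) == '\'' && la(rest, 4) == '\'') {
            return "QUOTE (predicated alternative)";
        } else {
            return "QUOTE";
        }
    }

    public static void main(String[] args) {
        // Each string is the input that follows the opening quote.
        for (String rest : new String[] {"a'", "('a'", "x"}) {
            System.out.println(rest + " -> " + generatedOrder(rest));
        }
    }
}
```

Any input that passes the second test also passes the first, because 
'(' is itself an in-range character followed by a quote. The predicated 
alternative is therefore never reached.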

Doing it in the actions, like your following post shows, is how I would 
have tried to do it.
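
That follow-up post is not quoted here, so this is only a guess at the 
general shape: a Java sketch of picking the token type in a single 
action, with the most specific lookahead tested first. The names 
QuoteInAction, la() and classify() are hypothetical, and the offsets 
are counted from the character after the opening quote, as in the 
generated code below.

```java
// Sketch of deciding the token type in an action; all names are hypothetical.
class QuoteInAction {
    // Assumed end-of-input sentinel.
    static final char EOF_CHAR = '\uffff';

    // Hypothetical stand-in for LA(i): the i-th character after the opening
    // quote (1-based), or EOF_CHAR when the input is exhausted.
    static char la(String rest, int i) {
        return i <= rest.length() ? rest.charAt(i - 1) : EOF_CHAR;
    }

    // Decide the token type once, testing the most specific pattern first.
    static String classify(String rest) {
        // After the opening quote: '(' then a quote, any character, a quote.
        if (la(rest, 1) == '(' && la(rest, 2) == '\'' && la(rest, 4) == '\'') {
            return "QUOTE";
        }
        // Otherwise: one character followed by a closing quote.
        // A real lexer action would also have to consume those two characters.
        if (la(rest, 1) != EOF_CHAR && la(rest, 2) == '\'') {
            return "CHARACTER_LITERAL";
        }
        return "QUOTE";
    }

    public static void main(String[] args) {
        // Each string is the input that follows the opening quote.
        for (String rest : new String[] {"a'", "('a'", "('", "x"}) {
            System.out.println(rest + " -> " + classify(rest));
        }
    }
}
```

Because the order of the tests is written by hand, the restricted case 
can no longer be shadowed by the more general character-literal case.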

Monty

On May 24, 2004, at 3:27 PM, Tom Verbeure wrote:

> Find below the generated code.
> It will first check for the CHAR_LIT and only then for the QUOTE with
> the heavy look-ahead. However, the CHAR_LIT case includes the more
> restricted case of the first QUOTE subrule, so this one will never be
> checked...
>
> BTW, in the original rule that I sent, all the hardcoded LA(x)
> statements should become LA(x-1). In the code below, this is already
> corrected.
>
> I will have a look at your article.
>
> Thanks,
> Tom
>
>
>
> public final void mQUOTE(boolean _createToken) throws
> RecognitionException, CharStreamException, TokenStreamException {
>     int _ttype; Token _token=null; int _begin=text.length();
>     traceIn("mQUOTE");
>     _ttype = QUOTE;
>     int _saveIndex;
>     try { // debugging
>
> 	match('\'');
> 	{
> 	    if ((((LA(1) >= '\u0000' && LA(1) <= '\u00ff')))&&(LA(2)=='\'')) {
> 		matchNot(EOF_CHAR);
> 		match("'");
> 		_ttype = CHARACTER_LITERAL;
> 	    }
> 	    else if (( true )&&(LA(1)=='(' && LA(2)=='\'' && LA(4)=='\'')) {
> 		_ttype = QUOTE;
> 	    }
> 	    else {
> 		_ttype = QUOTE;
> 	    }
>
> 	}
> 	if ( _createToken && _token==null && _ttype!=Token.SKIP ) {
> 	    _token = makeToken(_ttype);
> 	    _token.setText(new String(text.getBuffer(), _begin, text.length()-_begin));
> 	}
> 	_returnToken = _token;
>     } finally { // debugging
> 	traceOut("mQUOTE");
>     }
> }
>
>
>
> On Mon, 24 May 2004 15:14:42 -0700, Monty Zukowski
> <monty at codetransform.com> wrote:
>>
>> On May 24, 2004, at 3:05 PM, Tom Verbeure wrote:
>>
>>> QUOTE: '\'' (
>>>     {LA(2)=='(' && LA(3)=='\'' && LA(5)=='\''}? {$setType(QUOTE);}
>>>     | {LA(3)=='\''}? . "'"                      {$setType(CHAR_LIT);}
>>>     |                                           {$setType(QUOTE);}
>>>     )
>>>     ;
>>>
>>> However, when I look at the generated code, it will always test for
>>> CHAR_LIT first, before looking at the first QUOTE.
>>
>> I'm not following you. Quote the generated code too.  Also consider
>> using a parser filter for this nastiness.
>> http://www.codetransform.com/filterexample.html
>>
>> Monty Zukowski
>>
>> ANTLR & Java Consultant -- http://www.codetransform.com
>> ANSI C/GCC transformation toolkit --
>> http://www.codetransform.com/gcc.html
>> Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html
>>



 