[antlr-interest] newbie lookahead question

Lance Gutteridge lance at thinkingworks.com
Fri Apr 21 23:54:01 PDT 2006


John,

Maybe I figured it out. It seems that there must be a rule that outputs
tokens for testLiterals to take effect. So I added an ID rule which
matches a word that begins with a letter and is followed by an arbitrary
number of letters or digits.

I think what happens is that the lexer tries to match ID; when it does, it
checks the literals table, and if it finds a match it outputs the literal
token rather than ID.
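To make that guess concrete, here is a minimal Java sketch of the behaviour described above. It is a simplified model, not ANTLR's generated lexer code: the class name, the methods and the token numbers are my own. A map stands in for the hash table generated from tokens{...}, matchesId plays the role of the added ID rule, and the idRuleDefined flag models Lance's original grammar, where no rule could consume the letter 'a'.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of an ANTLR 2 lexer with testLiterals=true.
// NOT ANTLR's generated code; names and token numbers are hypothetical.
class LiteralLookupSketch {
    static final int ID = 4;
    static final int ACTIVATE = 5;
    static final int ON = 6;
    static final int OFF = 7;

    // Stands in for the hash table generated from tokens{...}.
    static final Map<String, Integer> LITERALS = new HashMap<>();

    static {
        LITERALS.put("activate", ACTIVATE);
        LITERALS.put("on", ON);
        LITERALS.put("off", OFF);
    }

    // Models the ID rule: a letter followed by any number of letters or digits.
    static boolean matchesId(String text) {
        if (text.isEmpty() || !Character.isLetter(text.charAt(0))) {
            return false;
        }
        for (int i = 1; i < text.length(); i++) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Returns the token type emitted for one word of input.
    static int tokenType(String text, boolean idRuleDefined) {
        if (!idRuleDefined || !matchesId(text)) {
            // No rule consumes the characters, so the lexer fails
            // before the literal table is ever consulted.
            throw new IllegalStateException("no rule matches \"" + text + "\"");
        }
        // testLiterals step: swap ID for the literal's type if the text is listed.
        return LITERALS.getOrDefault(text, ID);
    }

    public static void main(String[] args) {
        for (String word : "activate on".split(" ")) {
            System.out.println(word + " -> " + tokenType(word, true));
        }
        System.out.println("foo -> " + tokenType("foo", true));
        try {
            tokenType("activate", false);
        } catch (IllegalStateException e) {
            System.out.println("without an ID rule: " + e.getMessage());
        }
    }
}
```

The point of the model is the ordering: the literal check only runs on text that some rule has already matched, which is why the grammar with only a WS rule reports "unexpected char: 'a'".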

John: I'm not sure your previous remark that the entries in the tokens
section are rules is correct.

> And, oh by the way, that stuff between the "s in the tokens{...} section *IS*
> a lexer rule --- it means:
>
>         'match this explicit string literal when testLiterals is true' 

I think, more precisely, it says: 'when you match a token with a lexer rule
and testLiterals is true, check the tokens section to see whether the token
text matches one of its strings, and output that token if it does.'

So I think it is more of a modifier of the output of the lexer rules
(those that have the testLiterals option turned on) than a lexer rule in
its own right.

Do you agree with that?
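The "modifier of the output" reading can be sketched in a few lines of Java. This is a hypothetical model, not ANTLR code: ID and NAME are made-up rule names, and finishToken stands in for whatever the generated lexer does after a rule has matched. Only a rule whose testLiterals option is on has its token type replaced by an entry from the tokens{...} table; a rule with the option off keeps its own type even for the same text.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the literal table only rewrites the output of rules that opt in.
// Rule names (ID, NAME) and token numbers are hypothetical.
class PerRuleLiteralsSketch {
    static final int ID = 4;
    static final int NAME = 5;
    static final int ON = 6;

    // Stands in for the table generated from tokens{ ON="on"; }.
    static final Map<String, Integer> LITERALS = new HashMap<>();

    static {
        LITERALS.put("on", ON);
    }

    // Called after a rule has matched `text` and produced `ruleType`.
    static int finishToken(int ruleType, boolean ruleTestsLiterals, String text) {
        if (ruleTestsLiterals) {
            return LITERALS.getOrDefault(text, ruleType);
        }
        return ruleType; // tokens{...} entries are ignored for this rule
    }

    public static void main(String[] args) {
        // ID has testLiterals=true, so "on" comes out as ON.
        System.out.println("ID rule:   " + finishToken(ID, true, "on"));
        // NAME has testLiterals off, so "on" stays NAME.
        System.out.println("NAME rule: " + finishToken(NAME, false, "on"));
    }
}
```

This matches John's suggestion of enabling testLiterals only on the pertinent rules: the tokens{...} entries never match input on their own, they only relabel what an opted-in rule produced.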

>
> (now if we only had a way to specify synonyms in the tokens{...} section,
> e.g. tokens{ TRUE="true","YES"; FALSE="false","NO"; } then life really would
> be easy ;-)

Yes that would be a nice feature.
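As the thread notes, tokens{...} has no such synonym syntax. In a hand-written lookup table, though, synonyms are simply extra keys that map to the same token type. The Java sketch below illustrates that idea only; the class, the literal helper and the token numbers are hypothetical and are not an ANTLR feature.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the wished-for synonyms: several spellings, one token type.
// Hypothetical names and token numbers; not an ANTLR feature.
class SynonymLiteralsSketch {
    static final int ID = 4;
    static final int TRUE = 5;
    static final int FALSE = 6;

    static final Map<String, Integer> LITERALS = new HashMap<>();

    // Hypothetical helper: register every spelling for one token type.
    static void literal(int type, String... spellings) {
        for (String s : spellings) {
            LITERALS.put(s, type);
        }
    }

    static {
        literal(TRUE, "true", "YES");
        literal(FALSE, "false", "NO");
    }

    // Same lookup an ID rule would do with testLiterals on.
    static int lookup(String idText) {
        return LITERALS.getOrDefault(idText, ID);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"true", "YES", "NO", "maybe"}) {
            System.out.println(word + " -> " + lookup(word));
        }
    }
}
```

The parser would then see a single TRUE token type whichever spelling appeared in the input.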


Lance


Lance Gutteridge wrote:

> John,
> Thanks for the help. What you say sounds clear, and I read the
> documentation on testLiterals=true. I thought, aha, that is the key:
> just set testLiterals to true and all will be fine.
>
> However, when I try it in a grammar it doesn't seem to work. Following
> is a test grammar I made up. When I give the parser the string
> "activate on", it fails with the message "Parse error: line 1:1:
> unexpected char: 'a'".
>
> When I uncomment the three rules (ACTIVATE, ON and OFF), it parses fine
> and gives me a tree with the ACTIVATE token as the root and a single
> child, the token ON, which is exactly what I wanted.
> (In this case I am surprised that the tokens section does not create
> an ambiguity with those lexer rules.)
>
> I checked the code of the lexer, and the hash table is being generated
> to look up the three literals. However, the lexer stubbornly refuses to
> output the token ACTIVATE when I only have them defined in the tokens
> section.
>
> I'm probably doing something really stupid here, but I'm quite puzzled.
>
> Thanks for your help,
> Lance
>
> class TestLexer extends Lexer;
> options
> {
>    testLiterals = true;
>    k=2;
> }
>
> tokens{ ACTIVATE="activate"; ON="on";OFF="off";}
> //ACTIVATE: "activate";
> //ON: "on";
> //OFF: "off";
> //++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
>
> // Whitespace -- ignored
> //++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
>
> WS    :    (    ' '
>        |    '\t'
>        |    '\f'
>            // handle newlines
>        |    (    options {generateAmbigWarnings=false;}
>            :    "\r\n"  // Windows
>            |    '\r'    // Macintosh
>            |    '\n'    // Unix
>            )
>            { newline(); }
>        )+
>        { _ttype = Token.SKIP; }
>    ;
>
> class TestParser extends Parser;
> options
> {
>        buildAST=true;
>        k = 1;
>        defaultErrorHandler=false;
> }
>
> statement: ACTIVATE^ (ON | OFF);
>
>
>
>
> John B. Brodie wrote:
>
>> Sir :-
>>
>>  
>>
>>> Well maybe not. It seems I was wrong about the tokens section. It 
>>> doesn't specify lexer rules so the tokens aren't detected and put 
>>> into the token stream for the parser. Ah well. It seemed like a good 
>>> idea at the time.
>>>
>>> Lance
>>>   
>>
>>
>> You are not wrong about the tokens{...} lexer section.
>>
>> The tokens{...} section operates in concert with the testLiterals=true
>> option. Please review the antlr documentation for testLiterals.
>>
>> You are able to set options{ testLiterals=true; } either at the global
>> level, so that all rules in your lexer inspect the map generated from
>> tokens{...}, or on only those specific lexer rules that are pertinent
>> (I prefer the latter).
>>
>> And, oh by the way, that stuff between the "s in the tokens{...} section *IS*
>> a lexer rule --- it means:
>>
>>         'match this explicit string literal when testLiterals is true'
>>
>>
>> (now if we only had a way to specify synonyms in the tokens{...} section,
>> e.g. tokens{ TRUE="true","YES"; FALSE="false","NO"; } then life really would
>> be easy ;-)
>>
>> Hope this helps...
>>   -jbb
>>
>>  
>>
>

