[antlr-interest] newbie lookahead question
Lance Gutteridge
lance at thinkingworks.com
Fri Apr 21 23:54:01 PDT 2006
John,
Maybe I figured it out. It seems that there must be a lexer rule that
outputs tokens for testLiterals to take effect. So I added an ID rule
which matches a word that begins with a letter and is followed by an
arbitrary number of letters or digits.
I think what happens is that the lexer tries to match ID; when it does,
it checks the literals table and, if it finds a match, outputs the
literal token rather than ID.
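For illustration, an ID rule of the kind described above might look like
the sketch below, in ANTLR 2 lexer syntax. The exact rule is not shown in
this message, so the character ranges are an assumption; the sketch relies
on the grammar-level testLiterals = true already set in TestLexer.

```antlr
// Hypothetical ID rule for TestLexer: a letter followed by any number
// of letters or digits. Because testLiterals is true, the matched text
// is looked up in the literals table built from tokens{...}; "activate"
// comes back as ACTIVATE, while any other word stays an ID.
ID  :   ('a'..'z' | 'A'..'Z')
        ('a'..'z' | 'A'..'Z' | '0'..'9')*
    ;
```

With a rule like this in place, input such as "activate on" should reach
the parser as ACTIVATE ON without dedicated ACTIVATE, ON or OFF lexer
rules.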
John: I'm not sure that your previous remark, that the members of the
tokens section are rules, is correct.
> And, oh by the way, that stuff between the "s in the tokens{...}
> section *IS* a lexer rule --- it means:
>
> 'match this explicit string literal when testLiterals is true'
I think more precisely it says 'when you match a token with a lexer rule
and if testLiterals is true, then check the token section to see if it
matches a string and output that token if it does.'
So I think it is more of a modifier of the output of the lexer rules
(those that have the testLiterals option turned on) than a lexer rule in
its own right.
Do you agree with that?
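
One way to picture the "modifier" reading is to turn the option off for
the lexer as a whole and on for a single rule. The following is a sketch
only, assuming a hypothetical ID rule of the kind described at the top of
this message (a letter followed by letters or digits); the per-rule
options block is ANTLR 2 syntax.

```antlr
class TestLexer extends Lexer;
options
{
    testLiterals = false;   // no literal check by default
    k = 2;
}

tokens { ACTIVATE="activate"; ON="on"; OFF="off"; }

// Only text matched by ID is checked against the literals table,
// so the tokens{...} entries change what this rule returns rather
// than matching input themselves.
ID
options { testLiterals = true; }
    :   ('a'..'z' | 'A'..'Z')
        ('a'..'z' | 'A'..'Z' | '0'..'9')*
    ;
```

In this arrangement, removing the ID rule leaves nothing that can match
the characters of "activate", which is consistent with the "unexpected
char: 'a'" error reported for the original grammar.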
>
> (now if we only had a way to specify synonyms in the tokens{...} section,
> e.g. tokens{ TRUE="true","YES"; FALSE="false","NO"; } then life really
> would be easy ;-)
Yes that would be a nice feature.
Lance
Lance Gutteridge wrote:
> John,
> Thanks for the help. What you say sounds clear and I read the
> documentation on testLiterals=true. I thought, aha, that is the key:
> just turn testLiterals to true and all will be fine.
>
> However when I try it in a grammar it doesn't seem to work. Following
> is a test grammar I made up. When I give the parser the string
> "activate on" it comes up with the message Parse error: line 1:1:
> unexpected char: 'a'.
>
> When I uncomment the three rules (ACTIVATE,ON and OFF) it parses fine
> and gives me a tree with the ACTIVATE token as the main node and one
> child of the token ON. Which is exactly what I wanted.
> (In this case I am surprised that the tokens section does not create
> an ambiguity with those lexer rules.)
>
> I checked the code of the lexer and the hash table is being generated
> to look up the three literals. However the lexer stubbornly refuses to
> output the token ACTIVATE when I just have them defined in the tokens
> section.
>
> I'm probably doing something really stupid here, but I'm quite puzzled.
>
> Thanks for your help,
> Lance
>
> class TestLexer extends Lexer;
> options
> {
> testLiterals = true;
> k=2;
> }
>
> tokens{ ACTIVATE="activate"; ON="on";OFF="off";}
> //ACTIVATE: "activate";
> //ON: "on";
> //OFF: "off";
> //++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> // Whitespace -- ignored
> //++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> WS : ( ' '
> | '\t'
> | '\f'
> // handle newlines
> | ( options {generateAmbigWarnings=false;}
> : "\r\n" // Windows
> | '\r' // Macintosh
> | '\n' // Unix
> )
> { newline(); }
> )+
> { _ttype = Token.SKIP; }
> ;
>
> class TestParser extends Parser;
> options
> {
> buildAST=true;
> k = 1;
> defaultErrorHandler=false;
> }
>
> statement: ACTIVATE^ (ON | OFF);
>
>
>
>
> John B. Brodie wrote:
>
>> Sir :-
>>
>>
>>
>>> Well maybe not. It seems I was wrong about the tokens section. It
>>> doesn't specify lexer rules so the tokens aren't detected and put
>>> into the token stream for the parser. Ah well. It seemed like a good
>>> idea at the time.
>>>
>>> Lance
>>>
>>
>>
>> You are not wrong about the tokens{...} lexer section.
>>
>> The tokens{...} section operates in concert with the testLiterals=true
>> option. Please review the antlr documentation for testLiterals.
>>
>> You are able to set the options{ testLiterals=true; } either at the
>> global level, so that all rules in your lexer inspect the
>> tokens{...} generated map, or on only those specific lexer rules
>> that are pertinent (I prefer the latter).
>>
>> And, oh by the way, that stuff between the "s in the tokens{...}
>> section *IS* a lexer rule --- it means:
>>
>> 'match this explicit string literal when testLiterals is true'
>>
>>
>> (now if we only had a way to specify synonyms in the tokens{...}
>> section, e.g. tokens{ TRUE="true","YES"; FALSE="false","NO"; } then
>> life really would be easy ;-)
>>
>> Hope this helps...
>> -jbb
>>
>>
>>
>