[antlr-interest] Tokens

Fri Nov 27 05:53:08 PST 2009

Kevin J. Cummings wrote:
> On 11/26/2009 10:47 PM, Ronald Sok wrote:
>   
>> Not being too familiar with language grammars and ANTLR,
>> I ended up with a grammar that I am not too happy with.
>>
>> I don't want to bother you with my entire grammar so I created
>> a very simple example demonstrating my problem.
>> I want to parse the following :
>>
>> BEGIN_SOMETHING
>>     Name: Pear
>>     Type: Apple
>> END_SOMETHING
>>
>>
>> The tokens BEGIN_SOMETHING and END_SOMETHING indicate
>> the start and end markers of the block. The Name can have any value and
>> the Type can be one from the list Apple, Pear, Orange. The problem
>> I have is that the Name, as seen in the example, can also have the value
>> from one of the Type list, in this case Pear.
>>
>> The grammar I have is this:
>> grammar Some;
>>
>> someFile
>>     :    'BEGIN_SOMETHING' NEWLINE someName someType 'END_SOMETHING' NEWLINE
>>     ;
>>    
>> someName
>>     :    'Name:' ID NEWLINE
>>     ;
>>
>> someType
>>     :    'Type:' someTypeOption NEWLINE
>>     ;
>>    
>> someTypeOption
>>     :    APPLE
>>     |    PEAR
>>     |    ORANGE
>>     ;
>>    
>> APPLE
>>     :    'Apple'
>>     ;
>>
>> PEAR
>>     :    'Pear'
>>     ;
>>
>> ORANGE
>>     :    'Orange'
>>     ;   
>>    
>> ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
>>     ;
>>
>> NEWLINE
>>     :    '\r'? '\n'
>>     ;
>>    
>> WS  :   ( ' '
>>         | '\t'
>>         | '\r'
>>         | '\n'
>>         ) {$channel=HIDDEN;}
>>     ;
>>
>>
>> Obviously this grammar is unable to recognize the sequence 'Name: Pear',
>> because 'Pear' is matched by the token PEAR and not by ID. I can of course
>> add the tokens APPLE, PEAR and ORANGE to the rule someName:
>>
>> someName
>>     :    'Name:' (APPLE|PEAR|ORANGE|ID) NEWLINE
>>     ;
>>
>> But my feeling tells me this is not the way to go. I hope somebody can 
>> clarify this for me.
>>     
>
> You are close.  What you have here is keywords as opposed to reserved
> words.  When implementing the former, you will need to do something like
> (at least this is what I do using ANTLR 2.7.7):
>
> id : ID
>    | k:keyword
>       { #k->setType(ID); }
>       // This changes the token type of a keyword to an ID
>    ;
>   
This seems very useful and appears to solve my problem. Thank you very much.

> keyword
>    : APPLE | PEAR | ORANGE
>    ;
>
> someName
>    :     'Name:' id NEWLINE
>    ;
>
> You could reduce the number of productions by folding, but the principle
> of changing the token type of keywords is what is important here.  And
> you may have to find out how to do this with ANTLR 3.x.
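
For ANTLR 3.x, the same idea might look roughly like the sketch below. It is
untested and assumes the Java target; the rule names id and keyword are carried
over from the ANTLR 2 example above, and APPLE, PEAR, ORANGE and ID are the
lexer rules from the grammar quoted earlier.

```antlr
someName
    :    'Name:' id NEWLINE
    ;

// An identifier is either a plain ID or one of the fruit keywords.
id  :    ID
    |    k=keyword { $k.start.setType(ID); }  // retype the keyword token as ID
    ;

keyword
    :    APPLE
    |    PEAR
    |    ORANGE
    ;
```

If nothing downstream looks at the token type, the action can probably be left
out: letting id match the keyword tokens should already be enough for
'Name: Pear' to parse.
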
>
> [Of course, I have problems with the token 'Name:' containing the ":"
> character, but that's just me.  (":" is usually a delimiter and usually
> parsed as its own token.)  Plus, you don't seem to be treating NEWLINE
> as whitespace, but maybe your grammar requires that too.]
>
>   
I follow you on this one and agree that ':' should be treated as a
separate token.
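
Splitting the colon out might look something like this. It is only a sketch:
the token names NAME, TYPE and COLON are made up for illustration, and id is
the keyword-tolerant rule from Kevin's suggestion.

```antlr
someName
    :    NAME COLON id NEWLINE
    ;

someType
    :    TYPE COLON someTypeOption NEWLINE
    ;

// Keep these above ID so that they should take precedence over it in the lexer.
NAME
    :    'Name'
    ;

TYPE
    :    'Type'
    ;

COLON
    :    ':'
    ;
```

One side effect: 'Name' and 'Type' now become keywords as well, so they would
have to be added to the keyword rule if they should remain usable as names.
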
I use the NEWLINE token to force the input to be line-separated, so that I
don't accept input like:

BEGIN_SOMETHING Name: Pear Type: Apple END_SOMETHING

But maybe I am being too strict here.
Thank you for your pointer.

>> Thanks.
>>
>> Ronald
>>     
>
>