[antlr-interest] Tokens
Ronald Sok
ronald.sok at gmail.com
Fri Nov 27 05:53:08 PST 2009
Kevin J. Cummings wrote:
> On 11/26/2009 10:47 PM, Ronald Sok wrote:
>
>> Not being very familiar with language grammars and ANTLR,
>> I ended up with a grammar I am not too happy with.
>>
>> I don't want to bother you with my entire grammar, so I created
>> a very simple example demonstrating my problem.
>> I want to parse the following:
>>
>> BEGIN_SOMETHING
>> Name: Pear
>> Type: Apple
>> END_SOMETHING
>>
>>
>> The tokens BEGIN_SOMETHING and END_SOMETHING mark the start and end
>> of the block. The Name can have any value, while the Type must be one
>> of Apple, Pear or Orange. The problem I have is that the Name, as seen
>> in the example, can also take a value from the Type list, in this case
>> Pear.
>>
>> The grammar I have is this:
>> grammar Some;
>>
>> someFile
>> : 'BEGIN_SOMETHING' NEWLINE someName someType 'END_SOMETHING' NEWLINE
>> ;
>>
>> someName
>> : 'Name:' ID NEWLINE
>> ;
>>
>> someType
>> : 'Type:' someTypeOption NEWLINE
>> ;
>>
>> someTypeOption
>> : APPLE
>> | PEAR
>> | ORANGE
>> ;
>>
>> APPLE
>> : 'Apple'
>> ;
>>
>> PEAR
>> : 'Pear'
>> ;
>>
>> ORANGE
>> : 'Orange'
>> ;
>>
>> ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
>> ;
>>
>> NEWLINE
>> : '\r'? '\n'
>> ;
>>
>> WS : ( ' '
>> | '\t'
>> | '\r'
>> | '\n'
>> ) {$channel=HIDDEN;}
>> ;
>>
>>
>> Obviously this grammar is unable to recognize the sequence 'Name: Pear',
>> because 'Pear' is matched by the token PEAR and not by ID. I can of course
>> add the tokens APPLE, PEAR and ORANGE to the rule someName:
>>
>> someName
>> : 'Name:' (APPLE|PEAR|ORANGE|ID) NEWLINE
>> ;
>>
>> But I have a feeling this is not the right way to go. I hope somebody
>> can clarify this for me.
>>
>
> You are close. What you have here are keywords, as opposed to reserved
> words: a keyword such as Pear may still be used as an identifier where
> one is expected. To implement keywords, you will need to do something
> like this (at least, this is what I do using ANTLR 2.7.7):
>
> id : ID
> | k:keyword
> { #k->setType(ID); }
> // This changes the token type of a keyword to an ID
> ;
>
> keyword
> : APPLE | PEAR | ORANGE
> ;
>
> someName
> : 'Name:' id NEWLINE
> ;
>
> You could reduce the number of productions by folding, but the principle
> of changing the token type of keywords is what is important here. And
> you may have to find out how to do this with ANTLR 3.x.
This seems very useful and appears to solve my problem. Thank you very much.
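The retyping idea can be tried outside ANTLR with a minimal Python sketch. This is not ANTLR code and not the ANTLR 3.x equivalent. It assumes a hand-written lexer that, like the grammar in this thread, gives the word Pear the keyword type PEAR, and a small recursive-descent parser whose id_ method accepts ID or any fruit keyword and then changes the token's type to ID, much like { #k->setType(ID); }. All names (tokenize, Parser, id_, parse, keywords_as_id) are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical token type names that mirror the grammar in this thread.
TYPE_KEYWORDS = ("APPLE", "PEAR", "ORANGE")
KEYWORDS = {
    "BEGIN_SOMETHING": "BEGIN_SOMETHING",
    "END_SOMETHING": "END_SOMETHING",
    "Apple": "APPLE",
    "Pear": "PEAR",
    "Orange": "ORANGE",
}


@dataclass
class Token:
    type: str
    text: str


# 'Name:' and 'Type:' are literal tokens, as in the original grammar.
TOKEN_RE = re.compile(r"(Name:|Type:)|([A-Za-z_][A-Za-z0-9_]*)|(\r?\n)|([ \t]+)")


def tokenize(src):
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        label, word, newline, _ws = m.groups()
        if label:
            tokens.append(Token(label, label))
        elif word:
            # A word that spells a keyword gets the keyword type, never ID.
            tokens.append(Token(KEYWORDS.get(word, "ID"), word))
        elif newline:
            tokens.append(Token("NEWLINE", newline))
        # Spaces and tabs are skipped, like the grammar's hidden WS channel.
        pos = m.end()
    tokens.append(Token("EOF", ""))
    return tokens


class Parser:
    def __init__(self, tokens, keywords_as_id=True):
        self.tokens = tokens
        self.pos = 0
        self.keywords_as_id = keywords_as_id

    def match(self, *types):
        tok = self.tokens[self.pos]
        if tok.type not in types:
            raise SyntaxError(f"expected {'|'.join(types)}, got {tok.type} {tok.text!r}")
        self.pos += 1
        return tok

    def id_(self):
        # id : ID | keyword ;  a keyword seen here is retyped to ID,
        # the same idea as { #k->setType(ID); } in the ANTLR 2.7.7 rule.
        if not self.keywords_as_id:
            return self.match("ID")
        tok = self.match("ID", *TYPE_KEYWORDS)
        tok.type = "ID"
        return tok

    def some_file(self):
        self.match("BEGIN_SOMETHING")
        self.match("NEWLINE")
        self.match("Name:")
        name = self.id_()
        self.match("NEWLINE")
        self.match("Type:")
        type_ = self.match(*TYPE_KEYWORDS)  # Type still requires a fruit keyword
        self.match("NEWLINE")
        self.match("END_SOMETHING")
        self.match("NEWLINE")
        self.match("EOF")
        return name, type_


def parse(src, keywords_as_id=True):
    return Parser(tokenize(src), keywords_as_id).some_file()
```

With keywords_as_id=False the parser behaves like the original someName rule and rejects Name: Pear. With the default it accepts it, while Type: still requires one of the fruit keywords.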
>
> [Of course, I have problems with the token 'Name:' containing the ":"
> character, but that's just me. (":" is usually a delimiter and usually
> parsed as its own token.) Plus, you don't seem to be treating NEWLINE
> as whitespace, but maybe your grammar requires that too.]
>
>
I follow you on this one and agree that ':' should be treated as a
separate token.
I use the NEWLINE token to enforce that the input is line-separated, so
that I don't accept input like:
BEGIN_SOMETHING Name: Pear Type: Apple END_SOMETHING
But maybe I am being too strict here.
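Here is a minimal Python sketch of this stricter, line-oriented layout. It assumes ':' is lexed as a separate COLON token and that NEWLINE stays a significant token instead of being hidden. Unlike the token-retyping approach discussed earlier in the thread, it checks the labels and fruit names by comparing a word's text in the parser, so Pear never becomes a distinct token type. The names tokenize, parse_block and TYPE_VALUES are hypothetical, not ANTLR API.

```python
import re

# Hypothetical lexer: ':' is its own COLON token, NEWLINE is significant
# (not hidden), and only spaces and tabs are skipped.
TOKEN_SPEC = [
    ("NEWLINE", r"\r?\n"),
    ("COLON", r":"),
    ("WORD", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("WS", r"[ \t]+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
TYPE_VALUES = {"Apple", "Pear", "Orange"}


def tokenize(src):
    tokens, pos = [], 0
    while pos < len(src):
        m = MASTER_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("EOF", ""))
    return tokens


def parse_block(src):
    """Return (name, type) for a line-structured block, else raise SyntaxError."""
    tokens = tokenize(src)
    pos = 0

    def expect(kind, text=None):
        nonlocal pos
        tok_kind, tok_text = tokens[pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_kind} {tok_text!r}")
        pos += 1
        return tok_text

    expect("WORD", "BEGIN_SOMETHING")
    expect("NEWLINE")                # each entry must sit on its own line
    expect("WORD", "Name")
    expect("COLON")
    name = expect("WORD")            # any word, including Apple, Pear or Orange
    expect("NEWLINE")
    expect("WORD", "Type")
    expect("COLON")
    type_ = expect("WORD")
    if type_ not in TYPE_VALUES:
        raise SyntaxError(f"unknown type {type_!r}")
    expect("NEWLINE")
    expect("WORD", "END_SOMETHING")
    expect("NEWLINE")
    expect("EOF")
    return name, type_
```

Because ':' is a token of its own, "Name : Pear" with extra spaces is also accepted, whereas the single-line input is rejected at the first missing NEWLINE.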
Thank you for your pointer.
>> Thanks.
>>
>> Ronald
>>
>
>