[antlr-interest] Tokens

Thu Nov 26 20:05:19 PST 2009

On 11/26/2009 10:47 PM, Ronald Sok wrote:
> Not being very familiar with language grammars and ANTLR,
> I ended up with a grammar I am not too happy with.
> 
> I don't want to bother you with my entire grammar, so I created
> a very simple example demonstrating my problem.
> I want to parse the following:
> 
> BEGIN_SOMETHING
>     Name: Pear
>     Type: Apple
> END_SOMETHING
> 
> 
> The tokens BEGIN_SOMETHING and END_SOMETHING mark
> the start and end of the block. The Name can have any value, and
> the Type must be one of Apple, Pear, or Orange. The problem
> I have is that the Name, as seen in the example, can also take
> one of the values from the Type list, in this case Pear.
> 
> The grammar I have is this:
> grammar Some;
> 
> someFile
>     :    'BEGIN_SOMETHING' NEWLINE someName someType 'END_SOMETHING' NEWLINE
>     ;
>    
> someName
>     :    'Name:' ID NEWLINE
>     ;
> 
> someType
>     :    'Type:' someTypeOption NEWLINE
>     ;
>    
> someTypeOption
>     :    APPLE
>     |    PEAR
>     |    ORANGE
>     ;
>    
> APPLE
>     :    'Apple'
>     ;
> 
> PEAR
>     :    'Pear'
>     ;
> 
> ORANGE
>     :    'Orange'
>     ;   
>    
> ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
>     ;
> 
> NEWLINE
>     :    '\r'? '\n'
>     ;
>    
> WS  :   ( ' '
>         | '\t'
>         | '\r'
>         | '\n'
>         ) {$channel=HIDDEN;}
>     ;
> 
> 
> Obviously, this grammar is unable to recognize the sequence 'Name: Pear',
> because 'Pear' is matched by the token PEAR and not by ID. I can of course
> add the tokens APPLE, PEAR, and ORANGE to the rule someName:
> 
> someName
>     :    'Name:' (APPLE|PEAR|ORANGE|ID) NEWLINE
>     ;
> 
> But my gut tells me this is not the way to go. I hope somebody can
> clarify this for me.

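You are close.  What you have here are keywords, as opposed to reserved
words: a keyword can still be used as an identifier where the syntax
allows it, while a reserved word cannot.  To implement keywords, you will
need to do something like this (at least, this is what I do with ANTLR 2.7.7):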

id : ID
   | k:keyword
      { #k->setType(ID); }
      // This changes the token type of a keyword to an ID
   ;

keyword
   : APPLE | PEAR | ORANGE
   ;

someName
   :     'Name:' id NEWLINE
   ;

You could reduce the number of productions by folding, but the principle
of changing the token type of keywords is what matters here.  You may
have to find out how to do this in ANTLR 3.x.
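To make the retyping idea concrete outside ANTLR, here is a minimal Python sketch. It is not ANTLR-generated code, and the tokenizer, the `Parser` class, and its method names are hypothetical. The lexer gives Apple, Pear, and Orange their own token types, as the APPLE/PEAR/ORANGE rules do. The `ident` method plays the role of the `id` rule: where an identifier is expected, it accepts a keyword token and changes its type to ID.

```python
import re
from dataclasses import dataclass

# Keyword token types, mirroring the APPLE/PEAR/ORANGE lexer rules.
KEYWORDS = {"Apple": "APPLE", "Pear": "PEAR", "Orange": "ORANGE"}
# Block markers; each word is its own token type, like the quoted literals.
MARKERS = {"BEGIN_SOMETHING", "END_SOMETHING"}

TOKEN_RE = re.compile(
    r"(?P<NEWLINE>\r?\n)"
    r"|(?P<WS>[ \t\r]+)"
    r"|(?P<LABEL>(?:Name|Type):)"
    r"|(?P<WORD>[A-Za-z_][A-Za-z0-9_]*)"
)


@dataclass
class Token:
    type: str
    text: str


def tokenize(text):
    """Turn the input into tokens; keywords get their own types, never ID."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind == "WS":
            continue  # dropped, like WS sent to the HIDDEN channel
        if kind == "WORD":
            # 'Pear' becomes PEAR, not ID, just as in the original grammar.
            kind = value if value in MARKERS else KEYWORDS.get(value, "ID")
        elif kind == "LABEL":
            kind = value  # 'Name:' or 'Type:'
        tokens.append(Token(kind, value))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return Token("EOF", "")

    def match(self, ttype):
        tok = self.peek()
        if tok.type != ttype:
            raise SyntaxError(f"expected {ttype}, got {tok.type} {tok.text!r}")
        self.pos += 1
        return tok

    def ident(self):
        # id : ID | keyword ;
        # A keyword is accepted where an identifier is expected and is
        # retyped to ID, the same idea as `#k->setType(ID)` above.
        tok = self.peek()
        if tok.type in KEYWORDS.values():
            tok.type = "ID"
        return self.match("ID")

    def some_type_option(self):
        # someTypeOption : APPLE | PEAR | ORANGE ;  plain IDs are rejected.
        tok = self.peek()
        if tok.type not in KEYWORDS.values():
            raise SyntaxError(f"expected a type, got {tok.text!r}")
        self.pos += 1
        return tok

    def some_file(self):
        self.match("BEGIN_SOMETHING")
        self.match("NEWLINE")
        self.match("Name:")
        name = self.ident().text
        self.match("NEWLINE")
        self.match("Type:")
        kind = self.some_type_option().text
        self.match("NEWLINE")
        self.match("END_SOMETHING")
        self.match("NEWLINE")
        return name, kind


SAMPLE = "BEGIN_SOMETHING\n    Name: Pear\n    Type: Apple\nEND_SOMETHING\n"
print(Parser(tokenize(SAMPLE)).some_file())  # ('Pear', 'Apple')
```

Because the retyping happens only where an identifier is expected, `Type: Banana` is still rejected: `some_type_option` sees an ID there, not one of the keyword types.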

[Of course, I have a problem with the token 'Name:' containing the ":"
character, but that's just me.  (":" is usually a delimiter and is
usually lexed as its own token.)  Also, you don't seem to be treating
NEWLINE as whitespace, but maybe your grammar requires that.]
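Here is a minimal Python sketch of the ":"-as-delimiter alternative. It assumes hypothetical NAME, TYPE, and COLON token types. The label and the ":" become separate tokens, so whitespace between them no longer matters. With the single literal 'Name:', input such as `Name : Pear` would likely fail to tokenize, since no lexer rule matches a lone ":". Under this token set, Name and Type would become keywords too, so they would also belong in the `keyword` rule.

```python
import re

# Hypothetical token set: ':' is its own COLON token instead of being
# glued onto the label as in the 'Name:' literal. The fruit keywords are
# omitted to keep the sketch short, so 'Pear' lexes as a plain ID here.
LABELS = {"Name": "NAME", "Type": "TYPE"}
TOKEN_RE = re.compile(
    r"(?P<NEWLINE>\r?\n)"
    r"|(?P<WS>[ \t\r]+)"
    r"|(?P<COLON>:)"
    r"|(?P<WORD>[A-Za-z_][A-Za-z0-9_]*)"
)


def token_types(text):
    """Return the token types for text; NEWLINE is kept, other whitespace dropped."""
    types, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "WS":
            continue
        if kind == "WORD":
            kind = LABELS.get(m.group(), "ID")
        types.append(kind)
    return types


print(token_types("Name: Pear\n"))   # ['NAME', 'COLON', 'ID', 'NEWLINE']
print(token_types("Name : Pear\n"))  # ['NAME', 'COLON', 'ID', 'NEWLINE']
```

The parser rule would then read something like `someName : NAME COLON id NEWLINE`, and both spellings of the line produce the same token sequence.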

> Thanks.
> 
> Ronald

-- 
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)