[antlr-interest] Tokens

Ronald Sok ronald.sok at gmail.com
Thu Nov 26 19:47:36 PST 2009


Not being too familiar with language grammars and ANTLR,
I ended up with a grammar that I am not too happy with.

I don't want to bother you with my entire grammar, so I created
a very simple example demonstrating my problem.
I want to parse the following:

BEGIN_SOMETHING
    Name: Pear
    Type: Apple
END_SOMETHING


The tokens BEGIN_SOMETHING and END_SOMETHING mark
the start and end of the block. The Name can have any value, and
the Type must be one of Apple, Pear, or Orange. The problem
I have is that the Name, as seen in the example, can also take a value
from the Type list, in this case Pear.

The grammar I have is this:

grammar Some;

someFile
    :    'BEGIN_SOMETHING' NEWLINE someName someType 'END_SOMETHING' NEWLINE
    ;
   
someName
    :    'Name:' ID NEWLINE
    ;

someType
    :    'Type:' someTypeOption NEWLINE
    ;
   
someTypeOption
    :    APPLE
    |    PEAR
    |    ORANGE
    ;
   
APPLE
    :    'Apple'
    ;

PEAR
    :    'Pear'
    ;

ORANGE
    :    'Orange'
    ;   
   
ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

NEWLINE
    :    '\r'? '\n'
    ;
   
WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;


Obviously this grammar is unable to recognize the sequence 'Name: Pear',
because the lexer matches 'Pear' as the token PEAR and not as ID. I can of course
add the tokens APPLE, PEAR and ORANGE to the rule someName:

someName
    :    'Name:' (APPLE|PEAR|ORANGE|ID) NEWLINE
    ;
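
A slightly tidier variant of the same workaround, as a sketch only, would
factor the alternatives into a helper parser rule so that other rules needing
a free-form name could reuse it. The rule name nameValue is one I made up for
illustration; the rest of the grammar is assumed to stay exactly as above.

```
someName
    :    'Name:' nameValue NEWLINE
    ;

// Hypothetical helper rule (not part of my real grammar): accepts a plain ID
// as well as any of the tokens that the lexer produces for the Type keywords.
nameValue
    :    ID
    |    APPLE
    |    PEAR
    |    ORANGE
    ;
```

With this, 'Name: Pear' should parse, because the parser accepts a PEAR token
wherever a name value is expected, while someTypeOption still restricts the
Type to the three listed tokens.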

But my feeling is that this is not the way to go. I hope somebody can
clarify this for me.

Thanks.

Ronald




