[antlr-interest] Extracting a string whose value clashes with token value

Achint Mehta achintmehta at gmail.com
Tue Aug 11 22:34:05 PDT 2009


Hi,

I am stuck on a seemingly trivial problem.
I have written a simplified sample grammar that reproduces the issue.

In the grammar I have a rule to extract a generic string:
special_string: (CHAR | '=' | '.' | '-' | '@' )+ ;

and a token ANTLRTOKEN, which is defined as:
ANTLRTOKEN:'ANTLR';

A rule which parses two words (the first of which has to be ANTLR) is
defined as follows:
requestline : ANTLRTOKEN WHITESPACE special_string ;

It seems that if the second input word begins with the keyword "ANTLR", the
lexer treats the start of that word as ANTLRTOKEN and passes it to the parser.
That is, the input text "ANTLR ANTLRWORKS" seems to be treated, loosely, as
the sequence
ANTLRTOKEN WHITESPACE ANTLRTOKEN special_string

The whole grammar file is as follows. (This grammar simply parses the keyword
ANTLR followed by any word.)
-----------------------------------------------------------------------------
grammar sample_parser;

options
{
    language=C;
}

requestline : ANTLRTOKEN WHITESPACE special_string ;
special_string: (CHAR | '=' | '.' | '-' | '@' )+ ;

WHITESPACE  : ( '\t' | ' ' | '\u000C' )+;
NEWLINE: ('\r')? '\n';
CHAR: (('a'..'z')|('A'..'Z'));
ANTLRTOKEN:'ANTLR';
-----------------------------------------------------------------------------

If I provide the input
ANTLR WORKS

then everything works fine and I don't get any error.

Now if I provide the input
ANTLR ANTLRWORKS
then I get this error:
----------------------------------------------------------------------------
input(1)  : error 5 : Unexpected token, at offset 5
    near [Index: 2 (Start: 24666934-Stop: 24666938) ='ANTLR', type<4> Line: 1 LinePos:5]
     : missing elements...
----------------------------------------------------------------------------

It seems that the lexer treats the substring ANTLR in ANTLRWORKS as the token
ANTLRTOKEN and passes it to the parser, which is not expecting that token there.
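
To make my guess about the lexer concrete, here is a rough Python sketch that
imitates the lexer rules above with a simple longest-match tokenizer. It is
only an illustration of what I think is happening, not how ANTLR works
internally: the names TOKEN_RULES, tokenize and PUNCT are made up for this
sketch, and PUNCT stands in for the literals ('=', '.', '-', '@') that appear
only in the parser rule.

```python
import re

# Token rules mirroring the lexer rules in the sample grammar.
# PUNCT is a hypothetical stand-in for the '=', '.', '-', '@' literals.
TOKEN_RULES = [
    ("WHITESPACE", r"[\t \f]+"),
    ("NEWLINE", r"\r?\n"),
    ("ANTLRTOKEN", r"ANTLR"),
    ("CHAR", r"[a-zA-Z]"),
    ("PUNCT", r"[=.\-@]"),
]


def tokenize(text):
    """Split text into (rule name, lexeme) pairs, preferring the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in TOKEN_RULES:
            m = re.compile(pattern).match(text, pos)
            # Keep the longest lexeme; on a tie the earlier rule wins.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


for line in ("ANTLR WORKS", "ANTLR ANTLRWORKS"):
    print(line, "->", [name for name, _ in tokenize(line)])
```

In this sketch the second input comes out as ANTLRTOKEN WHITESPACE ANTLRTOKEN
followed by one CHAR per remaining letter, so the token at index 2 (counting
from 0) is 'ANTLR'. That would be consistent with the "Index: 2 ... ='ANTLR'"
part of the error message, assuming the runtime counts tokens from 0.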

Is there a way to tell ANTLR not to break the input word ANTLRWORKS into
separate tokens, and to treat the whole word as a special_string?

Can somebody help me get around this issue?

Thanks in advance.

Also, I am using the following versions of the tools and libraries:
java version "1.6.0_14"
ANTLR version 3.1.3
Target language : C
C runtime library version: 3.1.3
gcc compiler: 4.3.3 (Ubuntu 4.3.3-5ubuntu4)


Regards,
Achint