[antlr-interest] Extracting a string whose value clashes with token value

Achint Mehta achintmehta at gmail.com
Wed Aug 12 20:07:36 PDT 2009


Hi Benoit,

Your suggested solution solved the problem.

Many thanks.

Regards,
Achint

On Wed, Aug 12, 2009 at 2:10 AM, Benoit Fouletier <benblo+ANTLR at gmail.com> wrote:

> I think special_string should be a lexer rule, not a parser rule: rename it
> to SPECIAL_STRING. Also, the lexer depends on the order in which you
> define tokens, so make sure you put ANTLRTOKEN above SPECIAL_STRING.
>
>
> On Wed, Aug 12, 2009 at 7:34 AM, Achint Mehta <achintmehta at gmail.com> wrote:
>
>>
>> Hi,
>>
>> I am stuck at a seemingly trivial problem.
>> I have written a simplified sample grammar which has this issue.
>>
>> In the grammar I have a rule to extract a generic string
>> special_string: (CHAR | '=' | '.' | '-' | '@' )+ ;
>>
>> and a token ANTLRTOKEN, which is defined as:
>> ANTLRTOKEN:'ANTLR';
>>
>> A rule which parses two words (the first of which has to be ANTLR) is
>> defined as follows:
>> requestline : ANTLRTOKEN WHITESPACE special_string ;
>>
>> It seems that if an input word begins with the keyword "ANTLR", the lexer
>> treats that word as if it begins with ANTLRTOKEN and passes that token to
>> the parser, i.e. the input text "ANTLR ANTLRWORKS" seems to be treated,
>> loosely, as the sequence
>> ANTLRTOKEN WHITESPACE ANTLRTOKEN special_string
>>
>> The whole grammar file is as follows. (This grammar simply parses any word
>> that follows the keyword ANTLR.)
>>
>> -----------------------------------------------------------------------------
>> grammar sample_parser;
>>
>> options
>> {
>>     language=C;
>> }
>>
>> requestline : ANTLRTOKEN WHITESPACE special_string ;
>> special_string: (CHAR | '=' | '.' | '-' | '@' )+ ;
>>
>> WHITESPACE  : ( '\t' | ' ' | '\u000C' )+;
>> NEWLINE: ('\r')? '\n';
>> CHAR: (('a'..'z')|('A'..'Z'));
>> ANTLRTOKEN:'ANTLR';
>>
>> -----------------------------------------------------------------------------
>>
>> If I provide the input as
>> ANTLR WORKS
>>
>> Then everything works fine and I don't get any error.
>>
>> Now if I provide the input as
>> ANTLR ANTLRWORKS
>> Then I get the error as
>>
>> ----------------------------------------------------------------------------
>> input(1)  : error 5 : Unexpected token, at offset 5
>>     near [Index: 2 (Start: 24666934-Stop: 24666938) ='ANTLR', type<4>
>> Line: 1 LinePos:5]
>>      : missing elements...
>>
>> ----------------------------------------------------------------------------
>>
>> It seems that the lexer treats the substring ANTLR in ANTLRWORKS as the
>> token ANTLRTOKEN and passes it to the parser, which is not expecting that
>> token.
>>
>> Is there a way to tell ANTLR not to break the input word ANTLRWORKS
>> into tokens, and to treat the whole word as special_string instead?
>>
>> Can somebody help me get around this issue?
>>
>> Thanks in advance.
>>
>> Also, I am using the following versions:
>> java version "1.6.0_14"
>> ANTLR version 3.1.3
>> Target language : C
>> C runtime library version: 3.1.3
>> gcc compiler: 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
>>
>>
>> Regards,
>> Achint
>>
>>
>
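
For readers landing on this thread, the fix suggested in the quoted reply above (turn special_string into a lexer rule and list ANTLRTOKEN before it) might look roughly like the sketch below. It is only an illustration based on the sample grammar in the original question, not the grammar that was finally used. Marking CHAR as a fragment is an added assumption, made so that a single letter is not emitted as its own token in competition with SPECIAL_STRING.

```antlr
grammar sample_parser;

options
{
    language=C;
}

// The parser now expects one SPECIAL_STRING token instead of a sequence of CHARs.
requestline : ANTLRTOKEN WHITESPACE SPECIAL_STRING ;

WHITESPACE  : ( '\t' | ' ' | '\u000C' )+ ;
NEWLINE     : ('\r')? '\n' ;

// Listed before SPECIAL_STRING so that the exact word "ANTLR" resolves to ANTLRTOKEN.
ANTLRTOKEN  : 'ANTLR' ;

// Formerly the parser rule special_string; as a lexer rule it consumes the whole word.
SPECIAL_STRING : ( CHAR | '=' | '.' | '-' | '@' )+ ;

// Assumption: CHAR becomes a helper (fragment) rule and never produces a token itself.
fragment CHAR : ('a'..'z') | ('A'..'Z') ;
```

With this arrangement, ANTLRWORKS should reach the parser as a single SPECIAL_STRING token. One caveat: the exact word ANTLR is still always tokenized as ANTLRTOKEN, so an input such as "ANTLR ANTLR" would not match requestline unless a parser rule also accepts ANTLRTOKEN in the second position.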