[antlr-interest] Extracting a string whose value clashes with token value

Tue Aug 11 23:10:37 PDT 2009

I think special_string should be a lexer rule, not a parser rule: rename it
to SPECIAL_STRING. Also, the lexer depends on the order in which you define
tokens, so make sure you put ANTLRTOKEN above SPECIAL_STRING.
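
A minimal, untested sketch of how the grammar might look after that change
(ANTLR 3 syntax). Making CHAR a fragment is my own assumption, not something
from your original grammar: it keeps a single letter from being emitted as a
CHAR token instead of a SPECIAL_STRING.

```antlr
grammar sample_parser;

options
{
    language=C;
}

// The parser now sees exactly three tokens per request line.
requestline : ANTLRTOKEN WHITESPACE SPECIAL_STRING ;

// Keyword first, so the exact input 'ANTLR' is lexed as ANTLRTOKEN.
ANTLRTOKEN     : 'ANTLR';

// The generic word is matched as a whole in the lexer, so a longer word
// such as ANTLRWORKS should come out as one SPECIAL_STRING token.
SPECIAL_STRING : (CHAR | '=' | '.' | '-' | '@')+ ;

WHITESPACE     : ( '\t' | ' ' | '\u000C' )+;
NEWLINE        : ('\r')? '\n';

// Assumption: CHAR is only a helper here, so it never becomes a token itself.
fragment CHAR  : ('a'..'z') | ('A'..'Z');
```

One caveat with this layout: a second word that is exactly ANTLR would still
be lexed as ANTLRTOKEN, so requestline would reject "ANTLR ANTLR" unless the
parser rule also accepts ANTLRTOKEN in that position.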

On Wed, Aug 12, 2009 at 7:34 AM, Achint Mehta <achintmehta at gmail.com> wrote:

>
> Hi,
>
> I am stuck at a seemingly trivial problem.
> I have written a simplified sample grammar which has this issue.
>
> In the grammar I have a rule to extract a generic string
> special_string: (CHAR | '=' | '.' | '-' | '@' )+ ;
>
> and a token ANTLR which is defined as:
> ANTLRTOKEN:'ANTLR';
>
> A rule which parses two words (the first of which has to be ANTLR) is
> defined as follows:
> requestline : ANTLRTOKEN WHITESPACE special_string ;
>
> It seems that if the input word begins with the keyword "ANTLR", then that
> word is treated as if it begins with ANTLRTOKEN and is passed to the parser
> that way, i.e. the input text "ANTLR ANTLRWORKS" loosely seems to be treated
> as the sequence
> ANTLRTOKEN WHITESPACE ANTLRTOKEN special_string
>
> The whole grammar file is as follows. (This grammar simply parses any word
> that follows the keyword ANTLR.)
>
> -----------------------------------------------------------------------------
> grammar sample_parser;
>
> options
> {
>     language=C;
> }
>
> requestline : ANTLRTOKEN WHITESPACE special_string ;
> special_string: (CHAR | '=' | '.' | '-' | '@' )+ ;
>
> WHITESPACE  : ( '\t' | ' ' | '\u000C' )+;
> NEWLINE: ('\r')? '\n';
> CHAR: (('a'..'z')|('A'..'Z'));
> ANTLRTOKEN:'ANTLR';
>
> -----------------------------------------------------------------------------
>
> If I provide the input as
> ANTLR WORKS
>
> Then everything works fine and I don't get any error.
>
> Now if I provide the input as
> ANTLR ANTLRWORKS
> Then I get the error as
>
> ----------------------------------------------------------------------------
> input(1)  : error 5 : Unexpected token, at offset 5
>     near [Index: 2 (Start: 24666934-Stop: 24666938) ='ANTLR', type<4> Line:
> 1 LinePos:5]
>      : missing elements...
>
> ----------------------------------------------------------------------------
>
> It seems that the lexer treats the substring ANTLR in ANTLRWORKS as the
> token ANTLRTOKEN and passes it to the parser, which is not expecting that
> token.
>
> Is there a way to tell ANTLR not to break the input word ANTLRWORKS into
> tokens and instead treat the whole word as special_string?
>
> Can somebody help me get around this issue?
>
> Thanks in advance.
>
> Also, I am using the following version of the library, etc.
> java version "1.6.0_14"
> ANTLR version 3.1.3
> Target language : C
> C runtime library version: 3.1.3
> gcc compiler: 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
>
>
> Regards,
> Achint
>
>