[antlr-interest] Extracting a string whose value clashes with token value

Thu Aug 13 20:23:11 PDT 2009

It seems that I missed one scenario.
If I now give the input ANTLR ANTLR, I get an error.

It seems that the second token (SPECIAL_STRING) matches a superset of the
first (ANTLRTOKEN), and in this case both words are treated as ANTLRTOKEN.
A similar situation is described in the "Ambiguities and Nondeterminisms"
section of the book "The Definitive ANTLR Reference", which talks about
ambiguities in lexer rules, but I am not sure how to resolve this.

How can this be handled without also turning the first token into a SPECIAL_STRING?
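
One possible way around this, given here only as an untested sketch: keep
ANTLRTOKEN as its own lexer rule and let the parser accept either token in the
second position. The rule name "word" is made up for illustration, and the
sketch assumes SPECIAL_STRING is already a lexer rule listed after ANTLRTOKEN,
as Benoit suggested.

```antlr
// Parser side only; the lexer rules stay as they are.
requestline : ANTLRTOKEN WHITESPACE word ;

// Hypothetical helper rule: a generic word may also be the keyword itself,
// because the lexer always emits ANTLRTOKEN for the bare text ANTLR.
word : SPECIAL_STRING | ANTLRTOKEN ;
```

With this arrangement the lexer can keep emitting ANTLRTOKEN for the bare
keyword, and the parser decides from the position whether it is being used as
the keyword or as an ordinary word.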

In general, does any section of the book give guidelines as to which rules
should be lexer rules and which should be parser rules?

Thanks!

Regards,
Achint

On Wed, Aug 12, 2009 at 11:07 PM, Achint Mehta <achintmehta at gmail.com> wrote:

> Hi Benoit,
>
> The solution you suggested solved the problem.
>
> Many thanks.
>
> Regards,
> Achint
>
>
> On Wed, Aug 12, 2009 at 2:10 AM, Benoit Fouletier <benblo+ANTLR at gmail.com> wrote:
>
>> I think special_string should be a lexer rule, not a parser rule: rename
>> it to SPECIAL_STRING. Also, the lexer depends on the order in which you
>> define tokens, so make sure you put ANTLRTOKEN above SPECIAL_STRING.
>>
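
A minimal sketch of the grammar with this suggestion applied (not the exact
file used in the thread). It assumes CHAR can become a fragment rule so that it
no longer competes with SPECIAL_STRING for single letters; that change is an
assumption and was not part of the original suggestion.

```antlr
grammar sample_parser;

options
{
    language=C;
}

requestline : ANTLRTOKEN WHITESPACE SPECIAL_STRING ;

WHITESPACE : ( '\t' | ' ' | '\u000C' )+ ;
NEWLINE    : ('\r')? '\n' ;

// Listed before SPECIAL_STRING so the bare word ANTLR is lexed as the keyword.
ANTLRTOKEN : 'ANTLR' ;

// Assumption: CHAR is a fragment (a helper only, never emitted as a token).
fragment CHAR : 'a'..'z' | 'A'..'Z' ;

// Formerly the parser rule special_string.
SPECIAL_STRING : ( CHAR | '=' | '.' | '-' | '@' )+ ;
```

With this layout a longer word such as ANTLRWORKS should come out as a single
SPECIAL_STRING, while a standalone ANTLR is still lexed as ANTLRTOKEN.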
>>
>> On Wed, Aug 12, 2009 at 7:34 AM, Achint Mehta <achintmehta at gmail.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> I am stuck at a seemingly trivial problem.
>>> I have written a simplified sample grammar which has this issue.
>>>
>>> In the grammar I have a rule to extract a generic string
>>> special_string: (CHAR | '=' | '.' | '-' | '@' )+ ;
>>>
>>> and a token ANTLRTOKEN, which is defined as:
>>> ANTLRTOKEN:'ANTLR';
>>>
>>> A rule which parses two words (the first of which has to be ANTLR) is
>>> defined as follows:
>>> requestline : ANTLRTOKEN WHITESPACE special_string ;
>>>
>>> It seems that if the input word begins with the keyword "ANTLR", then that
>>> word is treated as if it begins with ANTLRTOKEN, which is passed to the parser.
>>> That is, the input text "ANTLR ANTLRWORKS" seems, loosely, to be treated as
>>> the sequence
>>> ANTLRTOKEN WHITESPACE ANTLRTOKEN special_string
>>>
>>> The whole grammar file is as follows. (This grammar simply parses any
>>> word following the keyword ANTLR.)
>>>
>>> -----------------------------------------------------------------------------
>>> grammar sample_parser;
>>>
>>> options
>>> {
>>>     language=C;
>>> }
>>>
>>> requestline : ANTLRTOKEN WHITESPACE special_string ;
>>> special_string: (CHAR | '=' | '.' | '-' | '@' )+ ;
>>>
>>> WHITESPACE  : ( '\t' | ' ' | '\u000C' )+;
>>> NEWLINE: ('\r')? '\n';
>>> CHAR: (('a'..'z')|('A'..'Z'));
>>> ANTLRTOKEN:'ANTLR';
>>>
>>> -----------------------------------------------------------------------------
>>>
>>> If I provide the input as
>>> ANTLR WORKS
>>>
>>> then everything works fine and I don't get any error.
>>>
>>> Now if I provide the input as
>>> ANTLR ANTLRWORKS
>>> then I get the following error:
>>>
>>> ----------------------------------------------------------------------------
>>> input(1)  : error 5 : Unexpected token, at offset 5
>>>     near [Index: 2 (Start: 24666934-Stop: 24666938) ='ANTLR', type<4>
>>> Line: 1 LinePos:5]
>>>      : missing elements...
>>>
>>> ----------------------------------------------------------------------------
>>>
>>> It seems that the lexer treats the substring ANTLR in ANTLRWORKS as the
>>> token ANTLRTOKEN and passes it to the parser, which is not expecting that token.
>>>
>>> Is there a way to tell ANTLR not to break the input word ANTLRWORKS
>>> into tokens, and to treat the whole word as special_string?
>>>
>>> Can somebody help me get around this issue?
>>>
>>> Thanks in advance.
>>>
>>> Also, I am using the following versions:
>>> java version "1.6.0_14"
>>> ANTLR version 3.1.3
>>> Target language : C
>>> C runtime library version: 3.1.3
>>> gcc compiler: 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
>>>
>>>
>>> Regards,
>>> Achint