[antlr-interest] Lexing problem I cannot resolve

Raphael Reitzig r_reitzi at cs.uni-kl.de
Sun Aug 3 04:40:43 PDT 2008


I just built something that would at least be eaten by ANTLR ;)

***

grammar test;

options {
	output=AST;
}

tokens {
   ELLIPSIS;
   RANGE;
   FLOAT;
   INTEGER;
}

numericalconstruct :
   a=INT    THREE_DOTS     -> ^(ELLIPSIS $a)
| a=INT    TWO_DOTS b=INT -> ^(RANGE $a $b)
| (a=INT)? ONE_DOT b=INT  -> ^(FLOAT $a? $b) // $a may be absent, e.g. ".5"
| a=INT                   -> ^(INTEGER $a);

INT : ('0'..'9')+;
THREE_DOTS : '...';
TWO_DOTS : '..';
ONE_DOT : '.';

***
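The idea behind this grammar is that the lexer never produces a FLOAT token. "1..2" can therefore only come out as INT TWO_DOTS INT, and the parser can in principle pick the alternative from the next one or two tokens. Here is a minimal Python sketch of that split, assuming only the four token kinds above. The functions `lex` and `numerical_construct` are hypothetical stand-ins, not ANTLR-generated code, and the returned tuples only mimic the rewrite trees. As in the grammar, whitespace is not handled.

```python
from typing import List, Optional, Tuple

DIGITS = "0123456789"
Token = Tuple[str, str]  # (token kind, matched text), e.g. ("INT", "12")


def lex(text: str) -> List[Token]:
    """Emit only INT and dot tokens, like the grammar's lexer rules.
    Dots are taken longest-first: '...' beats '..' beats '.'."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        if text[i] in DIGITS:
            j = i
            while j < len(text) and text[j] in DIGITS:
                j += 1
            tokens.append(("INT", text[i:j]))
            i = j
        elif text.startswith("...", i):
            tokens.append(("THREE_DOTS", "..."))
            i += 3
        elif text.startswith("..", i):
            tokens.append(("TWO_DOTS", ".."))
            i += 2
        elif text[i] == ".":
            tokens.append(("ONE_DOT", "."))
            i += 1
        else:
            raise ValueError(f"unexpected character {text[i]!r} at offset {i}")
    return tokens


def numerical_construct(tokens: List[Token]) -> Tuple[Optional[str], ...]:
    """Compare the whole token sequence with each alternative of the
    parser rule and return a tuple standing in for its rewrite tree."""
    kinds = [kind for kind, _ in tokens]
    texts = [text for _, text in tokens]
    if kinds == ["INT", "THREE_DOTS"]:
        return ("ELLIPSIS", texts[0])
    if kinds == ["INT", "TWO_DOTS", "INT"]:
        return ("RANGE", texts[0], texts[2])
    if kinds == ["INT", "ONE_DOT", "INT"]:
        return ("FLOAT", texts[0], texts[2])
    if kinds == ["ONE_DOT", "INT"]:
        return ("FLOAT", None, texts[1])  # the optional (a=INT)? is absent
    if kinds == ["INT"]:
        return ("INTEGER", texts[0])
    raise SyntaxError(f"no alternative matches {kinds}")
```

For example, `numerical_construct(lex("1..2"))` gives `("RANGE", "1", "2")`. Input ".5" yields a FLOAT node with no integer part, which is the case the optional `(a=INT)?` in the grammar covers.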

As mentioned in my previous mail, it would be nice to merge the two  
children in the float alternative. I don't know how.
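One way to merge the two children is sketched below in Python. It assumes the tree is evaluated in the target language afterwards: keep both INT children and join their token texts before converting. Using the texts rather than the integer values matters, because in "1.05" the second child is "05". `merge_float` is a hypothetical helper, not part of ANTLR.

```python
from typing import Optional


def merge_float(int_part: Optional[str], frac_part: str) -> float:
    """Join the texts of the two INT children of a FLOAT node.

    The texts are used rather than integer values: for "1.05" the second
    child is "05", and int("05") would drop the leading zero.
    A missing integer part (input ".5") is treated as "0".
    """
    whole = int_part if int_part is not None else "0"
    return float(whole + "." + frac_part)
```

The arguments would be the text of the two INT tokens. Whether to call this from a parser action or from a later tree walk is a design choice the thread leaves open.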

Regards

Raphael

"Raphael Reitzig" <r_reitzi at cs.uni-kl.de> wrote (Sun Aug  3 13:21:07 2008):

> Hi again!
>
> You are probably right; you may want to consider Gavin's response.
>
> But do I understand correctly that in your language '..5' is a valid
> range? What range is that? I only had 'INT..INT' in mind and would
> create a single token of it.
>
> Consider the following:
>
> INT : ('0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9')+;
> THREE_DOTS : '...';
> TWO_DOTS : '..';
> ONE_DOT : '.';
>
> numericalconstruct :
>   a=INT  THREE_DOTS     -> ^(ELLIPSIS $a)
> | a=INT  TWO_DOTS b=INT -> ^(RANGE $a $b)
> | a=INT? ONE_DOT b=INT  -> ^(FLOAT ($a + $b))
> | a=INT                 -> ^(INTEGER $a);
>
> I think that may work; more experienced list members will have a
> say, I suppose. In particular, I am not sure about the float
> rewrite rule. If it fails, you can put the two integers as children
> and handle the conversion to float in your target language.
>
> Regards
>
> Raphael
>
> "Carter Cheng" <carter_cheng at yahoo.com> wrote (Sun Aug  3 13:01:37 2008):
>
>> Thanks for the reply. I think that will only disambiguate between
>> the .2 and .. cases, not the case I am describing here.
>>
>> The problem is that the entry point into the FSA would be the leading
>> digit, so the range rule will not be considered at all. The only
>> approach I can think of, though I am not sure how to state it in
>> ANTLR, is to use syntactic predicates, something like this:
>>
>> (digit+ '...')=> (return an int)        /* int followed by ellipsis */
>> (digit+ '..')=>  (return an int)        /* int followed by range */
>> (digit+ '.')=>   (possible float value) /* float or error */
>>
>> Or is this wrong?
>>
>> Regards,
>>
>> Carter.
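Carter's idea of deciding only after the digits have been read can be sketched as a hand-written lexer. This is Python rather than ANTLR, so it says nothing about exact ANTLR predicate syntax. The token names follow the thread, `lex` is hypothetical, and skipping whitespace is an assumption. The only change from a greedy lexer is a two-character peek: digits followed by ".." (which also covers "...") end as an INT, while digits followed by a single "." continue into a FLOAT.

```python
from typing import List, Tuple

DIGITS = "0123456789"
Token = Tuple[str, str]  # (token kind, matched text)


def lex(text: str) -> List[Token]:
    """After a run of digits, peek at the next two characters before
    deciding whether a '.' belongs to a FLOAT."""
    tokens: List[Token] = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in DIGITS:
            j = i
            while j < n and text[j] in DIGITS:
                j += 1
            # digit+ '..' (also '...'), or no dot at all: plain INT;
            # the dots, if any, are lexed on the next iteration
            if text.startswith("..", j) or not text.startswith(".", j):
                tokens.append(("INT", text[i:j]))
                i = j
                continue
            # digit+ '.' not followed by '.': a float such as "1." or "1.5"
            j += 1
            while j < n and text[j] in DIGITS:
                j += 1
            tokens.append(("FLOAT", text[i:j]))
            i = j
        elif text.startswith("...", i):
            tokens.append(("ELLIPSIS", "..."))
            i += 3
        elif text.startswith("..", i):
            tokens.append(("RANGE", ".."))
            i += 2
        elif c == "." and i + 1 < n and text[i + 1] in DIGITS:
            # leading-dot float such as ".5"
            j = i + 1
            while j < n and text[j] in DIGITS:
                j += 1
            tokens.append(("FLOAT", text[i:j]))
            i = j
        elif c.isspace():
            i += 1
        else:
            raise ValueError(f"unexpected {c!r} at offset {i}")
    return tokens
```

With this rule, `lex("1..2")` gives INT RANGE INT, and "1.5" is still a single FLOAT. A lone "." that does not start a float such as ".5" is rejected, since the thread does not say what it should mean.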
>>
>>
>> --- On Sun, 8/3/08, Raphael Reitzig <r_reitzi at cs.uni-kl.de> wrote:
>>
>>> From: Raphael Reitzig <r_reitzi at cs.uni-kl.de>
>>> Subject: Re: [antlr-interest] Lexing problem I cannot resolve
>>> To: antlr-interest at antlr.org
>>> Date: Sunday, August 3, 2008, 3:41 AM
>>> Hi Carter!
>>>
>>> Moving the range rule above the float rule should do the job. ANTLR
>>> chooses the first matching rule it discovers, testing from top to
>>> bottom.
>>>
>>> Regards
>>>
>>> Raphael
>>>
>>> "Carter Cheng" <carter_cheng at yahoo.com> wrote (Sun Aug  3 12:16:38 2008):
>>>
>>>> Hi,
>>>>
>>>> Yet another beginner's question. I have been having difficulties
>>>> with a lexing ambiguity, and I am curious how one would go about
>>>> resolving it with ANTLR. The problem is as follows. I have a grammar
>>>> with standard C-like INT and FLOAT lexing rules, but I also have the
>>>> ellipsis (...) and range (..) tokens in the grammar. The difficulty
>>>> I am having is with this input string:
>>>>
>>>> 1..2
>>>>
>>>> which the lexer seems to prefer to lex as two FLOATs as opposed to
>>>> INT RANGE INT. In the language in question FLOAT FLOAT is illegal,
>>>> but obviously the lexer cannot know that. Is there a way to resolve
>>>> this cleanly in ANTLR?
>>>>
>>>> Thanks in advance,
>>>>
>>>> Carter.
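The behaviour described here can be reproduced with a simplified longest-match ("maximal munch") lexer in Python. This is a model, not ANTLR's generated lexer; the C-like FLOAT pattern is an assumption, and `RULES` and `longest_match_lex` are hypothetical. At offset 0, FLOAT matches "1." (two characters) while INT matches only "1", so FLOAT wins. At offset 2, ".2" is again a FLOAT. In this model, rule order only breaks ties between equally long matches, so listing the rules in a different order, as suggested in the first reply, gives the same result.

```python
import re
from typing import List, Optional, Sequence, Tuple

# C-like rules as described in the question (patterns are an assumption):
# FLOAT is digits '.' digits* or '.' digits+.
RULES = [
    ("FLOAT", re.compile(r"[0-9]+\.[0-9]*|\.[0-9]+")),
    ("INT", re.compile(r"[0-9]+")),
    ("ELLIPSIS", re.compile(r"\.\.\.")),
    ("RANGE", re.compile(r"\.\.")),
]


def longest_match_lex(
    text: str, rules: Sequence[Tuple[str, "re.Pattern"]]
) -> List[Tuple[str, str]]:
    """At each offset take the longest match; the order of `rules`
    only matters when two rules match the same number of characters."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best: Optional[Tuple[str, "re.Match"]] = None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # strictly longer wins, so an earlier rule keeps a tie
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens
```

Both `longest_match_lex("1..2", RULES)` and `longest_match_lex("1..2", list(reversed(RULES)))` return two FLOAT tokens, "1." and ".2". Under this model, a lexer-only fix has to look past the "." before committing to FLOAT.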
>>>>
>>>
>>> ----------------------------------------------------------------
>>> This message was sent using IMP, the Internet Messaging
>>> Program.
>>
>