[antlr-interest] Whitespace problem. (keywords Vs identifiers)

Monty Zukowski monty at codetransform.com
Mon May 24 08:20:18 PDT 2004


On May 21, 2004, at 9:03 AM, Bharath Sundararaman wrote:

> Hi Monty,
>
> Here's my rule:
>
> IDMEAT
>     : i:IDENT {
>         if (i.getText().equals("t") | i.getText().equals("T")
>                 | i.getText().equals("time")) {
>             $setType(TIME_PREFIX);
>         }
>         else if (i.getText().equals("e") | i.getText().equals("E")) {
>             $setType(Exponent_prefix);
>         }
>         else {
>             $setType(i.getType());
>         }
>       }
>     ;
>

IDENT will already have set the type of the token, so you can compare 
token types instead of text, e.g.
if (i.getType()==T || i.getType()==TIME) etc.

You also aren't testing for # and a number, so you will get TIME_PREFIX 
for a variable named 't' no matter what follows.
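
A minimal sketch of the missing check, written as plain Java rather than
ANTLR action code. The class name TimePrefixCheck, the method classify, and
the string type names are made up for illustration; the idea is only that
't', 'T' or 'time' counts as a time prefix when '#' and a digit follow
immediately, and stays an identifier otherwise.

```java
// Hypothetical helper, not part of ANTLR: picks a token type for an
// identifier-like word by looking at the characters that follow it.
class TimePrefixCheck {

    /**
     * @param input full input text
     * @param end   index just past the word that was matched
     * @param word  text of the matched word
     * @return "TIME_PREFIX" only when word is t/T/time and '#' plus a digit
     *         follow immediately; otherwise "IDENT"
     */
    static String classify(String input, int end, String word) {
        boolean looksLikePrefix =
            word.equals("t") || word.equals("T") || word.equals("time");
        boolean hashAndDigitFollow =
            end + 1 < input.length()
            && input.charAt(end) == '#'
            && Character.isDigit(input.charAt(end + 1));
        return (looksLikePrefix && hashAndDigitFollow) ? "TIME_PREFIX" : "IDENT";
    }

    public static void main(String[] args) {
        System.out.println(classify("t#9", 1, "t"));   // TIME_PREFIX
        System.out.println(classify("t = 9", 1, "t")); // IDENT
    }
}
```

In a real lexer the same test would probably be made with lookahead inside
the rule; the sketch only shows which condition is being tested.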

E9 is a valid identifier, I assume.  That one should probably be 
handled in IDENT:

IDENT
    : (('e'|'E') (INT | PLUS | MINUS))=> ('e'|'E')
      {$setType(Exponent_prefix);}
    | ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')*
    ;
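
To show what the predicate in that rule decides, here is a rough sketch in
plain Java. It is not generated ANTLR code; the class IdentSketch and the
method nextIdent are hypothetical names, and a one-character peek stands in
for the INT, PLUS and MINUS rules on the assumption that they start with a
digit, '+' and '-' respectively.

```java
// Hypothetical stand-in for the IDENT rule above, written as plain Java.
class IdentSketch {

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /**
     * Assumes input.charAt(pos) is an ASCII letter.
     * Returns {tokenType, tokenText} for the token starting at pos.
     */
    static String[] nextIdent(String input, int pos) {
        char first = input.charAt(pos);
        // Mirrors the syntactic predicate: e/E followed by a digit, '+' or '-'.
        if ((first == 'e' || first == 'E') && pos + 1 < input.length()) {
            char next = input.charAt(pos + 1);
            if (isAsciiDigit(next) || next == '+' || next == '-') {
                // Only the letter is consumed; the number is left for later rules.
                return new String[] {"Exponent_prefix", String.valueOf(first)};
            }
        }
        // Second alternative: a letter followed by any number of letters or digits.
        int end = pos + 1;
        while (end < input.length()
                && (isAsciiLetter(input.charAt(end)) || isAsciiDigit(input.charAt(end)))) {
            end++;
        }
        return new String[] {"IDENT", input.substring(pos, end)};
    }
}
```

Under this sketch "E9" and "E+9" both yield Exponent_prefix with text "E",
while "Exp" stays a single IDENT.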



> Problem: My time rule is (in the parser) --
> time: TIME_PREFIX HASH Int; and it takes values like "t#9" or "T#9".
> Note that there's no space between 't' and '#' and that's what I want.
> However, for Exponent_prefix, it doesn't work.
>
> exponent: Exponent_prefix (PLUS|MINUS)? Int; allows "E 9" or "E+9" but
> it doesn't allow "E9". I tried to ignore WHITESPACE in the IDMEAT rule
> but that can't be the problem because TIME_PREFIX works fine.
>
> Any ideas?
>
> B.
>
> -----Original Message-----
> From: Monty Zukowski [mailto:monty at codetransform.com]
> Sent: Thursday, May 20, 2004 12:05 PM
> To: antlr-interest at yahoogroups.com
> Cc: Monty Zukowski
> Subject: Re: [antlr-interest] Keywords Vs Identifiers.
>
>
> I'm sorry, I was in a hurry.  Inspect the generated code, you will see
> in the ID rule where antlr tests the token text against the literals
> table and assigns the token type.  To use it in a rule you may need a
> semantic predicate, this is a little tricky because you need to use the
> predicate to choose an alternative--hmmm, maybe you could get by with
> calling the lexer rule directly in your action code.  Yes, in your
> action where you see the TIME id, call the WS rule and then the INT
> rule.  If either fails that's ok, it was not the TIME keyword, it was an
> ID, so change the type back.  Then call your s,m,ms rule.  The text
> will still be appended to the token buffer and make it through to the
> parser.  Try it out and ask when you hit a problem.  I wish I had
> another 15 minutes to explain fully...
>
> Monty
>
> On May 20, 2004, at 6:30 AM, Bharath S wrote:
>
>> Hi Monty,
>>
>> I am unclear about the ID token here. Let's say that the lexer sees
>> "abc" which is a token of type ID. Please correct me if my
>> understanding is not right.
>>
>> 1. The if (i.getType( )) statement is used to test against literals.
>> So, if ID was "INT" instead of "abc", it would return LITERAL_INT and
>> it would skip that token. Otherwise, it sets "abc"'s type as ID. Though
>> ID by itself has the {testliterals} option set, the IDMEAT rule would
>> allow me to have both ID and the (TIME : "TIME" Integer;) rule co-exist
>> in the lexer.
>>
>> 2. This is a better solution because if I had 's', 'm', 'ms' etc to
>> denote seconds, minutes and milliseconds, I have to write a separate
>> rule for each one of them in the parser (if I follow my solution) to
>> prevent conflict with the ID rule. Doing it via IDMEAT will solve the
>> issue and make life easier.
>>
>> Thanks for your comments and clarifications!
>>
>> Bharath.
>> ----- Original Message -----
>> From: "Monty Zukowski" <monty at codetransform.com>
>> To: <antlr-interest at yahoogroups.com>
>> Cc: "Monty Zukowski" <monty at codetransform.com>
>> Sent: Wednesday, May 19, 2004 5:13 PM
>> Subject: Re: [antlr-interest] Keywords Vs Identifiers.
>>
>>
>>> If you want to handle that in the lexer you need to do it by calling
>>> the rule that tests the literals table, here's an example from the C
>>> grammar:
>>>
>>> IDMEAT
>>>          :       i:ID
>>>                  {
>>>                      if ( i.getType() == LITERAL___extension__ ) {
>>>                          $setType(Token.SKIP);
>>>                      }
>>>                      else {
>>>                          $setType(i.getType());
>>>                      }
>>>                  }
>>>          ;
>>>
>>> protected ID
>>>          options
>>>                  {
>>>                  testLiterals = true;
>>>                  }
>>>          :       ( 'a'..'z' | 'A'..'Z' | '_' | '$')
>>>                  ( 'a'..'z' | 'A'..'Z' | '_' | '$' | '0'..'9' )*
>>>          ;
>>>
>>> It's actually tricky to figure out how to lex the following
>>> whitespace and integer without using a syntactic predicate, but a syn
>>> pred here will be a performance problem.  I would actually recommend
>>> using a parser filter; see
>>> http://www.codetransform.com/filterexample.html
>>>
>>> By the way, your parser solution works just fine too, and is probably
>>> the easiest.
>>>
>>> Monty
>>>
>>> On May 19, 2004, at 2:55 PM, Bharath wrote:
>>>
>>>> Hi Monty,
>>>>
>>>> I did. I figured a way out too but I am not sure if it's an
>>>> efficient solution. I set a rule in the parser which accepts an
>>>> identifier and I extracted the identifier input into a string. If
>>>> the string is not "TIME", I throw an exception, otherwise I accept
>>>> it. (using getText() method).
>>>>
>>>> Please let me know if this is bad practice.
>>>>
>>>> Thanks!
>>>>
>>>> Bharath.
>>>>
>>>> -----Original Message-----
>>>> From: Monty Zukowski [mailto:monty at codetransform.com]
>>>> Sent: Wednesday, May 19, 2004 4:41 PM
>>>> To: antlr-interest at yahoogroups.com
>>>> Cc: Monty Zukowski
>>>> Subject: Re: [antlr-interest] Keywords Vs Identifiers.
>>>>
>>>> See the documentation about "literals"
>>>>
>>>> Monty
>>>>
>>>> On May 19, 2004, at 8:25 AM, Bharath S wrote:
>>>>
>>>>> Hi Antlers,
>>>>>
>>>>> I have some rules in my grammar, for time literals which require
>>>>> that 'TIME' or "time" be added to the front of the rule. For eg.,
>>>>> time can be represented as TIME 99secs. The problem is, "TIME" is
>>>>> not a keyword and so I can't have it in the parser. If I throw it in
>>>>> the lexer, it causes a clash with the IDENTIFIER rule, because the
>>>>> lexer sees the rule as
>>>>>
>>>>> TIME: 'T' 'I' 'M' 'E' (Integer) ; and
>>>>> IDENTIFIER: ('a'..'z'|'A'..'Z')+;
>>>>>
>>>>> as expected. Is there a common workaround for this?
>>>>>
>>>>> I can solve this problem by moving a whole bunch of rules in the
>>>>> parser back to the lexer, just to make the TIME rule protected. But
>>>>> it doesn't make sense, at all.
>>>>>
>>>>> Any comments are most welcome.
>>>>>
>>>>> Bharath.
>>>> Monty Zukowski
>>>>
>>>> ANTLR & Java Consultant -- http://www.codetransform.com ANSI C/GCC
>>>> transformation toolkit -- http://www.codetransform.com/gcc.html
>>>> Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html
Monty Zukowski

ANTLR & Java Consultant -- http://www.codetransform.com
ANSI C/GCC transformation toolkit -- 
http://www.codetransform.com/gcc.html
Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html



 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/
 


