[antlr-interest] AST generation: EXPRESSION TREE example.

Wed Jun 2 13:31:47 PDT 2004

You are trying to create heterogeneous nodes.  Look in the heteroAST 
example in the antlr distribution.

Monty
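The error text suggests that the generated code loads PLUSNode, MULTNode and INTNode by name and instantiates them reflectively. That would fail if those classes were never written or compiled. Below is a minimal Java sketch of that mechanism. It assumes reflective loading through a no-argument constructor. SketchNode, PlusSketchNode and SketchNodeFactory are hypothetical names, not ANTLR classes.

```java
// Hypothetical base class standing in for an AST node type (not antlr.CommonAST).
class SketchNode {
    String text;
}

// A heterogeneous node type. Reflective creation only works if the class
// is compiled, on the classpath, and has an accessible no-arg constructor.
class PlusSketchNode extends SketchNode {
}

class SketchNodeFactory {
    // Loads a node class by its name and instantiates it reflectively.
    // A missing class, a missing no-arg constructor, or a class that is not
    // a SketchNode all surface as the same IllegalArgumentException.
    static SketchNode create(String className) {
        try {
            Class<?> c = Class.forName(className);
            Object o = c.getDeclaredConstructor().newInstance();
            return (SketchNode) o;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Invalid class or can't make instance, " + className, e);
        }
    }
}
```

In this sketch, SketchNodeFactory.create("PlusSketchNode") succeeds. A name with no compiled class behind it fails with a similar "Invalid class or can't make instance" message. By analogy, the tree example presumably needs PLUSNode, MULTNode and INTNode to exist as compiled node classes, which is what the heteroAST example provides.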

On Jun 2, 2004, at 1:29 PM, Bharath Sundararaman wrote:

> Hi all,
>
> I looked at the documentation for AST
> (http://www.antlr.org/doc/trees.html) and I tried the EXPRESSION TREE
> example provided in the documentation. The grammar compiles without any
> errors, but when I run the main class, I get an error that says:
> "Invalid class or can't make instance, PLUSNode". I get the same for
> MULTNode and INTNode. Am I missing something here??
>
> Ter :- The tutorial was very useful, thanks!
>
> Thanks,
>
> Bharath.
>
> -----Original Message-----
> From: Monty Zukowski [mailto:monty at codetransform.com]
> Sent: Monday, May 24, 2004 10:20 AM
> To: antlr-interest at yahoogroups.com
> Cc: Monty Zukowski
> Subject: Re: [antlr-interest] Whitespace problem. (keywords Vs identifiers)
>
>
>
> On May 21, 2004, at 9:03 AM, Bharath Sundararaman wrote:
>
>> Hi Monty,
>>
>> Here's my rule:
>>
>> IDMEAT : i:IDENT {
>>         if (i.getText().equals("t") || i.getText().equals("T") ||
>>             i.getText().equals("time")) {
>>             $setType(TIME_PREFIX);
>>         }
>>         else if (i.getText().equals("e") || i.getText().equals("E")) {
>>             $setType(Exponent_prefix);
>>         }
>>         else {
>>             $setType(i.getType());
>>         }
>>     };
>>
>
> IDENT will already have set the token's type, so your test could compare
> types instead of text: if (i.getType()==T || i.getType()==TIME etc.)
>
> You also aren't testing for # and a number, so you will get TIME_PREFIX
> for a variable named 't' no matter what follows.
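The lookahead check described above can be sketched in plain Java, outside ANTLR. The sketch assumes the action can see the raw input and the identifier's start and end offsets. TimePrefixCheck and isTimePrefix are hypothetical names, not generated ANTLR code.

```java
class TimePrefixCheck {
    // True only when the identifier text is a time prefix ("t", "T", "time")
    // AND the characters right after it are '#' followed by at least one digit.
    // start/end are the identifier's offsets in the input (end is exclusive).
    static boolean isTimePrefix(String input, int start, int end) {
        String word = input.substring(start, end);
        boolean prefixWord = word.equals("t") || word.equals("T") || word.equals("time");
        if (!prefixWord) {
            return false;
        }
        // Lookahead: '#' must follow immediately, then an ASCII digit.
        return end + 1 < input.length()
                && input.charAt(end) == '#'
                && input.charAt(end + 1) >= '0'
                && input.charAt(end + 1) <= '9';
    }
}
```

With this check, "t#9" yields a time prefix, while a variable t followed by " = 3" stays an ordinary identifier.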
>
> E9 is a valid identifier, I assume, so that case should probably be
> handled in IDENT:
>
> IDENT
>     :   (('e'|'E') (INT | PLUS | MINUS))=> ('e'|'E')
>         {$setType(Exponent_prefix);}
>     |   ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')*
>     ;
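The syntactic predicate in the IDENT rule amounts to one character of lookahead after an 'e' or 'E'. The Java sketch below models that decision. It assumes INT starts with a digit, PLUS is '+' and MINUS is '-'. IdentOrExponent and next are hypothetical names, not ANTLR-generated code.

```java
class IdentOrExponent {
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Lexes one token starting at pos (assumed to be a letter), mimicking the
    // predicated IDENT rule: an 'e'/'E' followed by a digit, '+' or '-' yields
    // Exponent_prefix and consumes only the 'e'; otherwise the longest run of
    // letters and digits is an ordinary IDENT. Returns {tokenType, text}.
    static String[] next(String in, int pos) {
        char c = in.charAt(pos);
        if ((c == 'e' || c == 'E') && pos + 1 < in.length()) {
            char la = in.charAt(pos + 1); // one character of lookahead
            if (isDigit(la) || la == '+' || la == '-') {
                return new String[] {"Exponent_prefix", String.valueOf(c)};
            }
        }
        int end = pos + 1;
        while (end < in.length() && (isLetter(in.charAt(end)) || isDigit(in.charAt(end)))) {
            end++;
        }
        return new String[] {"IDENT", in.substring(pos, end)};
    }
}
```

Under this rule, "E 9" (with a space) lexes as a plain IDENT "E", and an identifier such as "e2x" would also be split after the "e". Whether that is acceptable depends on the language being parsed.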
>
>
>
>> Problem: my time rule (in the parser) is
>>
>> time: TIME_PREFIX HASH Int;
>>
>> and it accepts values like "t#9" or "T#9". Note that there's no space
>> between 't' and '#', and that's what I want. However, it doesn't work
>> for Exponent_prefix:
>>
>> exponent: Exponent_prefix (PLUS|MINUS)? Int;
>>
>> allows "E 9" or "E+9", but it doesn't allow "E9". I tried to ignore
>> WHITEPACE in the IDMEAT rule, but that can't be the problem because
>> TIME_PREFIX works fine.
>>
>> Any ideas?
>>
>> B.
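The asymmetry between "t#9" and "E9" is consistent with the lexer taking the longest identifier it can. '#' cannot appear in an identifier, so "t#9" stops after "t". "E9", by contrast, is a complete identifier, so IDMEAT only ever sees the text "E9". The maximal-munch sketch below assumes IDENT accepts a letter followed by any number of letters or digits. GreedyIdentLexer is a hypothetical class, not part of the grammar.

```java
import java.util.ArrayList;
import java.util.List;

class GreedyIdentLexer {
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Splits input into identifiers (letter, then letters/digits), integers,
    // and single other characters, skipping spaces. Identifiers always take
    // the longest possible match (maximal munch).
    static List<String> tokens(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            int j = i + 1;
            if (isLetter(c)) {
                while (j < in.length() && (isLetter(in.charAt(j)) || isDigit(in.charAt(j)))) {
                    j++;
                }
                out.add(in.substring(i, j));
            } else if (isDigit(c)) {
                while (j < in.length() && isDigit(in.charAt(j))) {
                    j++;
                }
                out.add(in.substring(i, j));
            } else if (c != ' ') {
                out.add(String.valueOf(c)); // e.g. '#', '+', '-'
            }
            i = j;
        }
        return out;
    }
}
```

With this sketch, tokens("t#9") gives [t, #, 9] and tokens("E+9") gives [E, +, 9]. tokens("E9") gives the single identifier [E9], which is why a check on the text "e"/"E" never fires for that input.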
>>
>> -----Original Message-----
>> From: Monty Zukowski [mailto:monty at codetransform.com]
>> Sent: Thursday, May 20, 2004 12:05 PM
>> To: antlr-interest at yahoogroups.com
>> Cc: Monty Zukowski
>> Subject: Re: [antlr-interest] Keywords Vs Identifiers.
>>
>>
>> I'm sorry, I was in a hurry. Inspect the generated code: in the ID rule
>> you will see where ANTLR tests the token text against the literals table
>> and assigns the token type. To use that in a rule you may need a
>> semantic predicate, which is a little tricky because the predicate has
>> to choose an alternative. Hmm, maybe you could get by with calling the
>> lexer rule directly in your action code. Yes: in your action, where you
>> see the TIME id, call the WS rule and then the INT rule. If either
>> fails, that's OK; it was not the TIME keyword, it was an ID, so change
>> the type back. Then call your s, m, ms rule. The text will still be
>> appended to the token buffer and make it through to the parser. Try it
>> out and ask when you hit a problem. I wish I had another 15 minutes to
>> explain fully...
>>
>> Monty
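The "try WS, then INT, otherwise fall back to ID" idea can be sketched as a speculative match with a rewind. This is a plain-Java model, not ANTLR code. The hand-written matchWs and matchInt methods stand in for the lexer's WS and INT rules, and all names are hypothetical. A real lexer action would need its own way to undo a partial match, and the s/m/ms step is left out.

```java
class SpeculativeTimeLexer {
    private final String in;
    private int pos;

    // pos should point just past an identifier whose text was "TIME".
    SpeculativeTimeLexer(String in, int pos) {
        this.in = in;
        this.pos = pos;
    }

    // Hypothetical stand-in for the WS rule: returns false instead of throwing.
    private boolean matchWs() {
        int start = pos;
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) {
            pos++;
        }
        return pos > start;
    }

    // Hypothetical stand-in for the INT rule: one or more ASCII digits.
    private boolean matchInt() {
        int start = pos;
        while (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') {
            pos++;
        }
        return pos > start;
    }

    // Tries WS then INT; on failure rewinds and reports a plain ID.
    String classifyAfterTimeWord() {
        int mark = pos;
        if (matchWs() && matchInt()) {
            return "TIME";
        }
        pos = mark; // not the TIME keyword after all: undo the partial match
        return "ID";
    }

    int position() {
        return pos;
    }
}
```

For "TIME 99secs" (starting at offset 4), the sketch reports TIME and stops before "secs", where a separate s/m/ms rule would take over. For "TIME x", it rewinds to offset 4 and reports ID.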
>>
>> On May 20, 2004, at 6:30 AM, Bharath S wrote:
>>
>>> Hi Monty,
>>>
>>> I am unclear about the ID token here. Let's say the lexer sees "abc",
>>> which is a token of type ID. Please correct me if my understanding is
>>> not right.
>>>
>>> 1. The if (i.getType() ...) statement is used to test against literals.
>>> So if the ID were "INT" instead of "abc", it would return LITERAL_INT and
>>> the token would be skipped. Otherwise, it sets the type of "abc" to ID.
>>> Though ID itself has the testLiterals option set, the IDMEAT rule would
>>> allow me to have both ID and a (TIME : "TIME" Integer;) rule co-exist
>>> in the lexer.
>>>
>>> 2. This is a better solution, because if I had 's', 'm', 'ms' etc. to
>>> denote seconds, minutes and milliseconds, I would have to write a
>>> separate rule for each one of them in the parser (if I follow my
>>> solution) to prevent conflicts with the ID rule. Doing it via IDMEAT
>>> will solve the issue and make life easier.
>>>
>>> Thanks for your comments and clarifications!
>>>
>>> Bharath.
>>> ----- Original Message -----
>>> From: "Monty Zukowski" <monty at codetransform.com>
>>> To: <antlr-interest at yahoogroups.com>
>>> Cc: "Monty Zukowski" <monty at codetransform.com>
>>> Sent: Wednesday, May 19, 2004 5:13 PM
>>> Subject: Re: [antlr-interest] Keywords Vs Identifiers.
>>>
>>>
>>>> If you want to handle that in the lexer you need to do it by calling
>>>> the rule that tests the literals table, here's an example from the C
>>>> grammar:
>>>>
>>>> IDMEAT
>>>>         :   i:ID
>>>>             {
>>>>                 if ( i.getType() == LITERAL___extension__ ) {
>>>>                     $setType(Token.SKIP);
>>>>                 }
>>>>                 else {
>>>>                     $setType(i.getType());
>>>>                 }
>>>>             }
>>>>         ;
>>>>
>>>> protected ID
>>>>          options
>>>>                  {
>>>>                  testLiterals = true;
>>>>                  }
>>>>          :       ( 'a'..'z' | 'A'..'Z' | '_' | '$')
>>>>                  ( 'a'..'z' | 'A'..'Z' | '_' | '$' | '0'..'9' )*
>>>>          ;
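The IDMEAT/ID pair works in two steps. First, ID matches the text and, because testLiterals is on, looks it up in the literals table. Then IDMEAT inspects the resulting type. The plain-Java sketch below models that flow. The class, the token numbers and the table contents are hypothetical, and ANTLR assigns its own token types. The sketch also makes explicit that only __extension__ is skipped: another literal such as "int" keeps its literal type rather than being dropped.

```java
import java.util.HashMap;
import java.util.Map;

class LiteralsTableSketch {
    // Hypothetical token type numbers; ANTLR assigns its own.
    static final int SKIP = -1;
    static final int ID = 4;
    static final int LITERAL_int = 5;
    static final int LITERAL___extension__ = 6;

    // Stand-in for the literals table that testLiterals = true consults.
    static final Map<String, Integer> LITERALS = new HashMap<>();

    static {
        LITERALS.put("int", LITERAL_int);
        LITERALS.put("__extension__", LITERAL___extension__);
    }

    // ID: match the text, then look it up; text not in the table stays ID.
    static int idType(String text) {
        return LITERALS.getOrDefault(text, ID);
    }

    // IDMEAT action: only __extension__ is dropped; every other type,
    // including other literals such as "int", passes through unchanged.
    static int idMeatType(String text) {
        int t = idType(text);
        return t == LITERAL___extension__ ? SKIP : t;
    }
}
```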
>>>>
>>>> It's actually tricky to figure out how to lex the following
>>>> whitespace and integer without using a syntactic predicate, but a
>>>> syn pred here will be a performance problem. I would actually
>>>> recommend using a parser filter; see
>>>> http://www.codetransform.com/filterexample.html
>>>>
>>>> By the way, your parser solution works just fine too, and is probably
>>>> the easiest.
>>>>
>>>> Monty
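One reading of the filter suggestion is a stage between lexer and parser that rewrites an ID spelled TIME, followed by an INT, into a single TIME token. The lexer itself then never has to decide. The plain-Java sketch below works under that assumption. It is not the code from the linked page, and FilterTok, TimeTokenFilter and the token-type strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal token: a type name plus its text.
class FilterTok {
    final String type;
    final String text;

    FilterTok(String type, String text) {
        this.type = type;
        this.text = text;
    }
}

class TimeTokenFilter {
    // Passes tokens through unchanged, except that an ID spelled TIME or time
    // that is immediately followed by an INT is merged into one TIME token.
    static List<FilterTok> filter(List<FilterTok> in) {
        List<FilterTok> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            FilterTok t = in.get(i);
            boolean timeWord = t.type.equals("ID")
                    && (t.text.equals("TIME") || t.text.equals("time"));
            if (timeWord && i + 1 < in.size() && in.get(i + 1).type.equals("INT")) {
                out.add(new FilterTok("TIME", in.get(i + 1).text)); // keep the number as the text
                i++; // skip the INT that was folded into TIME
            } else {
                out.add(t);
            }
        }
        return out;
    }
}
```

A TIME followed by anything other than an INT passes through as an ordinary ID, so identifiers named TIME still work.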
>>>>
>>>> On May 19, 2004, at 2:55 PM, Bharath wrote:
>>>>
>>>>> Hi Monty,
>>>>>
>>>>> I did. I figured out a way too, but I am not sure if it's an
>>>>> efficient solution. I wrote a rule in the parser that accepts an
>>>>> identifier, and I extract the identifier's text into a string (using
>>>>> the getText() method). If the string is not "TIME", I throw an
>>>>> exception; otherwise I accept it.
>>>>>
>>>>> Please let me know if this is bad practice.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Bharath.
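The parser-side approach (accept any identifier, then check its text) can be sketched as a tiny hand-written recursive-descent rule. This is a plain-Java model with hypothetical names and tokens represented as {type, text} pairs. In an ANTLR-generated parser the same check would sit inside the rule's action.

```java
import java.util.List;

class TimeRuleParser {
    private final List<String[]> tokens; // each entry: {type, text}
    private int p = 0;

    TimeRuleParser(List<String[]> tokens) {
        this.tokens = tokens;
    }

    // timeLiteral : ID INT ; where the ID's text must be TIME or time.
    // Any other identifier text is rejected with an exception.
    int timeLiteral() {
        String[] id = expect("ID");
        if (!id[1].equals("TIME") && !id[1].equals("time")) {
            throw new IllegalStateException("expected TIME, found " + id[1]);
        }
        return Integer.parseInt(expect("INT")[1]);
    }

    // Consumes the next token if it has the given type, else fails.
    private String[] expect(String type) {
        if (p >= tokens.size() || !tokens.get(p)[0].equals(type)) {
            throw new IllegalStateException("expected " + type);
        }
        return tokens.get(p++);
    }
}
```

Because the lexer only ever produces ID here, there is no keyword-versus-identifier clash to resolve. The cost is one string comparison per candidate time literal.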
>>>>>
>>>>> -----Original Message-----
>>>>> From: Monty Zukowski [mailto:monty at codetransform.com]
>>>>> Sent: Wednesday, May 19, 2004 4:41 PM
>>>>> To: antlr-interest at yahoogroups.com
>>>>> Cc: Monty Zukowski
>>>>> Subject: Re: [antlr-interest] Keywords Vs Identifiers.
>>>>>
>>>>> See the documentation about "literals"
>>>>>
>>>>> Monty
>>>>>
>>>>> On May 19, 2004, at 8:25 AM, Bharath S wrote:
>>>>>
>>>>>> Hi Antlers,
>>>>>>
>>>>>> I have some rules in my grammar for time literals which require that
>>>>>> 'TIME' or "time" be placed in front of the value. For example, a time
>>>>>> can be represented as TIME 99secs. The problem is that "TIME" is not a
>>>>>> keyword, so I can't have it in the parser. If I put it in the lexer,
>>>>>> it clashes with the IDENTIFIER rule, because the lexer sees the rules
>>>>>> as
>>>>>>
>>>>>> TIME: 'T' 'I' 'M' 'E' (Integer) ; and
>>>>>> IDENTIFIER: ('a'..'z'|'A'..'Z')+;
>>>>>>
>>>>>> as expected. Is there a common workaround for this?
>>>>>>
>>>>>> I can solve this problem by moving a whole bunch of rules from the
>>>>>> parser back to the lexer, just to make the TIME rule protected.
>>>>>> But that doesn't make sense at all.
>>>>>>
>>>>>> Any comments are most welcome.
>>>>>>
>>>>>> Bharath.
>>>>> Monty Zukowski
>>>>>
>>>>> ANTLR & Java Consultant -- http://www.codetransform.com ANSI C/GCC
>>>>> transformation toolkit -- http://www.codetransform.com/gcc.html
>>>>> Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html

Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/