[antlr-interest] Can I target C and Java from one grammar file?

Fri Jan 23 08:59:15 PST 2009

Thanks for the feedback and different options presented.

I need something very simple and pragmatic right now so I went with a  
preprocessor approach where I use ANTLR comments such as:

//ifdef JAVA
   ... java version of syntax
//elifdef CPP
  ... cpp version of syntax
//endif

At least I have everything in a single file and I can easily compare
Java and C code side by side even if there is a little duplication.

In case anyone is interested, here is the complete source for the
preprocessor, written in Ruby.

#!/usr/bin/ruby
# Preprocessor for ANTLR grammar files with multiple language targets
# Written by Andy Grove on 23-Jan-2009

# Prints the lines of filename that apply to userTarget. Lines outside any
# //ifdef block are always printed; the directive lines themselves never are.
def preprocess(filename, userTarget)
   currentTarget = "*"   # "*" means we are outside any //ifdef block
   # File.foreach closes the file for us when the block finishes
   File.foreach(filename) {|line|
     if line[0,7] == '//ifdef'
       currentTarget = line[7,line.length].strip
     elsif line[0,9] == '//elifdef'
       currentTarget = line[9,line.length].strip
     elsif line[0,7] == '//endif'
       currentTarget = "*"
     elsif currentTarget=="*" || currentTarget==userTarget
       puts line
     end
   }
end

if ARGV.length < 2
   puts "Usage: preprocess filename target"
else
   preprocess(ARGV[0], ARGV[1])
end
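
For anyone who wants to try the logic without a grammar file on disk, here
is a rough sketch of the same filtering rule applied to a string. The name
filter_targets and the sample grammar fragment are made up for illustration;
only the //ifdef, //elifdef and //endif markers come from the script above.

```ruby
# Sketch only: same directive handling as the script above, but returning a
# string instead of printing. filter_targets and SAMPLE are illustrative names.
def filter_targets(text, user_target)
  current_target = "*"   # "*" means we are outside any //ifdef block
  kept = []
  text.each_line do |line|
    if line.start_with?("//ifdef")
      current_target = line[7..-1].strip
    elsif line.start_with?("//elifdef")
      current_target = line[9..-1].strip
    elsif line.start_with?("//endif")
      current_target = "*"
    elsif current_target == "*" || current_target == user_target
      kept << line
    end
  end
  kept.join
end

# Hypothetical grammar fragment, not taken from a real project.
SAMPLE = <<~GRAMMAR
  grammar Demo;
  //ifdef JAVA
  options { language = Java; }
  //elifdef CPP
  options { language = C; }
  //endif
  start : ID ;
GRAMMAR

puts filter_targets(SAMPLE, "JAVA")
```

Note that the directives are only recognised at the very start of a line, and
that blocks cannot be nested, exactly as in the script above.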

Thanks,

Andy Grove
Chief Architect
CodeFutures Corporation

On Jan 22, 2009, at 11:57 PM, Johannes Luber wrote:

> Jim Idle wrote:
>>> Johannes Luber wrote:
>>>
>>> I think you misunderstood me. Here is one rule in my grammar:
>>>
>>> collection_initializer
>>>    :   OPEN_BRACE element_initializer_list COMMA? CLOSE_BRACE
>>>    -> ^(OPEN_BRACE element_initializer_list ^(OPTIONAL COMMA?) CLOSE_BRACE)
>>>    ;
>>>
>>> A normal parser would maybe need only:
>>>
>>> collection_initializer
>>>    :   OPEN_BRACE element_initializer_list COMMA? CLOSE_BRACE
>>>    -> ^(element_initializer_list)
>>>    ;
>>>
>>> With a preprocessor one could combine them:
>>>
>>> collection_initializer
>>>    :   OPEN_BRACE element_initializer_list COMMA? CLOSE_BRACE
>>>    -> ^(
>>> 	#ifdef ALL_TOKENS
>>> 	OPEN_BRACE
>>> 	#endif
>>>
>>> 	element_initializer_list
>>>
>>> 	#ifdef ALL_TOKENS
>>> 	^(OPTIONAL COMMA?) CLOSE_BRACE
>>> 	#endif
>>> )
>>>    ;
>>>
>>> A bit ugly, but it gets the job done. Maybe you have another idea to
>>> accomplish this goal?
>>>
>> Well, you should do this with runtime configuration (I show a parameter
>> here but you should use some grammar global config class set externally):
>>
>> collection_initializer[boolean allTokens]
>>    :   OPEN_BRACE element_initializer_list COMMA? CLOSE_BRACE
>>
>>       -> {allTokens}? ^(OPEN_BRACE element_initializer_list ^(OPTIONAL COMMA?) CLOSE_BRACE)
>>       -> element_initializer_list
>> ;
>
> While runtime configuration is interesting, the problem remains that
> tree grammars have to handle both possible rewrites. Effectively you are
> duplicating parts of the tree. I've had another idea to make the syntax
> more compact:
>
> #define ALL
>
> collection_initializer
>    :   OPEN_BRACE element_initializer_list COMMA? CLOSE_BRACE
>    -> ^(ALL.OPEN_BRACE element_initializer_list ^(ALL.OPTIONAL ALL.COMMA?) ALL.CLOSE_BRACE)
>    ;
>
> Rules and tokens marked with "ALL." end up in the generated code only if
> ALL is defined. The only open question is how to treat "^()". Maybe it is
> enough to say that DOWN and UP are included only if the root node is
> included as well.
>>
>> And you probably don't need that COMMA under a root node ;-)
>
> For my special purpose I really do need all tokens, except non-newline
> whitespace, I think. And using OPTIONAL fixes the general tree structure,
> which makes handling the direct children of the root node easier.
>
> Johannes
>>
>> But the general point is good.
>>
>> Jim
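
The "ALL." prefix idea quoted above could probably be prototyped as a text
filter too. Below is a rough sketch, not anything ANTLR itself supports. It
assumes the tentative rule from the quoted message: when the root of a "^()"
is dropped, the surrounding "^(" and ")" are dropped as well and any
surviving children are spliced into the parent. The names expand_all and
emit_sequence are invented for this sketch, and it only looks at a single
rewrite string, not a whole grammar file.

```ruby
# Sketch only: expands or strips "ALL."-prefixed elements in one rewrite.
# expand_all and emit_sequence are hypothetical names, not ANTLR features.

# Consumes tokens up to the matching ")" (or end of input) and returns the
# elements that survive, as an array of strings.
def emit_sequence(tokens, all_defined)
  out = []
  until tokens.empty?
    tok = tokens.shift
    case tok
    when ")"
      break
    when "^("
      root = tokens.shift                      # first element is the root
      root_kept = all_defined || !root.start_with?("ALL.")
      children = emit_sequence(tokens, all_defined)
      if root_kept
        parts = [root.sub(/\AALL\./, "")] + children
        out << "^(" + parts.join(" ") + ")"
      else
        out.concat(children)                   # assumed rule: splice children
      end
    else
      if tok.start_with?("ALL.")
        out << tok.sub(/\AALL\./, "") if all_defined
      else
        out << tok
      end
    end
  end
  out
end

def expand_all(rewrite, all_defined)
  # Tokens are "^(", ")", or any run of non-space, non-paren characters.
  tokens = rewrite.scan(/\^\(|\)|[^\s()]+/)
  emit_sequence(tokens, all_defined).join(" ")
end

REWRITE = "^(ALL.OPEN_BRACE element_initializer_list " \
          "^(ALL.OPTIONAL ALL.COMMA?) ALL.CLOSE_BRACE)"

puts expand_all(REWRITE, true)
puts expand_all(REWRITE, false)
```

With ALL defined this gives back the full tree from the quoted rule; without
it, only element_initializer_list is left. Whether splicing children into the
parent is the right answer for "^()" is exactly the open question raised in
the quoted message, so treat that part as a guess.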