[antlr-interest] Best way to handle a large number of language constants?

Mon Mar 14 08:59:34 PDT 2011

Hi All,

I am working on a proprietary language of ours that is reminiscent of 
BASIC in some ways, but has morphed over the years into its own 
monstrosity. This language is primarily used to command our hardware 
devices. Our system has a large number of "parameters" (762 to be 
precise) that define the complex configuration of the hardware. This 
configuration mainly lives in a file that is read and sent down to the 
hardware (where it is stored as a simple array of values), but there is 
also the desire to edit these parameters programmatically at runtime. 
Each parameter has a name, and a numeric value. The desire is for each 
parameter to be read/written through simple assignment statements. For 
example, "AxisType.X = 0" assigns the value 0 to the AxisType parameter 
on the X axis. This is currently implemented in a seemingly terrible 
way, and I am looking for the best way to improve it.

The current implementation involves providing a #include file that 
#defines each parameter as an array with a hard-coded index. This 
include file is handled by the pre-processor so that the syntax in 
question only has to handle the hard-coded array. The pre-processor 
itself is reasonably efficient, but the problem is that we have to 
distribute this enormous include file, and the users must remember to 
include it.

I can imagine a couple of other ways to implement this, but I am not 
sure which would be the most efficient. One way would be to add every 
parameter name as a keyword in the lexer. This has the benefit of 
relying on ANTLR to do all of the lexing for me, so that I don't have to 
parse any strings later in my own code. The problem is that this 
requires a lot of custom code in the grammar file (each token must have 
a well defined numeric index associated with it, to match the index used 
internally in the arrays). Additionally, I don't know how well ANTLR 
will handle having so many hundreds of additional tokens in the 
language. The good thing is that I could auto-generate the grammar from 
our definition of the parameters (in XML format).
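
As a rough illustration of the auto-generation idea, the sketch below 
formats one ANTLR lexer rule per parameter name. It assumes the names have 
already been extracted from the XML; the function name format_token_rule 
and the PARAM_ token prefix are hypothetical, and the mapping from token to 
array index is deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Formats one ANTLR lexer rule, e.g.  PARAM_AXISTYPE : 'AxisType' ;
   The PARAM_ prefix and the 63-character name limit are assumptions made
   for this sketch. Returns the number of characters written, as reported
   by snprintf. */
int format_token_rule(char *buf, size_t size, const char *name)
{
    char token[64];
    size_t len = strlen(name);
    size_t i;

    /* The upper-cased token name must fit in the local buffer. */
    assert(len > 0 && len < sizeof token);

    /* ANTLR lexer rule names start with an upper-case letter, so the
       parameter name is upper-cased to form the token name. */
    for (i = 0; i < len; i++)
        token[i] = (char)toupper((unsigned char)name[i]);
    token[len] = '\0';

    return snprintf(buf, size, "PARAM_%s : '%s' ;\n", token, name);
}
```

Called with "AxisType", this produces the rule 
PARAM_AXISTYPE : 'AxisType' ; and a generator would emit one such line for 
each of the 762 parameters.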

Alternatively, I could add a very generic rule to the lexer that would 
match any potentially valid parameter name, and wait until the semantic 
actions to validate this as an actual parameter or a syntax error. While 
this allows for a much simpler grammar on the ANTLR end, what I don't 
like about this is that I then have to write a bunch of C code that 
essentially parses the string again.
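
For comparison, the validation step in the semantic action might reduce to 
a table lookup rather than a second parse. This is a minimal sketch, 
assuming the parameter names are kept in a table sorted by name. Only 
AxisType comes from the example above; the other names, all index values, 
and the names ParamEntry and param_lookup are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: a parameter name and its index in the
   hardware's value array. */
typedef struct {
    const char *name;
    int index;
} ParamEntry;

/* Must stay sorted by name in strcmp order, since bsearch relies on it.
   Apart from AxisType, the names and all index values are made up. */
static const ParamEntry param_table[] = {
    { "AxisType",   0 },
    { "HomeOffset", 1 },
    { "MaxSpeed",   2 },
};

/* bsearch comparator: the key is a plain C string, the element an entry. */
static int compare_entry(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const ParamEntry *)elem)->name);
}

/* Returns the array index for a parameter name, or -1 if the name is not
   a known parameter (which the caller would report as an error). */
int param_lookup(const char *name)
{
    assert(name != NULL);
    const ParamEntry *hit = bsearch(name, param_table,
                                    sizeof param_table / sizeof param_table[0],
                                    sizeof param_table[0], compare_entry);
    return hit ? hit->index : -1;
}
```

With a table like this, the action would pass the matched identifier text 
to param_lookup and treat -1 as an unknown parameter.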

So I am looking for some advice on the best way to approach this 
problem. If anyone has done something similar before, I would appreciate 
any suggestions that you have for me.

Much thanks,

Justin Murray
jmurray at aerotech.com