[antlr-interest] Best way to handle a large number of language constants?
Justin Murray
jmurray at aerotech.com
Mon Mar 14 08:59:34 PDT 2011
Hi All,
I am working on a proprietary language of ours that is reminiscent of
BASIC in some ways, but has morphed over the years into its own
monstrosity. This language is primarily used to command our hardware
devices. Our system has a large number of "parameters" (762 to be
precise) that define the complex configuration of the hardware. This
configuration mainly lives in a file that is read and sent down to the
hardware (where it is stored as a simple array of values), but there is
also the desire to edit these parameters programmatically at runtime.
Each parameter has a name and a numeric value. We would like each
parameter to be readable and writable through simple assignment
statements. For example, "AxisType.X = 0" assigns the value 0 to the
AxisType parameter
on the X axis. This is currently implemented in a seemingly terrible
way, and I am looking for the best way to improve it.
The current implementation involves providing a #include file that
#defines each parameter as an array with a hard-coded index. This
include file is handled by the pre-processor so that the syntax in
question only has to handle the hard-coded array. The pre-processor
itself is reasonably efficient; the problem is that we have to
distribute this enormous include file, and users must remember to
include it.
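For readers unfamiliar with this kind of setup, the include-file approach
can be pictured with a rough C analogue. This is only a sketch under
stated assumptions: the array name g_params, the axis count, the helper
param_slot, the MaxSpeed parameter, and both index values are
hypothetical, and the real language's pre-processor is not C's.

```c
#include <assert.h>

/* 762 comes from the post; the axis count and all indices are assumed. */
#define PARAM_COUNT 762
#define AXIS_COUNT  3

enum axis { AXIS_X, AXIS_Y, AXIS_Z };

/* Stand-in for the hardware-side store: a simple array of values. */
static double g_params[PARAM_COUNT][AXIS_COUNT];

/* Bounds-checked access to one slot of the array. */
static double *param_slot(int index, enum axis ax)
{
    assert(index >= 0 && index < PARAM_COUNT);
    assert((int)ax >= 0 && (int)ax < AXIS_COUNT);
    return &g_params[index][ax];
}

/* What the include file amounts to: one macro per parameter name, each
   expanding to a hard-coded index. Only the name AxisType is from the
   post; MaxSpeed and both index values are made up. */
#define AxisType(ax) (*param_slot(0, (ax)))
#define MaxSpeed(ax) (*param_slot(1, (ax)))
```

With these definitions, `AxisType(AXIS_X) = 0;` writes slot 0 for the X
axis, which mirrors "AxisType.X = 0" in spirit. Anyone who forgets the
include loses every name at once, which is the distribution problem
described above.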
I can imagine a couple of other ways to implement this, but I am not
sure which would be the most efficient. One way would be to add every
parameter name as a keyword in the lexer. This has the benefit of
relying on ANTLR to do all of the lexing for me, so that I don't have to
parse any strings later in my own code. The problem is that this
requires a lot of custom code in the grammar file (each token must have
a well defined numeric index associated with it, to match the index used
internally in the arrays). Additionally, I don't know how well ANTLR
will handle having so many hundreds of additional tokens in the
language. The good thing is that I could auto-generate the grammar from
our definition of the parameters (in XML format).
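The generation step for the keyword-per-parameter idea might look roughly
like the C helper below, which emits one lexer rule per parameter. It is
a sketch, not a tested generator: the struct param_def, the PARAM_ rule
prefix, and the function name are hypothetical, it assumes ANTLR 3's
`NAME : 'literal' ;` lexer rule form, and it reads from an in-memory
record instead of the XML file. It also leaves open how the token types
ANTLR assigns would be mapped back to the internal array indices; the
index is only written out as a grammar comment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parameter definition, as it might be read from the XML file. */
struct param_def {
    const char *name;   /* e.g. "AxisType" */
    int index;          /* position in the internal parameter array */
};

/* Writes one lexer rule for p into buf, for example:
     PARAM_AxisType : 'AxisType' ; // index 0
   Returns 0 on success, or -1 if the name is unusable or buf is too
   small to hold the whole rule. */
static int format_lexer_rule(char *buf, size_t size, const struct param_def *p)
{
    assert(buf != NULL && p != NULL && p->name != NULL);

    /* An empty name or one containing a quote would break the literal. */
    if (p->name[0] == '\0' || strchr(p->name, '\'') != NULL)
        return -1;

    int n = snprintf(buf, size, "PARAM_%s : '%s' ; // index %d",
                     p->name, p->name, p->index);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Calling this once per parameter and writing each line into the grammar
file would yield several hundred keyword rules, which is exactly the
scale this question is about.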
Alternatively, I could add a very generic rule to the lexer that would
match any potentially valid parameter name, and wait until the semantic
actions to validate this as an actual parameter or a syntax error. While
this allows for a much simpler grammar on the ANTLR end, what I don't
like about this is that I then have to write a bunch of C code that
essentially parses the string again.
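For the generic-rule alternative, the validation in the semantic action
could be a single table lookup on the token text, as in the sketch
below. The table contents are hypothetical (only the name AxisType
appears in this post, and all index values are made up), and the names
param_entry, param_table, and lookup_param are illustrative. In practice
the table could be generated from the same XML definition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Maps a parameter name to its index in the internal array. */
struct param_entry {
    const char *name;
    int index;
};

/* Must stay sorted by name in strcmp order so that bsearch works.
   HomeOffset and MaxSpeed are invented entries for illustration. */
static const struct param_entry param_table[] = {
    { "AxisType",   0 },
    { "HomeOffset", 1 },
    { "MaxSpeed",   2 },
};

/* bsearch comparator: key is the name string, elem is a table entry. */
static int compare_entry(const void *key, const void *elem)
{
    const struct param_entry *entry = elem;
    return strcmp((const char *)key, entry->name);
}

/* Returns the array index for name, or -1 if it is not a parameter
   (the caller would then report a syntax error). Case-sensitive. */
static int lookup_param(const char *name)
{
    assert(name != NULL);
    const struct param_entry *hit =
        bsearch(name, param_table,
                sizeof param_table / sizeof param_table[0],
                sizeof param_table[0], compare_entry);
    return hit != NULL ? hit->index : -1;
}
```

With 762 sorted entries, a binary search needs at most about ten string
comparisons per identifier, and the lexer has already isolated the
token text, so no further scanning of the input is involved.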
So I am looking for some advice on the best way to approach this
problem. If anyone has done something similar before, I would appreciate
any suggestions that you have for me.
Much thanks,
Justin Murray
jmurray at aerotech.com