[antlr-interest] suitability of Antlr for generating a PHP expression evaluator

Thomas White, MD, MS, MA tw176 at columbia.edu
Mon May 30 22:34:07 PDT 2011


Hi-

I emailed the following to Terence Parr, and he suggested I ask the
antlr-interest group. Any help the group can provide would be much
appreciated.


Can you help me determine whether Antlr is the right tool for the following
situation?

I built a Java-based survey tool in 2000, but can no longer support it, so
I'm trying to port its key functionality to LimeSurvey, which is written
in PHP.  My system has been supporting NIH-funded research for my
epidemiologist colleagues - so these are really semi-structured diagnostic
interviews that tend to range from 300-3500 questions.  They are highly
branched surveys, have internal scoring and complex branching logic (such as
only asking follow-up questions if the person has a diagnosis of depression,
meaning they acknowledged, via prior questions, 5 of 12 criteria for > 2
weeks without other signs of non-psychiatric causes of the depressive
behavior).  The surveys are multi-lingual, and generate tailored reports for
the patients and clinicians.

I used JavaCC to build the expression parser I needed for my tool, and I'm
totally stuck finding a well maintained compiler-compiler that will output
PHP.  Ideally, I'd like to write BNF or related notation and generate both
PHP (for server-side calculations), and JavaScript (for client-side - e.g.
to automatically update scale-scores on the form, and dynamically hide/show
questions based upon relevance criteria).  I've written expression
evaluators and moderately sized languages in YACC, Bison/Flex, and
JavaCC/JJTree - so I'd much rather take that approach than having to
hand-write a language parser and execution engine (had to do that 20 years
ago, in C, to convert a C-like language to pseudo-assembly and run it on a
stack-based, byte-code interpreter - I'd like to avoid going through that
pain again :-))

Since Antlr is designed to output to other languages, it seems a natural fit
for my need, but neither the JavaScript nor PHP output targets seem
complete.  On the other hand, I only need limited compiler-compiler
functionality, so perhaps the JavaScript and PHP targets are good enough,
but I can't assess that myself.

The functionality I'm looking for here is very basic:
(1) All basic math operators and functions
(2) Ability to call other functions from a "white list" of supported
functions (which would be included from an external source).  At present, I
categorize the functions by whether they take 0, 1, 2, 3, or unlimited
arguments, so I can enforce a small degree of syntax checking.
(3) Ability to handle numbers, strings, and dates separately (e.g. so it can
throw syntax exceptions if a given math operator is not appropriate for
a data type)
(4) Only allowable syntax is supported (so can avoid calls to arrays,
functions, hashes, macros, etc. - anything that might be unsafe or put the
website at risk of an injection attack)
(5) Only be able to reference known variables (each row of the survey has a
unique variable name - so can only reference those variable names, thus
avoiding accessing global parameters).
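
To make requirements (2), (4), and (5) concrete, here is a minimal sketch in
Python of a hand-written evaluator. It is an illustration only, not the
JavaCC grammar from the original tool and not ANTLR output. The FUNCTIONS
table, the ExprError class, and the evaluate() helper are hypothetical names
invented for this example, and the sketch handles numbers only (no strings
or dates, so requirement (3) is not covered).

```python
import math
import re

# Hypothetical whitelist: name -> (callable, argument count; None = unlimited).
FUNCTIONS = {
    "abs": (abs, 1),
    "pow": (math.pow, 2),
    "sum": (lambda *xs: sum(xs), None),
}

# Only numbers, identifiers, and a small fixed set of operators are tokens.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|([-+*/(),]))")


class ExprError(Exception):
    """Raised for any syntax, whitelist, or unknown-variable violation."""


def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:  # e.g. '[', '$', ';' are rejected outright
            raise ExprError(f"unexpected character at position {pos}")
        number, name, op = m.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif name is not None:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens, variables):
        self.tokens, self.pos, self.variables = tokens, 0, variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("end", None)

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, op):
        if self.next() != ("op", op):
            raise ExprError(f"expected '{op}'")

    def expr(self):  # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.next()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):  # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.next()[1]
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):  # number | '-' factor | '(' expr ')' | call | variable
        kind, val = self.next()
        if kind == "num":
            return val
        if (kind, val) == ("op", "-"):
            return -self.factor()
        if (kind, val) == ("op", "("):
            value = self.expr()
            self.expect(")")
            return value
        if kind == "name":
            if self.peek() == ("op", "("):
                return self.call(val)
            if val not in self.variables:  # requirement (5)
                raise ExprError(f"unknown variable '{val}'")
            return self.variables[val]
        raise ExprError("unexpected token")

    def call(self, name):
        if name not in FUNCTIONS:  # requirement (2)
            raise ExprError(f"function '{name}' is not whitelisted")
        func, arity = FUNCTIONS[name]
        self.expect("(")
        args = []
        if self.peek() != ("op", ")"):
            args.append(self.expr())
            while self.peek() == ("op", ","):
                self.next()
                args.append(self.expr())
        self.expect(")")
        if arity is not None and len(args) != arity:
            raise ExprError(f"'{name}' takes {arity} argument(s), got {len(args)}")
        return func(*args)


def evaluate(text, variables):
    parser = Parser(tokenize(text), variables)
    value = parser.expr()
    if parser.peek()[0] != "end":
        raise ExprError("unexpected trailing input")
    return value
```

With variables {"q1": 1, "q2": 3}, evaluate("q1 + q2 * 2", ...) gives 7.0,
while "system(1)", "abs(1, 2)", "q1[0]", and a reference to an unlisted
variable all raise ExprError. A generated parser would replace the Parser
class; the whitelist and variable checks would stay the same.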

So, there seem to be three options:
(1) Find or build an equation parser that supports the functionality, and
can evaluate those equations in PHP (and possibly JavaScript too).
(2) Find or build an equation parser that can validate the syntax, but then
let PHP and JavaScript's eval() functions actually evaluate the string
(after replacing the variable and function names with lookup functions that
retrieve the needed values from the PHP and JavaScript data stores).
(3) Find or build an equation parser that can validate the syntax, but is a
pluggable module or compiled shared library (e.g. a Windows .dll, or a *nix
.so) which can be called from PHP, and be passed an array of allowable
variable names and function syntax (so that it doesn't have to tightly
couple with PHP).
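
As a rough illustration of the rewriting step in option (2), the Python
sketch below checks every token against an allowed set and replaces each
known variable with a call to a lookup function. The names rewrite_for_eval
and lookup are hypothetical, chosen only for this example. Note that this
checks tokens, not grammar: unbalanced parentheses would still pass, so a
real parser is needed before handing the string to eval().

```python
import re

# Same restricted token set as a validating parser would accept.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|([-+*/(),]))")


def rewrite_for_eval(text, variables, functions):
    """Return text with each known variable replaced by lookup("name").

    Raises ValueError on any character or name outside the allowed sets.
    """
    out, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"disallowed character at position {pos}")
        number, name, op = m.groups()
        pos = m.end()
        if name is None:
            # Numbers and operators pass through unchanged.
            out.append(number if number is not None else op)
        elif name in functions and text[pos:].lstrip().startswith("("):
            # Whitelisted function call: keep the name as-is.
            out.append(name)
        elif name in variables:
            # Known survey variable: route it through the lookup function.
            out.append(f'lookup("{name}")')
        else:
            raise ValueError(f"unknown name '{name}'")
    return " ".join(out)
```

For example, rewrite_for_eval("q1 + abs(q2)", {"q1", "q2"}, {"abs"})
returns the string lookup("q1") + abs ( lookup("q2") ), and any name outside
the two sets is rejected before eval() ever sees it.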

It seems that Antlr might work for option #1, provided that the PHP and
JavaScript parser and evaluator targets are already stable (and well
maintained) for the functionality I listed.
For option #2, Antlr might work as long as the parser target is robust for
PHP.
I don't know enough about #3 to know how well Antlr would fit that strategy.

Of course, Antlr may be overkill for the sort of task I'm listing here.

Do you think Antlr is an appropriate tool for this?  If so, which of those
strategies do you recommend?
If not, do you have any alternate recommendations?

Thanks in advance for any advice you can provide.

/Tom

