[antlr-interest] ANTLR PHP target / runtime status

Kenneth Domino kenneth.domino at domemtech.com
Thu Sep 8 14:16:19 PDT 2011


Hi All,

I updated the PHP target to work with Antlr 3.4/PHP 5.3.  This code is 
available at http://domemtech.com/code/antlrphpruntime.zip for the next 
month or so, until it—hopefully—finds a permanent location. I plan on making 
more changes when I start rewriting the runtime tests for the target and 
figuring out what in the world is going on with this target.

NOTE: Someone needs to take control of this mess, delete the many forked 
copies of this target, and put this in one official location.  The 
development of this target is absolutely atrocious.  This code is not in one 
repository, but at least four.  I really do not understand why people cannot 
make private repositories on their machines instead of proliferating 
multiple public repositories.  It is not easy to figure out who made which 
change, when, and why, or whether those changes are useful.  There may be more 
forked copies of the PHP runtime out in the wild, but who knows.

For better or worse, I chose code base #3 listed below for development, and 
made a copy of it on my machine.  I chose that code base because the author 
sent a cogent email explaining his changes, and because it was changed more 
recently than any of the other code bases.


WHERE IS ANTLR PHP LOCATED?

Here are the four different repositories:

(1) http://antlrphpruntime.googlecode.com 
(http://code.google.com/p/antlrphpruntime/ ) – SVN.

This code is officially anointed in the Antlr targets web page 
http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets as “the one 
and only PHP target”.  It isn’t clear what Antlr version or PHP version this 
code targets.

This code was last changed on June 19, 2010 (code) by Eugeny Yakimovitch. 
Several other unimportant changes were made more recently (e.g., June 26, 
2011).


(2) https://github.com/rollxx/antlr-php-runtime – GIT

This code was probably forked from (1), but since there are no embedded 
version ids in the source code, I can’t tell you what was done.

The code was last changed March 21, 2010 by rollex.  At the top of the page, 
the author says:
“This version in not maintained. Please visit the main project page listed 
below for the current version “, and gives a link to (1).

Unfortunately, it’s hard to say whether the changes were successfully merged 
back into (1), but there are check-ins by rollex to (1) in late March.


(3) https://github.com/beberlei/antlr-php-runtime – GIT.

Benjamin Eberlei noted in an email to the “Antlr Interest” and “Antlr dev” 
lists 
(http://markmail.org/message/zbdc2ni3mfjioens#query:+page:1+mid:zbdc2ni3mfjioens+state:results 
http://www.antlr.org/pipermail/antlr-interest/2010-September/039653.html 
http://markmail.org/message/v7wq2a6wvsjlwl4n  ) that development of the 
source code in (1) had been halted since Feb 2010. Eberlei modified this code 
to fix several bugs and improve the quality of the code, and checked it in.

This repository was forked from (2) (unclear when), and last modified in 
September 2010 by beberlei.


(4) http://code.google.com/p/phpandallthat/ – SVN.

Eugeny.Yakimovitch, who is on the list of developers for (1), has an unknown 
fork of (1) that is yet another implementation of the PHP runtime.

The latest change to that source code was on September 10, 2010. Great!


NOTE: As far as I know, there is no PHP target listed in the Fisheye view of 
the Antlr repository (linked via http://antlr.org).


DISCUSSIONS ON THE PHP TARGET:

* Aug 30, 2011 http://markmail.org/message/73fo5jg5a36qhv5p
* May 30, 2011 
http://www.antlr.org/pipermail/antlr-interest/2011-May/041725.html
* Sep 6/8, 2010 http://markmail.org/message/zbdc2ni3mfjioens 
http://markmail.org/message/v7wq2a6wvsjlwl4n
* May 6, 2010 http://www.antlr.org/pipermail/antlr-dev/2009-May/002292.html
* Oct 9, 2009 http://markmail.org/message/ewmppl7u4b3jnwgh


WHAT CHANGES DID I MAKE TO ANTLR PHP?

Most of my changes are in Php.stg, to move it forward to Antlr 3.4, and to 
handle lexers with semantic rules, like this grammar:


lexer grammar BigParLexer;


options {
   backtrack = true;
   filter = true;
}


@members{
   int open = 0;
}


P
@init{open = 1;}
   :
   '/*'
   (
      {open > 0}?=> // keep repeating `( ... )*` as long as open > 0
         ( // match anything other than the delimiters
           ( { !((input.LA(1) == '/' && input.LA(2) == '*') ||
                 (input.LA(1) == '*' && input.LA(2) == '/')) }?=> . )
         | '/*' {open++;}
         | '*/' {open--;}
      )
   )*
   ;

The lexer for this grammar accepts input like ‘/* hi /* there */ */’ as one 
token.  NB: this grammar doesn’t work exactly as written for the PHP target, 
as I explain below.
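
To make the intent of the predicates concrete, here is a minimal sketch in Python (not the generated lexer, and not part of the PHP runtime) of the same depth-counting idea. The function name match_nested_comment is made up for illustration; the variable depth plays the role of "open" in the grammar.

```python
def match_nested_comment(text, start=0):
    """Return the index just past a nested /* ... */ comment that begins at
    `start`, or None if no complete comment begins there."""
    if not text.startswith("/*", start):
        return None
    depth = 1              # same role as `open` in the grammar above
    i = start + 2
    while depth > 0:
        if i >= len(text):
            return None    # input ended while a comment was still open
        if text.startswith("/*", i):
            depth += 1     # nested opening delimiter
            i += 2
        elif text.startswith("*/", i):
            depth -= 1     # closing delimiter
            i += 2
        else:
            i += 1         # anything other than a delimiter
    return i


source = "/* hi /* there */ */ tail"
end = match_nested_comment(source)
print(repr(source[:end]))  # the whole nested comment, without " tail"
```

The loop stops as soon as the counter reaches zero, which is what the gated predicate {open > 0}?=> expresses in the grammar.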

* Rolled changes from Java.stg, Revision ID: 8204, into Php.stg. The link to 
the code for Java.stg used in the modification of Php.stg is: 
https://fisheye2.atlassian.com/browse/antlr/tool/src/main/resources/org/antlr/codegen/templates/Java/Java.stg
* Fixed problems with backtracking.
* Fixed missing $input declaration for semantic predicates.
* Fixed missing ‘$’ for ‘alt...’ state variables in DFA generated code.
* Added a makefile to construct antlr.jar.  I could not find any “build.xml” 
file anywhere.  And, I cannot stand Ant.


WHAT DOES NOT WORK?

Not all the tests in .../runtime/Php/test/Antlr/Tests work.  Many of these 
are terrible test cases: some cause the Antlr tool to output warnings, and 
others crash the tool altogether.

I don't know the status of AST construction, tree parsing, etc.  There is 
code for tree construction, but I haven't tested it.


WHAT DON'T I LIKE ABOUT THE PHP TARGET?

* PHP does not automatically convert a character into its integer code (or 
vice versa) in comparisons; variables must be preceded with “$”; and “?>” 
ends PHP code even in a comment.
Input streams in Antlr are composed of integers, not characters. 
“input->LA()” returns a number.  When you want to test the lookahead in a 
semantic predicate, you must convert the character you are testing into a 
number, or convert the result of LA() into a string.  So, in the above 
grammar BigParLexer, “input.LA(1) == '/'” won’t work, and PHP won’t 
complain!  It must be converted to a target-specific syntax, e.g., 
“\$input->LA(1) == 47” (47 is the character code of '/').

* In the wisdom of the developers of PHP, “?>” ends the PHP code section 
even if it appears in a one-line “//” comment.

e.g., “// you are screwed ?> boo hoo.”

Consequently, some of the templates in Php.stg are missing code to generate 
descriptions in comments.  If the grammar contains “?>”, as in some of the 
test cases, PHP will barf on the generated code.  There must be a way to 
convert the description into a PHP safe format, but I don’t know what that 
would be.

* THERE IS NO DOCUMENTATION!
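
The first two complaints above can be illustrated with a small Python sketch. It is only an illustration, not code from the PHP runtime or from Php.stg. CodePointStream, at_delimiter, and php_comment_safe are hypothetical names, the EOF value of -1 is an assumption, and replacing “?>” with “? >” is just one possible way to make a description safe for a one-line comment; I do not know of the target doing this.

```python
class CodePointStream:
    """Hypothetical stand-in for a character stream whose LA() returns integers."""

    EOF = -1  # assumed end-of-input marker

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def LA(self, i):
        """Code of the i-th lookahead character (1-based), or EOF."""
        index = self.pos + i - 1
        return ord(self.text[index]) if index < len(self.text) else self.EOF


SLASH = ord("/")  # 47
STAR = ord("*")   # 42


def at_delimiter(stream):
    """True if the next two characters are '/*' or '*/', compared as integers."""
    la1, la2 = stream.LA(1), stream.LA(2)
    return (la1 == SLASH and la2 == STAR) or (la1 == STAR and la2 == SLASH)


def php_comment_safe(description):
    """Break up '?>' so the text cannot close a PHP block from a // comment."""
    return description.replace("?>", "? >")


print(at_delimiter(CodePointStream("/* x")))   # True
print(php_comment_safe("rule : '?>' ;"))       # rule : '? >' ;
```

The point of at_delimiter is that both sides of each comparison are integers, which is what a predicate such as “\$input->LA(1) == 47” does by hand.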


WHAT DO I LIKE ABOUT THE PHP TARGET?

PHP does not have a “64K byte code per method limit” as in Java.  When 
writing a lexer grammar with semantic predicates, it seems extremely easy to 
generate Java code that will not compile (e.g., BigParLexer.g, but with 
longer delimiters such as “<script> .... </script>”).  But, PHP works!


Ken Domino


