[antlr-interest] Languages within HTML

Stuart Watt SWatt at infobal.com
Thu Jan 31 14:11:34 PST 2008


If PHP required XML, you're right, it should fail - and the PHP manual does
say:
 
Note: Also note that if you are embedding PHP within XML or XHTML you will
need to use the <?php ?> tags to remain compliant with standards.
(http://www.php.net/manual/en/language.basic-syntax.php)
 
I suppose this is less of an issue when PHP is generating plain text (I've
used it to generate email messages). However, the PHP / XML processing
instruction discussion is both extensive and tense.
 
The issue that surprised me is that PHP cannot be segmented by any simple
form of island grammar, specialised to the start/end tags. I am pretty
certain that ASP and its various clones and specialisations can be segmented
in this way. (The real driver for this is Windows scripting, which allows
multiple server-side languages in one page, although this is now deprecated,
I believe, and probably was never really a good idea!)
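
As a rough illustration of what "segmented by a simple island grammar" means here, the sketch below splits a page on the delimiters alone. It assumes, as suggested above for ASP, that the closing %> never occurs inside the embedded code; the function name segment_islands is made up for this example.

```python
import re

def segment_islands(text, open_tag="<%", close_tag="%>"):
    """Split text into ("text", ...) and ("code", ...) chunks using delimiters only."""
    pattern = re.compile(
        re.escape(open_tag) + r"(.*?)" + re.escape(close_tag), re.DOTALL
    )
    chunks = []
    pos = 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            # Everything before the island is host-language text (HTML).
            chunks.append(("text", text[pos:m.start()]))
        # The island body is handed off without being looked at.
        chunks.append(("code", m.group(1)))
        pos = m.end()
    if pos < len(text):
        chunks.append(("text", text[pos:]))
    return chunks
```

The segmenter never inspects the island body, which is exactly why it only works when the embedded language cannot contain the closing delimiter.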
 
It feels like there is a tradeoff here between *always doing the right
thing* (which may never be entirely possible, as the language between <% ...
%> can be almost anything, even including PHP) and doing a good job.
 
The impression I get from
http://www.php.net/manual/en/language.basic-syntax.php is that people are
discouraged from doing stuff like <?php echo("?>"); ?>, but still may do so.
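
To make the problem concrete, here is a small sketch of what a delimiter-only splitter does with that snippet. The helper naive_php_end is hypothetical and deliberately naive: it takes the first ?> after <?php as the end of the PHP block.

```python
def naive_php_end(text, start=0):
    """Return the index just past the first '?>' following the first '<?php'."""
    open_at = text.index("<?php", start)
    close_at = text.index("?>", open_at)
    return close_at + 2

source = '<?php echo("?>"); ?>'
end = naive_php_end(source)
print(repr(source[:end]))  # the chunk a delimiter-only splitter treats as PHP
print(repr(source[end:]))  # the remainder it would wrongly treat as HTML
```

The split lands inside the string literal, so the tail of the echo statement is misclassified as page text.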

 
Incidentally, another even less pleasant version is:
 
<?php
echo <<<EOT
<?xml version="1.0"?>
And this should be PHP
EOT;
?>
and now back to HTML
 
Frankly, I am amazed this works. It shows that to determine the end of a PHP
tag requires a full PHP lex (at least) from the start of the PHP tag,
wherever that happens to be in the text. In practice, PHP seems to do a
parse -- but I have been caught before by PHP's syntax error handling, which
tended to eagerly cause fatal and uncatchable parse errors. 
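
A minimal sketch of the kind of scanning this implies is below. It is not PHP's actual lexer: it only skips quoted strings and heredocs, ignores comments and other constructs, and the name find_php_end is invented for the example.

```python
import re

# "<<<" LABEL newline, as in the heredoc example above.
HEREDOC_OPEN = re.compile(r"<<<[ \t]*([A-Za-z_][A-Za-z0-9_]*)[ \t]*\r?\n")

def find_php_end(text, open_at):
    """Return the index just past the '?>' closing the '<?php' at open_at.

    Simplified: only quoted strings and heredocs are skipped. Returns
    len(text) if no closing tag is found (PHP lets the final '?>' be omitted).
    """
    i = open_at + len("<?php")
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in "'\"":
            # Skip a quoted string, stepping over backslash escapes.
            quote = ch
            i += 1
            while i < n and text[i] != quote:
                i += 2 if text[i] == "\\" else 1
            i += 1
        elif text.startswith("<<<", i):
            m = HEREDOC_OPEN.match(text, i)
            if m is None:
                i += 3
                continue
            label = m.group(1)
            # The heredoc ends at a line holding only the label (and maybe ';').
            closer = re.compile(
                r"^" + re.escape(label) + r";?[ \t]*\r?$", re.MULTILINE
            )
            end = closer.search(text, m.end())
            if end is None:
                return n
            # Resume right after the label so the ';' is scanned normally.
            i = end.start() + len(label)
        elif text.startswith("?>", i):
            return i + 2
        else:
            i += 1
    return n
```

Even this cut-down version has to carry lexical state from the opening tag onward, which is the point: the end tag cannot be located by pattern matching on the delimiters alone.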
 
All the best
Stuart

-----Original Message-----
From: Darien Hager [mailto:darien.hager at etelos-inc.com]
Sent: Thursday, January 31, 2008 4:41 PM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Languages within HTML


On 1/31/08, Stuart Watt <SWatt at infobal.com> wrote:

An intriguing problem. I did not expect this to work in PHP, and if the PHP was
intended to be processable as XML it would be invalid, as the markup tags
would cease to be processing instructions. PHP authors are usually
encouraged to do <? echo("?".">"); ?> or similar. I

This processing model implies that PHP may need to be the "root" grammar,
with the HTML elements handed off to other grammars if and when needed.



It's a good question: Is the PHP parsing engine too lenient, and should it
normatively fail in that example to comply with XML processing instruction
rules? I don't think you can put CDATA inside PI blocks...

I'm not sure if the PHP language has "embeddable in accordance with XML" in
its specs or whether it's just a happenstance of similar naming from
something in SGML.

-- 
Darien Hager
Developer
Etelos, Inc.
darien at etelos.com

http://www.etelos.com
"Revolutionizing the way applications are developed, distributed and
consumed."

