The attached patch implements a special "token" <?php HALT; ?> that can be
used to stop the Zend lexical parser from parsing any data after this
token.
The idea behind this patch is to allow tacking on any data (binary
and otherwise) to the PHP script without having to encode it. It also
saves the memory/CPU that would normally be needed to parse the data.
Memory is a particularly important point, since many systems run with
the default memory limit (8 megs) and would easily hit it when a script had
2-4 megs of data tacked on.
The patch would be extremely helpful for application distribution, where
the entire install/upgrade script can be just a single script.
The syntax chosen for the patch was designed specifically so as
not to cause parser errors in older PHPs without support for this
feature.
Ilia
We've been talking about this on IRC for a while now, and I think it's a
nice patch to minimize memory usage when you want to create these sorts
of "bundled" PHP/data hybrid scripts.
John
Index: Zend/zend_language_scanner.l
RCS file: /repository/ZendEngine2/zend_language_scanner.l,v
retrieving revision 1.124
diff -u -a -p -r1.124 zend_language_scanner.l
--- Zend/zend_language_scanner.l	7 Mar 2005 16:48:49 -0000	1.124
+++ Zend/zend_language_scanner.l	11 Mar 2005 19:30:47 -0000
@@ -1342,6 +1342,10 @@ NEWLINE ("\r"|"\n"|"\r\n")
 	return T_INLINE_HTML;
 }
 
+<INITIAL>"<?php"{WHITESPACE}"HALT;"{WHITESPACE}*"?>" {
+	yyterminate();
+}
+
 <INITIAL>"<?"|"<script"{WHITESPACE}+"language"{WHITESPACE}*"="{WHITESPACE}*("php"|"\"php\""|"'php'"){WHITESPACE}*">" {
 	HANDLE_NEWLINES(yytext, yyleng);
 	if (CG(short_tags) || yyleng>2) { /* yyleng>2 means it's not <? but <script> */
The attached patch implements a special "token" <?php HALT; ?> that can be
used to stop the Zend lexical parser from parsing any data after this
token.
+1
Hi Ilia:
The attached patch implements a special "token" <?php HALT; ?> that can be
used to stop the Zend lexical parser from parsing any data after this
token.
Would the data after that point be sent directly to STDOUT? Or would the
present script somehow be able to use it?
Thanks,
--Dan
--
T H E   A N A L Y S I S   A N D   S O L U T I O N S   C O M P A N Y
data intensive web and database programming
http://www.AnalysisAndSolutions.com/
4015 7th Ave #4, Brooklyn NY 11232 v: 718-854-0335 f: 718-854-0409
Would the data after that point be sent directly to STDOUT? Or would the
present script somehow be able to use it?
The data will not be parsed or output. When you need it you would make
the script open itself and read the (binary) data dump from the end of
the file and use it in various creative ways.
Ilia
Hi Ilia:
The data will not be parsed or output. When you need it you would make
the script open itself and read the (binary) data dump from the end of
the file and use it in various creative ways.
Interesting. I'm wondering about the security implications of this.
This makes it very easy to use PHP as a means to propagate all sorts of
nasty things. Well, people could even do that today in one script by
setting a variable to a base64 encoded string then decoding it. Nonetheless,
putting binary data in PHP scripts gives me pause.
--Dan
Daniel Convissor wrote:
Interesting. I'm wondering about the security implications of this.
This makes it very easy to use PHP as a means to propagate all sorts of
nasty things.
You can already use PHP to propagate all sorts of nasty things, nothing
changes in this respect.
Well, people could even do that today in one script by
setting a variable to a base64 encoded string then decoding it.
Sure, BUT this approach makes the final file approximately 30% larger,
and any time it is executed this "data dump", as I like to refer to it,
will be parsed and stored in memory, which makes this approach highly
inefficient.
Ilia
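To put a number on the size overhead mentioned above: base64 emits four
output characters for every three input bytes, so the encoded copy comes out
about a third larger than the raw data, in the same ballpark as the
approximate figure quoted. A minimal sketch, assuming a 3 MB payload (the
size and the variable names are made up for illustration):

```php
<?php
// Hypothetical payload: 3 MB of random bytes standing in for the data dump.
$payload = random_bytes(3 * 1024 * 1024);
$encoded = base64_encode($payload);

// Relative growth of the encoded copy over the raw bytes.
$overhead = strlen($encoded) / strlen($payload) - 1;

printf(
    "raw: %d bytes, base64: %d bytes, overhead: %.1f%%\n",
    strlen($payload),
    strlen($encoded),
    $overhead * 100
);
```

On top of the larger file, the encoded string is held in memory as a literal
every time the script is compiled, which is the cost the halt token is meant
to avoid.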
Sure, BUT this approach makes the final file approximately 30% larger
... snip...
No doubt. My second point was more to diminish my initial assertion, not
to diminish the validity of your patch.
Thanks,
--Dan
Interesting. I'm wondering about the security implications of this.
This makes it very easy to use PHP as a means to propagate all sorts of
nasty things. Well, people could even do that today in one script by
setting a variable to a base64 encoded string then decoding it. Nonetheless,
putting binary data in PHP scripts gives me pause.
There is no issue here. You can throw binary data at the end of a PHP
script as it is now:
<?php
/* stuff */
exit;
?>
<binary data here>
-- the halt token only makes it so PHP doesn't waste time processing
something that doesn't need to be processed.
John
There is no issue here. You can throw binary data at the end of a PHP
script as it is now:
<?php
/* stuff */
exit;
?>
<binary data here>
True enough.
Thanks,
--Dan
If you name the token HALT_PHP_PARSER instead (or something
equally unlikely to be used by accident and unambiguous in meaning),
you'll get a +1 from me :-)
--Wez.
Wez Furlong wrote:
If you name the token HALT_PHP_PARSER instead (or something
equally unlikely to be used by accident and unambiguous in meaning),
you'll get a +1 from me :-)
I aim to please ;-) Here is the revised patch that makes the token a bit
clearer and also increases the strictness of the parser. For those of
you wondering how you would quickly locate the end of the script and the
start of the data, here are some solutions:
- while (fgets($fp) != '<?php HALT_PHP_PARSER; ?>');
- Assuming the author knows the maximum length of the actual code, they
could read X bytes, use strpos to locate the position of the HALT
token, and start reading the data dump from there:
$fp = fopen(__FILE__, "r");
$halt_token = "<?php "."HALT_PHP_PARSER"."; ?>";
$pos = strpos(fread($fp, 10000), $halt_token);
fseek($fp, $pos + strlen($halt_token));
Ilia
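The second approach above can be wrapped in a small helper that also covers
the case where the token is missing from the bytes read. This is only a
sketch: find_data_offset() and the temporary demo file are hypothetical names
introduced here, and a real self-reading script would pass __FILE__ instead
of a temp file.

```php
<?php
/**
 * Return the offset of the first byte after the halt token, or null if the
 * token does not occur within the first $maxCodeBytes bytes of $path.
 */
function find_data_offset(string $path, string $token, int $maxCodeBytes = 10000): ?int
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return null;
    }
    // Read only the code portion, never the (possibly large) data dump.
    $head = fread($fp, $maxCodeBytes);
    fclose($fp);
    if ($head === false) {
        return null;
    }
    $pos = strpos($head, $token);
    return $pos === false ? null : $pos + strlen($token);
}

// Built from pieces so this source file never contains the literal token.
$token = '<?php ' . 'HALT_PHP_PARSER' . '; ?>';

// Demo: a stand-in "script" with two bytes of binary plus text tacked on.
$demo = tempnam(sys_get_temp_dir(), 'halt');
file_put_contents($demo, "<?php echo 'installer'; ?>\n" . $token . "\x00\x01binary");

$offset = find_data_offset($demo, $token);
$data = $offset === null ? null : file_get_contents($demo, false, null, $offset);
unlink($demo);
```

Returning null instead of seeking blindly avoids the situation where strpos()
yields false and the script silently seeks to the token length.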
- while (fgets($fp) != '<?php HALT_PHP_PARSER; ?>');
Just a nudge about this code approach.
#1 It doesn't detect EOF
#2 It forgets about the newline returned by fgets()
#3 It doesn't allow the HALT; to float within a line between other content
(not that you'd do that anyway, but...)
while (($tmp = fgets($fp)) && (strpos($tmp, '<?php HALT_PHP_PARSER; ?>')
=== false));
Sara Golemon wrote:
Just a nudge about this code approach.
#1 It doesn't detect EOF
Well, if you encounter EOF before the HALT tag, it means as a developer
of the script you are trying to break your own code ;-)
#2 It forgets about the newline returned by fgets()
True, I suppose strncmp could be used here, since newline is not always
constant.
#3 It doesn't allow the HALT; to float within a line between other content
(not that you'd do that anyway but...
Going back to #1 ;-).
Ilia
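Putting the three points above together, a line-based scan could look roughly
like this. skip_past_token() is a hypothetical helper, not part of the patch:
it stops at EOF, does not depend on the line ending, and finds the token
anywhere within a line. It assumes a seekable stream.

```php
<?php
/**
 * Advance $fp to the first byte after $token.
 * Returns true if the token was found, false if EOF was reached first.
 */
function skip_past_token($fp, string $token): bool
{
    while (($line = fgets($fp)) !== false) {        // #1: stops at EOF
        $pos = strpos($line, $token);               // #2/#3: match anywhere in the line
        if ($pos !== false) {
            // fgets() consumed the whole line; step back over the part
            // that follows the token so the data starts right after it.
            $unread = strlen($line) - ($pos + strlen($token));
            fseek($fp, -$unread, SEEK_CUR);
            return true;
        }
    }
    return false;
}

// Built from pieces so this source file never contains the literal token.
$token = '<?php ' . 'HALT_PHP_PARSER' . '; ?>';

// Demo on an in-memory stream standing in for the script file.
$fp = fopen('php://memory', 'w+b');
fwrite($fp, "<?php echo 'code'; ?>\n" . $token . "abc\ndef");
rewind($fp);

$found = skip_past_token($fp, $token);
$rest = stream_get_contents($fp);
fclose($fp);
```

Unlike the fread()/strpos() variant, this needs no upper bound on the length
of the code, at the cost of reading it line by line.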
Hi,
I modified your patch so that it captures the position where the
data is supposed to begin in the constant HALT_PHP_PARSER.
There may be a problem with my patch if more than one require()'d /
include()'d script contains HALT_PHP_PARSER, but it'd be
quite handy if that issue were resolved.
<?php
$fp = fopen(__FILE__, 'rb');
fseek($fp, HALT_PHP_PARSER, SEEK_SET);
fpassthru($fp);
?>
<?php HALT_PHP_PARSER; ?>
abc
def
Moriyoshi
Moriyoshi Koizumi wrote:
I modified your patch so it can capture the position where the
supposed data begins into the constant HALT_PHP_PARSER.
Sounds like a good idea to me, as all the manual work to guess the start
of the data looks a bit kludgy to me.
<?php
$fp = fopen(__FILE__, 'rb');
fseek($fp, HALT_PHP_PARSER, SEEK_SET);
fpassthru($fp);
?>
<?php HALT_PHP_PARSER; ?>
abc
def
Hmm.. I was wondering if we should go one step further and also provide
a stream to the data. If the most common usage of this construct is
going to be reading the data after HALT_PHP_PARSER then we should
optimize for that case. Something like
<?php
$data = gzdeflate(file_get_contents("php://data"));
?>
<?php DATA; ?>
...
- Chris
I think the lexer patch is good enough.
John
Hmm.. I was wondering if we should go one step further and also provide a
stream to the data. If the most common usage of this construct is going to
be reading the data after HALT_PHP_PARSER then we should optimize for
that case. Something like
<?php
$data = gzdeflate(file_get_contents("php://data"));
?>
<?php DATA; ?>
...
I think at that point this progresses from "unobtrusive lexer enhancement"
into the realm of "magic voodoo fluffiness".
Hmm.. I was wondering if we should go one step further and also provide
a stream to the data. If the most common usage of this construct is
going to be reading the data after HALT_PHP_PARSER then we should
optimize for that case. Something like
<?php
$data = gzdeflate(file_get_contents("php://data"));
?>
<?php DATA; ?>
...
I would suggest keeping such an implementation in user-land code rather
than in the core. It would be fairly trivial to implement a streams wrapper
to do this, and it can internally handle various decoding schemes like
decompression etc...
Ilia
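For illustration, a user-land wrapper along these lines might look like the
sketch below. Everything named here is an assumption made for the example:
the "haltdata" scheme, the HaltDataStream class, and the convention that the
part after "haltdata://" is the path of the script file. It is read-only,
does no decoding, and locates the token with a line scan.

```php
<?php
final class HaltDataStream
{
    // Built from pieces so this source file never contains the literal token.
    public const TOKEN = '<?php ' . 'HALT_PHP_PARSER' . '; ?>';

    /** @var resource|null Populated by PHP when a stream context is passed. */
    public $context;

    /** @var resource|null Handle on the script file, positioned in the data. */
    private $fp = null;

    public function stream_open(string $path, string $mode, int $options, ?string &$openedPath): bool
    {
        // "haltdata://<file>" maps to <file>.
        $fp = @fopen(substr($path, strlen('haltdata://')), 'rb');
        if ($fp === false) {
            return false;
        }
        // Scan line by line so the payload is never loaded just to find the token.
        while (($line = fgets($fp)) !== false) {
            $pos = strpos($line, self::TOKEN);
            if ($pos !== false) {
                // Step back to the first byte after the token.
                fseek($fp, $pos + strlen(self::TOKEN) - strlen($line), SEEK_CUR);
                $this->fp = $fp;
                return true;
            }
        }
        fclose($fp);
        return false; // no token, so no data section
    }

    public function stream_read(int $count)
    {
        return fread($this->fp, $count);
    }

    public function stream_eof(): bool
    {
        return feof($this->fp);
    }

    public function stream_stat()
    {
        return []; // size unknown; callers simply read until EOF
    }

    public function stream_close(): void
    {
        fclose($this->fp);
    }
}

stream_wrapper_register('haltdata', HaltDataStream::class);

// Demo: a stand-in "script" with data after the token.
$script = tempnam(sys_get_temp_dir(), 'halt');
file_put_contents($script, "<?php echo 'code'; ?>\n" . HaltDataStream::TOKEN . "abc\ndef");
$data = file_get_contents('haltdata://' . $script);
unlink($script);
```

Decompression or other decoding could be layered inside stream_read() or
applied by the caller, which keeps all of that logic out of the lexer.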
Moriyoshi Koizumi wrote:
Hi,
I modified your patch so it can capture the position where the
supposed data begins into the constant HALT_PHP_PARSER.
There may be a problem with my patch if more than one require()'d /
include()'d script contain HALT_PHP_PARSER, but it'd be
quite handy if such an issue is resolved.
+1 seems like a good idea.
Ilia