Hi all,
I'm trying to optimize php_html_entities().
Since htmlspecialchars()
/htmlentities() are sensitive function, I would
like to ask comments before merge.
https://github.com/php/php-src/pull/1356
Current php_html_entities() convert int/float/etc to string, then convert
it.
int/float/etc is not required to be escaped. Optimize it by simply
converting them to string.
Simple benchmark shows about 60% execution time reduction.
[yohgaki@dev github-php-src]$ ./php-bin b.php
Time: 10.484607934952
[yohgaki@dev github-php-src]$ ./php-bin b.php
Time: 10.867615222931
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 3.9379420280457
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 4.0694711208344
[yohgaki@dev github-php-src]$ cat b.php
<?php
const LOOP=100000000;
$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
}
$loop_time = microtime(true) - $start;
$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
htmlspecialchars(123456790);
}
echo 'Time: '.(microtime(true) - $start - $loop_time)."\n";
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hey:
Hi all,
I'm trying to optimize php_html_entities().
Sincehtmlspecialchars()
/htmlentities() are sensitive function, I would
like to ask comments before merge.https://github.com/php/php-src/pull/1356
Current php_html_entities() convert int/float/etc to string, then convert
it.
int/float/etc is not required to be escaped. Optimize it by simply
converting them to string.Simple benchmark shows about 60% execution time reduction.
[yohgaki@dev github-php-src]$ ./php-bin b.php
Time: 10.484607934952
[yohgaki@dev github-php-src]$ ./php-bin b.php
Time: 10.867615222931
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 3.9379420280457
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 4.0694711208344
[yohgaki@dev github-php-src]$ cat b.php
<?php
const LOOP=100000000;$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
}
$loop_time = microtime(true) - $start;$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
htmlspecialchars(123456790);
}
echo 'Time: '.(microtime(true) - $start - $loop_time)."\n";
But passing an non-string to htmlspecialchars are not common used cases..
"optimize" not common used cases... will bring nothing to us..
thanks
--
Yasuo Ohgaki
yohgaki@ohgaki.net
--
Xinchen Hui
@Laruence
http://www.laruence.com/
Hey:
Hi all,
I'm trying to optimize php_html_entities().
Sincehtmlspecialchars()
/htmlentities() are sensitive function, I would
like to ask comments before merge.https://github.com/php/php-src/pull/1356
Current php_html_entities() convert int/float/etc to string, then convert
it.
int/float/etc is not required to be escaped. Optimize it by simply
converting them to string.Simple benchmark shows about 60% execution time reduction.
[yohgaki@dev github-php-src]$ ./php-bin b.php
Time: 10.484607934952
[yohgaki@dev github-php-src]$ ./php-bin b.php
Time: 10.867615222931
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 3.9379420280457
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 4.0694711208344
[yohgaki@dev github-php-src]$ cat b.php
<?php
const LOOP=100000000;$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
}
$loop_time = microtime(true) - $start;$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
htmlspecialchars(123456790);
}
echo 'Time: '.(microtime(true) - $start - $loop_time)."\n";
But passing an non-string to htmlspecialchars are not common used cases.."optimize" not common used cases... will bring nothing to us..
In addition, this breaks the contract, specifically when using scalar
types. Because you're no longer going to error when the contract is
broken (considering htmlspecialchars is documented as string:string).
Anthony
Hi Anthony,
On Wed, Jun 24, 2015 at 12:21 AM, Anthony Ferrara ircmaxell@gmail.com
wrote:
In addition, this breaks the contract, specifically when using scalar
types. Because you're no longer going to error when the contract is
broken (considering htmlspecialchars is documented as string:string).
What do you mean by "break the contract".
"string" parameter is not a requirement/contract.
htmlspecialchars/htmlentities
just converts param to string. The patch does not change anything as you
can
see it from the phpt results.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Anthony,
I got it.
On Wed, Jun 24, 2015 at 12:21 AM, Anthony Ferrara ircmaxell@gmail.com
wrote:In addition, this breaks the contract, specifically when using scalar
types. Because you're no longer going to error when the contract is
broken (considering htmlspecialchars is documented as string:string).What do you mean by "break the contract".
"string" parameter is not a requirement/contract.
htmlspecialchars/htmlentities
just converts param to string. The patch does not change anything as you
can
see it from the phpt results.
[yohgaki@dev github-php-src]$ cat ../t.php
<?php
declare(strict_types=1);
var_dump( htmlspecialchars(123) );
[yohgaki@dev github-php-src]$ ./php-bin ../t.php
Fatal error: Uncaught TypeError: htmlspecialchars()
expects parameter 1 to
be string, integer given in /home/yohgaki/workspace/ext/git/oss/
php.net/t.php:4
Stack trace:
#0 /home/yohgaki/workspace/ext/git/oss/php.net/t.php(4):
htmlspecialchars(123)
#1 {main}
thrown in /home/yohgaki/workspace/ext/git/oss/php.net/t.php on line 4
I think this is massive breakage. It only happens in strict mode, though.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi all,
I got it.
On Wed, Jun 24, 2015 at 12:21 AM, Anthony Ferrara ircmaxell@gmail.com
wrote:In addition, this breaks the contract, specifically when using scalar
types. Because you're no longer going to error when the contract is
broken (considering htmlspecialchars is documented as string:string).What do you mean by "break the contract".
"string" parameter is not a requirement/contract.
htmlspecialchars/htmlentities
just converts param to string. The patch does not change anything as you
can
see it from the phpt results.[yohgaki@dev github-php-src]$ cat ../t.php
<?php
declare(strict_types=1);var_dump( htmlspecialchars(123) );
[yohgaki@dev github-php-src]$ ./php-bin ../t.php
Fatal error: Uncaught TypeError:
htmlspecialchars()
expects parameter 1 to
be string, integer given in /home/yohgaki/workspace/ext/git/oss/
php.net/t.php:4
Stack trace:
#0 /home/yohgaki/workspace/ext/git/oss/php.net/t.php(4):
htmlspecialchars(123)
#1 {main}
thrown in /home/yohgaki/workspace/ext/git/oss/php.net/t.php on line 4I think this is massive breakage. It only happens in strict mode, though.
IMHO, escape/unescape/encode/decode/conversion function is better to accept
any types.
HTML template may be separated script, but database code etc may not.
Writing code like
<?php
declare(strict_types=1);
$sql = 'SELECT * FROM '. pg_escape_identifier((string)$table). ' WHERE id
'. pg_escpae_literal((string)$id).';';
pg_query($sql);
?>
is better to be avoided. i.e. (string) cast before passing parameter.
Another example. JSON decode convert numeric to int/float
<?php
declare(strict_types=1);
$data = json_decode($json);
$str = mb_convert_kana((string) $data['some_data'], 'AKHV');
?>
Are we going to enforce users to use (string) casts for conversion
functions to switch
strict_types=1?
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Yasuo,
IMHO, escape/unescape/encode/decode/conversion function is better to accept
any types.
HTML template may be separated script, but database code etc may not.Writing code like
<?php
declare(strict_types=1);
$sql = 'SELECT * FROM '. pg_escape_identifier((string)$table). ' WHERE id '.
pg_escpae_literal((string)$id).';';
pg_query($sql);
?>is better to be avoided. i.e. (string) cast before passing parameter.
I agree 100%. Instead, the developer should get an error if the
parameter is not a string. Because it is an error. If you're passing
an array to pg_escape_identifier
, you have FAR WORSE problems.
Having the function accept anything and return anything (as you're
proposing) would eliminate any ability to detect this problem.
If people blind cast, that's their problem. We shouldn't make it
harder for people to detect problems by blindly accepting anything
under the sun.
Another example. JSON decode convert numeric to int/float
<?php
declare(strict_types=1);
$data = json_decode($json);
$str = mb_convert_kana((string) $data['some_data'], 'AKHV');
?>Are we going to enforce users to use (string) casts for conversion functions
to switch
strict_types=1?
No, the entire point is to have them actually validate the types.
Anthony
Hi Anthony,
On Wed, Jun 24, 2015 at 10:40 AM, Anthony Ferrara ircmaxell@gmail.com
wrote:
IMHO, escape/unescape/encode/decode/conversion function is better to
accept
any types.
HTML template may be separated script, but database code etc may not.Writing code like
<?php
declare(strict_types=1);
$sql = 'SELECT * FROM '. pg_escape_identifier((string)$table). ' WHERE
id '.
pg_escpae_literal((string)$id).';';
pg_query($sql);
?>is better to be avoided. i.e. (string) cast before passing parameter.
I agree 100%. Instead, the developer should get an error if the
parameter is not a string. Because it is an error. If you're passing
an array to pg_escape_identifier
, you have FAR WORSE problems.
Having the function accept anything and return anything (as you're
proposing) would eliminate any ability to detect this problem.
I agree 100%.
If people blind cast, that's their problem. We shouldn't make it
harder for people to detect problems by blindly accepting anything
under the sun.
strict_types=1 creates issue for int/float which is valid and accepted
without strict_types.
We will have mixed types due to type hint and it's problematic.
If escape functions accept string/int/float/object(only when
it has __toString), it's easier for users. Safety is guaranteed also.
Other than escape/conversion functions that expect "string" type
should get type errors.
Another example. JSON decode convert numeric to int/float
<?php
declare(strict_types=1);
$data = json_decode($json);
$str = mb_convert_kana((string) $data['some_data'], 'AKHV');
?>Are we going to enforce users to use (string) casts for conversion
functions
to switch
strict_types=1?No, the entire point is to have them actually validate the types.
I fully agree.
But people will do this, unless we make conversion functions accept
safe/valid scalars/objects... Or worse, people make assumption that
variables are safe to output w/o escape...
Since there weren't contracts before PHP7, I think we may adjust contract
for some functions before PHP7 release.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Yasuo,
Did you test the performance impact on strings? Since you changed how it
works the impact can be positive and maybe worth to make the method more
broad.
Juan Basso
Hi Anthony,
On Wed, Jun 24, 2015 at 10:40 AM, Anthony Ferrara ircmaxell@gmail.com
wrote:IMHO, escape/unescape/encode/decode/conversion function is better to
accept
any types.
HTML template may be separated script, but database code etc may not.Writing code like
<?php
declare(strict_types=1);
$sql = 'SELECT * FROM '. pg_escape_identifier((string)$table). ' WHERE
id '.
pg_escpae_literal((string)$id).';';
pg_query($sql);
?>is better to be avoided. i.e. (string) cast before passing parameter.
I agree 100%. Instead, the developer should get an error if the
parameter is not a string. Because it is an error. If you're passingan array to
pg_escape_identifier
, you have FAR WORSE problems.Having the function accept anything and return anything (as you're
proposing) would eliminate any ability to detect this problem.I agree 100%.
If people blind cast, that's their problem. We shouldn't make it
harder for people to detect problems by blindly accepting anything
under the sun.strict_types=1 creates issue for int/float which is valid and accepted
without strict_types.We will have mixed types due to type hint and it's problematic.
If escape functions accept string/int/float/object(only when
it has __toString), it's easier for users. Safety is guaranteed also.Other than escape/conversion functions that expect "string" type
should get type errors.Another example. JSON decode convert numeric to int/float
<?php
declare(strict_types=1);
$data = json_decode($json);
$str = mb_convert_kana((string) $data['some_data'], 'AKHV');
?>Are we going to enforce users to use (string) casts for conversion
functions
to switch
strict_types=1?No, the entire point is to have them actually validate the types.
I fully agree.
But people will do this, unless we make conversion functions accept
safe/valid scalars/objects... Or worse, people make assumption that
variables are safe to output w/o escape...Since there weren't contracts before PHP7, I think we may adjust contract
for some functions before PHP7 release.Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Juan,
Did you test the performance impact on strings? Since you changed how it
works the impact can be positive and maybe worth to make the method more
broad.
It's a bit slower (~3%) for 'abc' as input string. If I remove redundant
type
check(convert_to_string), it's better though. ~3% seems too large for 2
additional if statements - parameter type check. I'll look into.
[yohgaki@dev github-php-src]$ cat b.php
<?php
const LOOP=100000000;
$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
}
$loop_time = microtime(true) - $start;
$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
htmlspecialchars('abc');
}
echo 'Time: '.(microtime(true) - $start - $loop_time)."\n";
[yohgaki@dev github-php-src]$ ./php.old b.php
Time: 4.4925589561462
[yohgaki@dev github-php-src]$ ./php.old b.php
Time: 4.4821939468384
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 4.5892491340637
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 4.6097931861877
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi all,
Did you test the performance impact on strings? Since you changed how it
works the impact can be positive and maybe worth to make the method more
broad.It's a bit slower (~3%) for 'abc' as input string. If I remove redundant
type
check(convert_to_string), it's better though. ~3% seems too large for 2
additional if statements - parameter type check. I'll look into.[yohgaki@dev github-php-src]$ cat b.php
<?php
const LOOP=100000000;$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
}
$loop_time = microtime(true) - $start;$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
htmlspecialchars('abc');
}
echo 'Time: '.(microtime(true) - $start - $loop_time)."\n";[yohgaki@dev github-php-src]$ ./php.old b.php
Time: 4.4925589561462
[yohgaki@dev github-php-src]$ ./php.old b.php
Time: 4.4821939468384
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 4.5892491340637
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 4.6097931861877
Followings are better approximation. New code is a bit slower even if it's
in
benchmark error range. ~1% - ~3% slower for strings while it's much faster
(~70% or more. This case is more than 100%) for integers depending on
integer size. Since integer/float is much faster, only a few presence of
int/float
negates negative impact.
As you can see, loop only benchmark varies a lot.
I also counted total number of CPU instructions by perf. It varies too.
However, new code is a bit slower for sure even if there is only 2 "if"
addition
for strings. I was expecting use of raw zval negates this...
I would like to apply similar change to most string escape/conversion
functions.
(Please note that subject is "escape and conversion" function only)
Current patch allows "resource" type, but it should raise type error.
Comments are appreciated.
Sorry for flood of mails.
Regards,
[yohgaki@dev github-php-src]$ ./php.old b.php
Loop time
Loop: 0.08224892616272
Loop: 0.08589506149292
Loop: 0.087536096572876
Loop: 0.082170963287354
Loop: 0.082328081130981
Loop: 0.081686019897461
Loop: 0.087927103042603
Loop: 0.081233024597168
Loop: 0.082290887832642
Loop: 0.081492900848389
Loop: 0.081452131271362
Loop: 0.081291913986206
Loop Avg: 0.08220899105072
Escaping string abcdefgh
Time: 0.6288548707962
Time: 0.61239206790924
Time: 0.6169821023941
Time: 0.61680710315704
Time: 0.60974395275116
Time: 0.61229383945465
Time: 0.61705410480499
Time: 0.61985099315643
Time: 0.61954915523529
Time: 0.62286603450775
Time: 0.62001097202301
Time: 0.6216961145401
Time Avg: 0.6166380405426
Escaping integer 12345678
Time: 0.98103010654449
Time: 0.97324693202972
Time: 0.96535289287567
Time: 0.962033867836
Time: 0.95926511287689
Time: 0.9667671918869
Time: 0.96851003170013
Time: 0.96103513240814
Time: 0.96122300624847
Time: 0.95892608165741
Time: 0.96553599834442
Time: 0.96350586414337
Time Avg: 0.96321551799774
[yohgaki@dev github-php-src]$ ./php.new b.php
Loop time
Loop: 0.082480907440186
Loop: 0.081701993942261
Loop: 0.081080913543701
Loop: 0.080965042114258
Loop: 0.081230163574219
Loop: 0.08064603805542
Loop: 0.081595897674561
Loop: 0.081001043319702
Loop: 0.080890893936157
Loop: 0.084527969360352
Loop: 0.082308053970337
Loop: 0.081248998641968
Loop Avg: 0.081266903877258
Escaping string abcdefgh
Time: 0.62283012866974
Time: 0.62788126468658
Time: 0.63206312656403
Time: 0.62420103549957
Time: 0.62370917797089
Time: 0.62745521068573
Time: 0.626731133461
Time: 0.68132016658783
Time: 0.62430212497711
Time: 0.63127825260162
Time: 0.63222811222076
Time: 0.62956402301788
Time Avg: 0.62700154781342
Escaping integer 12345678
Time: 0.41526029109955
Time: 0.41403410434723
Time: 0.41262814998627
Time: 0.41230318546295
Time: 0.41555306911469
Time: 0.40698001384735
Time: 0.41639611721039
Time: 0.41320106983185
Time: 0.41369721889496
Time: 0.42157099246979
Time: 0.42368505001068
Time: 0.42184422016144
Time Avg: 0.4141624212265
[yohgaki@dev github-php-src]$ cat b.php
<?php
const LOOP=10000000;
const ITER=12;
echo "Loop time\n";
$t = [];
for ($n = 0; $n < ITER; $n++) {
$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
}
$loop_time = microtime(true) - $start;
echo 'Loop: '.$loop_time."\n";
$t[] = $loop_time;
}
// Remove max for better approximation
unset($t[array_search(max($t), $t)]);
unset($t[array_search(max($t), $t)]);
$loop_time = array_sum($t)/(ITER-2);
echo "Loop Avg: ${loop_time}\n";
$s = 'abcdefgh';
echo "Escaping string ${s}\n";
$t = [];
for ($n = 0; $n < ITER; $n++) {
$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
htmlspecialchars($s);
}
$time = (microtime(true) - $start - $loop_time);
echo "Time: ${time}\n";
$t[] = $time;
}
// Remove max for better approximation
unset($t[array_search(max($t), $t)]);
unset($t[array_search(max($t), $t)]);
$time = array_sum($t)/(ITER-2);
echo "Time Avg: ${time}\n";
$s = 12345678;
echo "Escaping integer ${s}\n";
$t = [];
for ($n = 0; $n < ITER; $n++) {
$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
htmlspecialchars($s);
}
$time = (microtime(true) - $start - $loop_time);
echo "Time: ${time}\n";
$t[] = $time;
}
// Remove max for better approximation
unset($t[array_search(max($t), $t)]);
unset($t[array_search(max($t), $t)]);
$time = array_sum($t)/(ITER-2);
echo "Time Avg: ${time}\n";
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hey:
Hi all,
Did you test the performance impact on strings? Since you changed how it
works the impact can be positive and maybe worth to make the method more
broad.It's a bit slower (~3%) for 'abc' as input string. If I remove redundant
type
check(convert_to_string), it's better though. ~3% seems too large for 2
additional if statements - parameter type check. I'll look into.[yohgaki@dev github-php-src]$ cat b.php
<?php
const LOOP=100000000;$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
}
$loop_time = microtime(true) - $start;$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
htmlspecialchars('abc');
}
echo 'Time: '.(microtime(true) - $start - $loop_time)."\n";[yohgaki@dev github-php-src]$ ./php.old b.php
Time: 4.4925589561462
[yohgaki@dev github-php-src]$ ./php.old b.php
Time: 4.4821939468384
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 4.5892491340637
[yohgaki@dev github-php-src]$ ./php.new b.php
Time: 4.6097931861877Followings are better approximation. New code is a bit slower even if it's
in
benchmark error range. ~1% - ~3% slower for strings while it's much faster
(~70% or more. This case is more than 100%) for integers depending on
integer size. Since integer/float is much faster, only a few presence of
int/float
negates negative impact.As you can see, loop only benchmark varies a lot.
I also counted total number of CPU instructions by perf. It varies too.
However, new code is a bit slower for sure even if there is only 2 "if"
addition
for strings. I was expecting use of raw zval negates this...I would like to apply similar change to most string escape/conversion
functions.
Sorry for saying this, but I don't see any reason to do so.
the benchmark make nothing to me, since it's actually not a
recommend(right) usage at all. as I said, the fastest way to
htmlspecicalchars an integer is not calling htmlspecicalchars against
number.
and I can always get some "benchmark" to show my "optimization" works,
like if (EXPECT()); I can always make a "benchmark" which make the
condition fails, and get some "improvement"..
if you are insist to do so, please write an RFC, I will vote NO for it.
sorry & thanks
(Please note that subject is "escape and conversion" function only)
Current patch allows "resource" type, but it should raise type error.Comments are appreciated.
Sorry for flood of mails.Regards,
[yohgaki@dev github-php-src]$ ./php.old b.php
Loop time
Loop: 0.08224892616272
Loop: 0.08589506149292
Loop: 0.087536096572876
Loop: 0.082170963287354
Loop: 0.082328081130981
Loop: 0.081686019897461
Loop: 0.087927103042603
Loop: 0.081233024597168
Loop: 0.082290887832642
Loop: 0.081492900848389
Loop: 0.081452131271362
Loop: 0.081291913986206
Loop Avg: 0.08220899105072
Escaping string abcdefgh
Time: 0.6288548707962
Time: 0.61239206790924
Time: 0.6169821023941
Time: 0.61680710315704
Time: 0.60974395275116
Time: 0.61229383945465
Time: 0.61705410480499
Time: 0.61985099315643
Time: 0.61954915523529
Time: 0.62286603450775
Time: 0.62001097202301
Time: 0.6216961145401
Time Avg: 0.6166380405426
Escaping integer 12345678
Time: 0.98103010654449
Time: 0.97324693202972
Time: 0.96535289287567
Time: 0.962033867836
Time: 0.95926511287689
Time: 0.9667671918869
Time: 0.96851003170013
Time: 0.96103513240814
Time: 0.96122300624847
Time: 0.95892608165741
Time: 0.96553599834442
Time: 0.96350586414337
Time Avg: 0.96321551799774[yohgaki@dev github-php-src]$ ./php.new b.php
Loop time
Loop: 0.082480907440186
Loop: 0.081701993942261
Loop: 0.081080913543701
Loop: 0.080965042114258
Loop: 0.081230163574219
Loop: 0.08064603805542
Loop: 0.081595897674561
Loop: 0.081001043319702
Loop: 0.080890893936157
Loop: 0.084527969360352
Loop: 0.082308053970337
Loop: 0.081248998641968
Loop Avg: 0.081266903877258
Escaping string abcdefgh
Time: 0.62283012866974
Time: 0.62788126468658
Time: 0.63206312656403
Time: 0.62420103549957
Time: 0.62370917797089
Time: 0.62745521068573
Time: 0.626731133461
Time: 0.68132016658783
Time: 0.62430212497711
Time: 0.63127825260162
Time: 0.63222811222076
Time: 0.62956402301788
Time Avg: 0.62700154781342
Escaping integer 12345678
Time: 0.41526029109955
Time: 0.41403410434723
Time: 0.41262814998627
Time: 0.41230318546295
Time: 0.41555306911469
Time: 0.40698001384735
Time: 0.41639611721039
Time: 0.41320106983185
Time: 0.41369721889496
Time: 0.42157099246979
Time: 0.42368505001068
Time: 0.42184422016144
Time Avg: 0.4141624212265[yohgaki@dev github-php-src]$ cat b.php
<?php
const LOOP=10000000;
const ITER=12;echo "Loop time\n";
$t = [];
for ($n = 0; $n < ITER; $n++) {
$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
}
$loop_time = microtime(true) - $start;
echo 'Loop: '.$loop_time."\n";
$t[] = $loop_time;
}
// Remove max for better approximation
unset($t[array_search(max($t), $t)]);
unset($t[array_search(max($t), $t)]);
$loop_time = array_sum($t)/(ITER-2);
echo "Loop Avg: ${loop_time}\n";$s = 'abcdefgh';
echo "Escaping string ${s}\n";
$t = [];
for ($n = 0; $n < ITER; $n++) {
$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
htmlspecialchars($s);
}
$time = (microtime(true) - $start - $loop_time);
echo "Time: ${time}\n";
$t[] = $time;
}
// Remove max for better approximation
unset($t[array_search(max($t), $t)]);
unset($t[array_search(max($t), $t)]);
$time = array_sum($t)/(ITER-2);
echo "Time Avg: ${time}\n";$s = 12345678;
echo "Escaping integer ${s}\n";
$t = [];
for ($n = 0; $n < ITER; $n++) {
$start = microtime(true);
for ($i = 0; $i < LOOP; $i++) {
htmlspecialchars($s);
}
$time = (microtime(true) - $start - $loop_time);
echo "Time: ${time}\n";
$t[] = $time;
}
// Remove max for better approximation
unset($t[array_search(max($t), $t)]);
unset($t[array_search(max($t), $t)]);
$time = array_sum($t)/(ITER-2);
echo "Time Avg: ${time}\n";--
Yasuo Ohgaki
yohgaki@ohgaki.net
--
Xinchen Hui
@Laruence
http://www.laruence.com/
Hi Anthony,
On Wed, Jun 24, 2015 at 10:40 AM, Anthony Ferrara ircmaxell@gmail.com
wrote:IMHO, escape/unescape/encode/decode/conversion function is better to
accept
any types.
HTML template may be separated script, but database code etc may not.Writing code like
<?php
declare(strict_types=1);
$sql = 'SELECT * FROM '. pg_escape_identifier((string)$table). ' WHERE
id '.
pg_escpae_literal((string)$id).';';
pg_query($sql);
?>is better to be avoided. i.e. (string) cast before passing parameter.
I agree 100%. Instead, the developer should get an error if the
parameter is not a string. Because it is an error. If you're passingan array to
pg_escape_identifier
, you have FAR WORSE problems.Having the function accept anything and return anything (as you're
proposing) would eliminate any ability to detect this problem.I agree 100%.
If people blind cast, that's their problem. We shouldn't make it
harder for people to detect problems by blindly accepting anything
under the sun.strict_types=1 creates issue for int/float which is valid and accepted
without strict_types.We will have mixed types due to type hint and it's problematic.
If escape functions accept string/int/float/object(only when
it has __toString), it's easier for users. Safety is guaranteed also.Other than escape/conversion functions that expect "string" type
should get type errors.Another example. JSON decode convert numeric to int/float
<?php
declare(strict_types=1);
$data = json_decode($json);
$str = mb_convert_kana((string) $data['some_data'], 'AKHV');
?>Are we going to enforce users to use (string) casts for conversion
functions
to switch
strict_types=1?No, the entire point is to have them actually validate the types.
I fully agree.
But people will do this, unless we make conversion functions accept
safe/valid scalars/objects... Or worse, people make assumption that
variables are safe to output w/o escape...Since there weren't contracts before PHP7, I think we may adjust contract
for some functions before PHP7 release.
Another reason of this optimization is type affinity that I would like to
add.
https://wiki.php.net/rfc/introduce-type-affinity
With type affinity, "integer string"/"float string" are converted to
int/float type
and PHP can execute script more efficiently with type hint. i.e. Type
affinity
eliminates countless "string" -> "int/float" conversions with weak type
hint.
Issue with converted values is int/float cannot escape as it is now because
they expect "string" strictly with "strict_types=1", as well as it requires
needless type conversions to "string" in weak mode.
e.g. htmlspecialchars, pg_escape_string/identifier/literal, etc.
Therefore, I would like to allow int/float for these functions.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Xinchen,
But passing an non-string to htmlspecialchars are not common used cases..
"optimize" not common used cases... will bring nothing to us..
The reason why I brought up this now is scalar type hint.
Before PHP7, people didn't not care if data sent from browser is
actually a string. e.g. age, month, date, etc.
However, this optimization have more effects because of PHP7's type hint
that convert data type "always" and users must escape regardless of it's
type. Wrong date type assumption is common source of JavaScript injections.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Xinchen,
But passing an non-string to htmlspecialchars are not common used cases..
"optimize" not common used cases... will bring nothing to us..
The reason why I looked into this in the first place is one of my friend
complained about slow htmlspecialchars for relatively large table data.
He said more than half of execution time is took by htmlspecialchars
because
- it escapes numeric values as string
- it does not accept array value for to be escaped values
1st is covered by this patch.
2nd issue is covered by this FR.
https://bugs.php.net/bug.php?id=69908
So it brings something for us :)
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hey:
Hi Xinchen,
But passing an non-string to htmlspecialchars are not common used cases..
"optimize" not common used cases... will bring nothing to us..
The reason why I looked into this in the first place is one of my friend
complained about slow htmlspecialchars for relatively large table data.He said more than half of execution time is took by htmlspecialchars
because
- it escapes numeric values as string
- it does not accept array value for to be escaped values
1st is covered by this patch.
2nd issue is covered by this FR.
https://bugs.php.net/bug.php?id=69908So it brings something for us :)
Then what about strlen(number)? are you also going to change its ZPP
to accept zval instead of string?
thanks
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
--
Xinchen Hui
@Laruence
http://www.laruence.com/
Hi Xinchen,
Hi Xinchen,
But passing an non-string to htmlspecialchars are not common used
cases.."optimize" not common used cases... will bring nothing to us..
The reason why I looked into this in the first place is one of my friend
complained about slow htmlspecialchars for relatively large table data.He said more than half of execution time is took by htmlspecialchars
because
- it escapes numeric values as string
- it does not accept array value for to be escaped values
1st is covered by this patch.
2nd issue is covered by this FR.
https://bugs.php.net/bug.php?id=69908So it brings something for us :)
Then what about strlen(number)? are you also going to change its ZPP
to accept zval instead of string?
No. Only some kind of conversion functions are the subject.
e.g. mb_convert_encoding()
, mb_convert_kana()
, etc.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hey:
Hi Xinchen,
Hi Xinchen,
But passing an non-string to htmlspecialchars are not common used
cases.."optimize" not common used cases... will bring nothing to us..
The reason why I looked into this in the first place is one of my friend
complained about slow htmlspecialchars for relatively large table data.He said more than half of execution time is took by htmlspecialchars
because
- it escapes numeric values as string
- it does not accept array value for to be escaped values
1st is covered by this patch.
2nd issue is covered by this FR.
https://bugs.php.net/bug.php?id=69908So it brings something for us :)
Then what about strlen(number)? are you also going to change its ZPP
to accept zval instead of string?No. Only some kind of conversion functions are the subject.
e.g.mb_convert_encoding()
,mb_convert_kana()
, etc.
What I meant is that it's rarely usage(like strlen(number)). we should
not care about them too much and break arg info.
and for the "age" usage you replied in github, I think the author of
such codes should be aware, if it's only number, then instead of
htmlespcicalchars($age), he should use echo $age directly... which is
more faster.
so, I am -1 or this "optimization".
thanks
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
--
Xinchen Hui
@Laruence
http://www.laruence.com/
Hi Xinchen,
and for the "age" usage you replied in github, I think the author of
such codes should be aware, if it's only number, then instead of
htmlespcicalchars($age), he should use echo $age directly... which is
more faster.
To build secure apps, users MUST escape everything for the context by
default.
Selective escaping is the cause of injection vulnerability especially with
language like
PHP.
Principle is "Don't think, escape all (for the context)".
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi,
Hi Xinchen,
and for the "age" usage you replied in github, I think the author of
such codes should be aware, if it's only number, then instead of
htmlespcicalchars($age), he should use echo $age directly... which is
more faster.To build secure apps, users MUST escape everything for the context by
default.
Selective escaping is the cause of injection vulnerability especially with
language like
PHP.Principle is "Don't think, escape all (for the context)".
The key word here is "context" ... you know that there's nothing to
escape for an integer, because the type is your context.
Selective escaping isn't a problem by itself, but that many people use
a blacklist approach instead of a whitelist one; and you can only fix
that with education.
Cheers,
Andrey.
Hi Andrey,
Hi Xinchen,
and for the "age" usage you replied in github, I think the author of
such codes should be aware, if it's only number, then instead of
htmlespcicalchars($age), he should use echo $age directly... which is
more faster.To build secure apps, users MUST escape everything for the context by
default.
Selective escaping is the cause of injection vulnerability especially
with
language like
PHP.Principle is "Don't think, escape all (for the context)".
The key word here is "context" ... you know that there's nothing to
escape for an integer, because the type is your context.Selective escaping isn't a problem by itself, but that many people use
a blacklist approach instead of a whitelist one; and you can only fix
that with education.
Right and agree.
Selective escaping isn't a problem by itself, but people do mistake, make
wrong assumptions, use wrong blacklist approach. The same variable can
be generated by different code path/source. It could be very hard to assure
a variable is really a int/float without validation. If one would like to
make
sure what a variable is and skip escaping, they need something like
It's much easier with unconditional escape everywhere like
<td><?php echo htmlspecialchars($var) ?><td>if htmlspecialchars()
is fast enough for int/float. (I'm not sure which one
is faster)
One example is SQLite that making sure variable type could be difficult.
SQLite can store string regardless of type definition. If developers add
SQLite support, in addition to MySQL/PostgreSQL, they may create
attack vector if they don't escape unconditionally.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi!
Current php_html_entities() convert int/float/etc to string, then convert
it.
int/float/etc is not required to be escaped. Optimize it by simply
converting them to string.
I'm not sure this is the case that is used frequently enough to actually
optimize for it in PHP code. If for particular app it is in the hot
path, wouldn't it be easier to just add a if(is_numeric()) or is_int()
check?
--
Stas Malyshev
smalyshev@gmail.com