All,
We've been working in the last few days to test and tune the Coercive STH
patch. I think the results are quite nice, and surprisingly better than
one might have expected.
Before diving into the results, we did update the RFC
(wiki.php.net/rfc/coercive_sth) - with the most notable difference being
allowing NULL->scalar conversions, for two reasons - it's not uncommon for
code to use 'null' as a way to denote an empty optional parameter to for
internal functions, and many internal functions seem to rely on that
behavior themselves. It should be possible to alter this behavior in the
future by changing the internal functions to explicitly handle NULL
inputs
as 'please use the default value' - but it's outside the scope of the RFC.
In addition, coercion from scalar to boolean is now limited only to
integer - which seems to be the most popular use case; Beforehand,
coercion was also allowed from float and string - but based on feedback
from Mike, we're reconsidering accepting strings again.
Another change being considered and not yet in the RFC is re-allowing
leading and trailing spaces for numeric strings (sorry Paddy.)
Now to the tests we ran. The goal was to see what kind of effect the
changes to the internal function APIs had on real world apps with large
codebase. The test methodology was done by placing a debugger breakpoint
on zend_error, to ensure no error gets lost in the ether of settings or
callbacks. Here are the results:
Drupal homepage: One new E_DEPRECATED
warning, which seems to catch a
real bug, or at least faulty looking code:
$path = trim($path, '/'); // raises E_DEPRECATED, as $path is boolean
false.
return $path;
Drupal admin interface (across the all pages): One new E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes() operating
on a boolean.
Magento homepage (w/ Magento's huge sample DB): One new E_DEPRECATED
warning, again, seems to be catching a real bug of 'false' being fed as
argument 1 of in json_decode()
- which is expecting a string full of json
there.
WordPress homepage: One new E_DEPRECATED
warning, again, seems to be
catching a real bug of 'false' being fed as argument 1 of substr()
.
Zend Framework 2 skeleton app: Zero new E_DEPRECATED
warnings.
Symfony ACME app: Zero new E_DEPRECATED
warnings (across the app).
As I'm sure you know, the above tests invoke a huge number of lines of
code in total, handling filesystem ops, database ops and all sorts of
other things. This is much of the mini test suite that we use to test
PHP's performance and compatibility (e.g. phpng, now PHP 7). So while
this isn't proof that code in the wild isn't going to have more issues -
it's a pretty good initial indication regarding the level of 'breakage' we
can expect. I'm putting breakage in quotes, as people would have several
years to fix these few issues, before the E_DEPRECATED
becomes an error
(or an exception, if the engine exceptions RFC passes).
In terms of the test suite (.phpts), the changes cause approximately 700
extra tests to fail out of 13,700, in comparison to w/o the patch.
However, even though I didn't have a chance to go over all of them, it
seems that the vast majority of the failed tests are tests that were
intentionally designed to cover the exact parameter passing behavior,
rather than real likely real world code pieces. A huge number of the
internal functions have this in their test suites:
$variation = array(
'float 10.5' => 10.5,
'float -10.5' => -10.5,
'float 12.3456789000e10' => 12.3456789000e10,
'float -12.3456789000e10' => -12.3456789000e10,
'float .5' => .5,
);
foreach ( $variation as $var ) {
var_dump(readgzfile( $filename, $var ) );
}
Which clearly isn't a very likely input to the size argument of
readgzfile()
. All of these now trigger E_DEPRECATED
warnings since we no
longer allow float->int conversions if there's no data loss. For some
reason (perhaps a good one, not sure), we have virtually identical pieces
of code for dozens if not hundreds of internal functions that expect
integers - and these types of failures account for the (looks like vast)
majority of phpt failures. Quoting Andrea from a couple of days ago on
this very topic, "to be fair, most of PHP's test suite basically just
tests zpp's behavior", which appears to be absolutely true. The
real-world app tests are probably a much better indicator of the kind of
behaviors our users are going to see than our test suite is.
The takeaway from this seems to be that the approach of tightening the
existing rule-set and applying it to both internal functions and new
user-land code is viable, and does not cause the sky to come down falling.
Preliminary tests suggest it actually finds real potential problems.
More tomorrow.
Zeev
All,
We've been working in the last few days to test and tune the Coercive STH
patch. I think the results are quite nice, and surprisingly better than
one might have expected.Before diving into the results, we did update the RFC
(wiki.php.net/rfc/coercive_sth) - with the most notable difference being
allowing NULL->scalar conversions, for two reasons - it's not uncommon for
code to use 'null' as a way to denote an empty optional parameter to for
internal functions, and many internal functions seem to rely on that
behavior themselves. It should be possible to alter this behavior in the
future by changing the internal functions to explicitly handleNULL
inputs
as 'please use the default value' - but it's outside the scope of the RFC.In addition, coercion from scalar to boolean is now limited only to
integer - which seems to be the most popular use case; Beforehand,
coercion was also allowed from float and string - but based on feedback
from Mike, we're reconsidering accepting strings again.Another change being considered and not yet in the RFC is re-allowing
leading and trailing spaces for numeric strings (sorry Paddy.)Now to the tests we ran. The goal was to see what kind of effect the
changes to the internal function APIs had on real world apps with large
codebase. The test methodology was done by placing a debugger breakpoint
on zend_error, to ensure no error gets lost in the ether of settings or
callbacks. Here are the results:Drupal homepage: One new
E_DEPRECATED
warning, which seems to catch a
real bug, or at least faulty looking code:
$path = trim($path, '/'); // raises E_DEPRECATED, as $path is boolean
false.
return $path;Drupal admin interface (across the all pages): One new
E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes() operating
on a boolean.Magento homepage (w/ Magento's huge sample DB): One new
E_DEPRECATED
warning, again, seems to be catching a real bug of 'false' being fed as
argument 1 of injson_decode()
- which is expecting a string full of json
there.WordPress homepage: One new
E_DEPRECATED
warning, again, seems to be
catching a real bug of 'false' being fed as argument 1 ofsubstr()
.Zend Framework 2 skeleton app: Zero new
E_DEPRECATED
warnings.Symfony ACME app: Zero new
E_DEPRECATED
warnings (across the app).As I'm sure you know, the above tests invoke a huge number of lines of
code in total, handling filesystem ops, database ops and all sorts of
other things. This is much of the mini test suite that we use to test
PHP's performance and compatibility (e.g. phpng, now PHP 7). So while
this isn't proof that code in the wild isn't going to have more issues -
it's a pretty good initial indication regarding the level of 'breakage' we
can expect. I'm putting breakage in quotes, as people would have several
years to fix these few issues, before theE_DEPRECATED
becomes an error
(or an exception, if the engine exceptions RFC passes).In terms of the test suite (.phpts), the changes cause approximately 700
extra tests to fail out of 13,700, in comparison to w/o the patch.
However, even though I didn't have a chance to go over all of them, it
seems that the vast majority of the failed tests are tests that were
intentionally designed to cover the exact parameter passing behavior,
rather than real likely real world code pieces. A huge number of the
internal functions have this in their test suites:$variation = array(
'float 10.5' => 10.5,
'float -10.5' => -10.5,
'float 12.3456789000e10' => 12.3456789000e10,
'float -12.3456789000e10' => -12.3456789000e10,
'float .5' => .5,
);foreach ( $variation as $var ) {
var_dump(readgzfile( $filename, $var ) );
}Which clearly isn't a very likely input to the size argument of
readgzfile()
. All of these now triggerE_DEPRECATED
warnings since we no
longer allow float->int conversions if there's no data loss. For some
reason (perhaps a good one, not sure), we have virtually identical pieces
of code for dozens if not hundreds of internal functions that expect
integers - and these types of failures account for the (looks like vast)
majority of phpt failures. Quoting Andrea from a couple of days ago on
this very topic, "to be fair, most of PHP's test suite basically just
tests zpp's behavior", which appears to be absolutely true. The
real-world app tests are probably a much better indicator of the kind of
behaviors our users are going to see than our test suite is.The takeaway from this seems to be that the approach of tightening the
existing rule-set and applying it to both internal functions and new
user-land code is viable, and does not cause the sky to come down falling.
Preliminary tests suggest it actually finds real potential problems.More tomorrow.
Zeev
--
How many deprecations do you get running the ZF2 and Symfony testsuites?
Nikita
How many deprecations do you get running the ZF2 and Symfony testsuites?
None, but it may have to do with the fact I haven't run them yet :)
Zeev
Hi Zeev,
Another change being considered and not yet in the RFC is re-allowing
leading and trailing spaces for numeric strings (sorry Paddy.)
Not to worry. I know where Github keeps the servers ;).
There are far worse things than spaces.
Paddy
--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Drupal homepage: One new
E_DEPRECATED
warning, which seems to catch a
real bug, or at least faulty looking code:
$path = trim($path, '/'); // raises E_DEPRECATED, as $path is boolean
false.
return $path;
So far nobody answered this question but Francois (tried). You keep
using this E_DEPRECATED
message as a safety net to catch possible bad
things. What's about other changes that may not match the current
casting rules (or lack of) and will be different with this RFC? Those
considered as "Yes". This is what I worry about, even if minimal these
cases are pains to track down and fix.
Also I would really like a clear table describing in details what will
be changed, to compare how it works now and how it work later. This is
something that is crystal clear with the dual mode RFC, nothing
changes by defaullt, and strict rules if strict mode is enabled for a
file.
As you would expect, I will vote no as it is proposed now.
De : Pierre Joye [mailto:pierre.php@gmail.com]
So far nobody answered this question but Francois (tried). You keep
using thisE_DEPRECATED
message as a safety net to catch possible bad
things.
Yes, even if we don't really 'catch' things, because we just raise E_DEPRECATED
and, then, continue execution exactly as before. Actually, it is more an 'alert' net than a safety net, as its role is to alert developers that they have a potential issue to solve during the next 5 or 10 years. In most cases, it is a real bug they'd better fix before 5 years ;)
What's about other changes that may not match the current
casting rules (or lack of) and will be different with this RFC? Those
considered as "Yes".
I now understand your previous post. There's absolutely no change in the way things are converted. Every conversion works exactly the same as in PHP 5. The only additional behavior we introduce is raising some E_DEPRECATED
in some cases. But, after raising E_DEPRECATED, we still do the work ! So, if you turn E_DEPRECATED
off, the behavior is exactly the same as PHP 5.
Even when E_DEPRECATED
will be turned to fatal error in the future, the ruleset will remain a pure subset of the cases supported in PHP 5, failing for more cases, but with no change in the way conversion are performed. I think it is not clear enough in the RFC, explaining your misconception.
Actually, I must admit this is not perfectly true. There is one conversion I would like to modify. If we reintroduce getting bool from string, the current case for false is "0" or empty string only. For consistency reasons, I would like to extend "0" to any string that would convert to a null numeric value (also considering as false "0.0", " 0",...) but I hope you will consider that's a marginal change and the risk is low. But this is just a suggestion. Even for this, if some prefer to stick with the current behavior, it will be preserved and kept for a future RFC.
Also I would really like a clear table describing in details what will
be changed, to compare how it works now and how it work later.
You're right. I had planned to write two tables. One with the changes and one with the final situation. But I didn't have enough time for both. I will add the 'changes' table in the next hours.
Regards
François
Hi Pierre,
De : Pierre Joye [mailto:pierre.php@gmail.com]
Also I would really like a clear table describing in details what will
be changed, to compare how it works now and how it work later.
The RFC now contains a table listing the changes compared to current PHP 5 rules. Enjoy it.
Regards
François
Zeev,
All,
We've been working in the last few days to test and tune the Coercive STH
patch. I think the results are quite nice, and surprisingly better than
one might have expected.
Can we try the patch ourselves? I would love to run it against some
libraries as well.
Before diving into the results, we did update the RFC
(wiki.php.net/rfc/coercive_sth) - with the most notable difference being
allowing NULL->scalar conversions, for two reasons - it's not uncommon for
code to use 'null' as a way to denote an empty optional parameter to for
internal functions, and many internal functions seem to rely on that
behavior themselves. It should be possible to alter this behavior in the
future by changing the internal functions to explicitly handleNULL
inputs
as 'please use the default value' - but it's outside the scope of the RFC.
This RFC trying to simpliy and cleanup the coercison rules, having two
different conversion rules for NULL->scalar
depending on userland or internal is counter-productive and bad. The
behavior you describe as null being
empty value is wide-spread in PHP userland code as well.
In addition, coercion from scalar to boolean is now limited only to
integer - which seems to be the most popular use case; Beforehand,
coercion was also allowed from float and string - but based on feedback
from Mike, we're reconsidering accepting strings again.Another change being considered and not yet in the RFC is re-allowing
leading and trailing spaces for numeric strings (sorry Paddy.)
I agree with Pierre here, it would be super helpful if we had a log in the
RFC of the actual changes that will be happening.
As in Francois original patch this seems to be a game of having 20 changes
and then picking which ones to do and which not.
Now to the tests we ran. The goal was to see what kind of effect the
changes to the internal function APIs had on real world apps with large
codebase. The test methodology was done by placing a debugger breakpoint
on zend_error, to ensure no error gets lost in the ether of settings or
callbacks. Here are the results:Drupal homepage: One new
E_DEPRECATED
warning, which seems to catch a
real bug, or at least faulty looking code:
$path = trim($path, '/'); // raises E_DEPRECATED, as $path is boolean
false.
return $path;Drupal admin interface (across the all pages): One new
E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes() operating
on a boolean.Magento homepage (w/ Magento's huge sample DB): One new
E_DEPRECATED
warning, again, seems to be catching a real bug of 'false' being fed as
argument 1 of injson_decode()
- which is expecting a string full of json
there.WordPress homepage: One new
E_DEPRECATED
warning, again, seems to be
catching a real bug of 'false' being fed as argument 1 ofsubstr()
.Zend Framework 2 skeleton app: Zero new
E_DEPRECATED
warnings.Symfony ACME app: Zero new
E_DEPRECATED
warnings (across the app).
I was expecting this, because the rule changes are mostly in regard to not
accepting
invalid input, so what you need to test is all the edge cases.
Say I rely on a validation in count()
somewhere in the code to implicitly
validate its an array:
if (count($_GET['filters'])) {
echo "Filtering my query";
}
This would work in the "happy path" case, because i have a filter set. But
maybe
there is some invalid state i can get into and only then the E_DEPRECATED
is produced.
The homepages of popular systems being the essential "happy path" for a
project, I wouldnt expect many errors to occur.
As I'm sure you know, the above tests invoke a huge number of lines of
code in total, handling filesystem ops, database ops and all sorts of
other things. This is much of the mini test suite that we use to test
PHP's performance and compatibility (e.g. phpng, now PHP 7). So while
this isn't proof that code in the wild isn't going to have more issues -
it's a pretty good initial indication regarding the level of 'breakage' we
can expect. I'm putting breakage in quotes, as people would have several
years to fix these few issues, before theE_DEPRECATED
becomes an error
(or an exception, if the engine exceptions RFC passes).In terms of the test suite (.phpts), the changes cause approximately 700
extra tests to fail out of 13,700, in comparison to w/o the patch.
However, even though I didn't have a chance to go over all of them, it
seems that the vast majority of the failed tests are tests that were
intentionally designed to cover the exact parameter passing behavior,
rather than real likely real world code pieces. A huge number of the
internal functions have this in their test suites:$variation = array(
'float 10.5' => 10.5,
'float -10.5' => -10.5,
'float 12.3456789000e10' => 12.3456789000e10,
'float -12.3456789000e10' => -12.3456789000e10,
'float .5' => .5,
);foreach ( $variation as $var ) {
var_dump(readgzfile( $filename, $var ) );
}Which clearly isn't a very likely input to the size argument of
readgzfile()
. All of these now triggerE_DEPRECATED
warnings since we no
longer allow float->int conversions if there's no data loss. For some
reason (perhaps a good one, not sure), we have virtually identical pieces
of code for dozens if not hundreds of internal functions that expect
integers - and these types of failures account for the (looks like vast)
majority of phpt failures. Quoting Andrea from a couple of days ago on
this very topic, "to be fair, most of PHP's test suite basically just
tests zpp's behavior", which appears to be absolutely true. The
real-world app tests are probably a much better indicator of the kind of
behaviors our users are going to see than our test suite is.The takeaway from this seems to be that the approach of tightening the
existing rule-set and applying it to both internal functions and new
user-land code is viable, and does not cause the sky to come down falling.
Preliminary tests suggest it actually finds real potential problems.
Nobody I think will argue that tightening the rules will not detect more
problems
and may lead to better code written by developers.
The question is if this a BC break that is acceptable or not, one that our
users can stand behind and say "Yes I am fine with being forced to invest
time updating my application so that it still runs on PHP 8".
More tomorrow.
Zeev
I've added the link to the patch
https://github.com/php/php-src/pull/1125/files
Thanks. Dmitry.
On Fri, Feb 27, 2015 at 11:03 AM, Benjamin Eberlei kontakt@beberlei.de
wrote:
Zeev,
All,
We've been working in the last few days to test and tune the Coercive STH
patch. I think the results are quite nice, and surprisingly better than
one might have expected.Can we try the patch ourselves? I would love to run it against some
libraries as well.Before diving into the results, we did update the RFC
(wiki.php.net/rfc/coercive_sth) - with the most notable difference being
allowing NULL->scalar conversions, for two reasons - it's not uncommon
for
code to use 'null' as a way to denote an empty optional parameter to for
internal functions, and many internal functions seem to rely on that
behavior themselves. It should be possible to alter this behavior in the
future by changing the internal functions to explicitly handleNULL
inputs
as 'please use the default value' - but it's outside the scope of the
RFC.This RFC trying to simpliy and cleanup the coercison rules, having two
different conversion rules for NULL->scalar
depending on userland or internal is counter-productive and bad. The
behavior you describe as null being
empty value is wide-spread in PHP userland code as well.In addition, coercion from scalar to boolean is now limited only to
integer - which seems to be the most popular use case; Beforehand,
coercion was also allowed from float and string - but based on feedback
from Mike, we're reconsidering accepting strings again.Another change being considered and not yet in the RFC is re-allowing
leading and trailing spaces for numeric strings (sorry Paddy.)I agree with Pierre here, it would be super helpful if we had a log in the
RFC of the actual changes that will be happening.
As in Francois original patch this seems to be a game of having 20 changes
and then picking which ones to do and which not.Now to the tests we ran. The goal was to see what kind of effect the
changes to the internal function APIs had on real world apps with large
codebase. The test methodology was done by placing a debugger breakpoint
on zend_error, to ensure no error gets lost in the ether of settings or
callbacks. Here are the results:Drupal homepage: One new
E_DEPRECATED
warning, which seems to catch a
real bug, or at least faulty looking code:
$path = trim($path, '/'); // raises E_DEPRECATED, as $path is boolean
false.
return $path;Drupal admin interface (across the all pages): One new
E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes() operating
on a boolean.Magento homepage (w/ Magento's huge sample DB): One new
E_DEPRECATED
warning, again, seems to be catching a real bug of 'false' being fed as
argument 1 of injson_decode()
- which is expecting a string full of json
there.WordPress homepage: One new
E_DEPRECATED
warning, again, seems to be
catching a real bug of 'false' being fed as argument 1 ofsubstr()
.Zend Framework 2 skeleton app: Zero new
E_DEPRECATED
warnings.Symfony ACME app: Zero new
E_DEPRECATED
warnings (across the app).I was expecting this, because the rule changes are mostly in regard to not
accepting
invalid input, so what you need to test is all the edge cases.Say I rely on a validation in
count()
somewhere in the code to implicitly
validate its an array:if (count($_GET['filters'])) {
echo "Filtering my query";
}This would work in the "happy path" case, because i have a filter set. But
maybe
there is some invalid state i can get into and only then theE_DEPRECATED
is produced.The homepages of popular systems being the essential "happy path" for a
project, I wouldnt expect many errors to occur.As I'm sure you know, the above tests invoke a huge number of lines of
code in total, handling filesystem ops, database ops and all sorts of
other things. This is much of the mini test suite that we use to test
PHP's performance and compatibility (e.g. phpng, now PHP 7). So while
this isn't proof that code in the wild isn't going to have more issues -
it's a pretty good initial indication regarding the level of 'breakage'
we
can expect. I'm putting breakage in quotes, as people would have several
years to fix these few issues, before theE_DEPRECATED
becomes an error
(or an exception, if the engine exceptions RFC passes).In terms of the test suite (.phpts), the changes cause approximately 700
extra tests to fail out of 13,700, in comparison to w/o the patch.
However, even though I didn't have a chance to go over all of them, it
seems that the vast majority of the failed tests are tests that were
intentionally designed to cover the exact parameter passing behavior,
rather than real likely real world code pieces. A huge number of the
internal functions have this in their test suites:$variation = array(
'float 10.5' => 10.5,
'float -10.5' => -10.5,
'float 12.3456789000e10' => 12.3456789000e10,
'float -12.3456789000e10' => -12.3456789000e10,
'float .5' => .5,
);foreach ( $variation as $var ) {
var_dump(readgzfile( $filename, $var ) );
}Which clearly isn't a very likely input to the size argument of
readgzfile()
. All of these now triggerE_DEPRECATED
warnings since we no
longer allow float->int conversions if there's no data loss. For some
reason (perhaps a good one, not sure), we have virtually identical pieces
of code for dozens if not hundreds of internal functions that expect
integers - and these types of failures account for the (looks like vast)
majority of phpt failures. Quoting Andrea from a couple of days ago on
this very topic, "to be fair, most of PHP's test suite basically just
tests zpp's behavior", which appears to be absolutely true. The
real-world app tests are probably a much better indicator of the kind of
behaviors our users are going to see than our test suite is.The takeaway from this seems to be that the approach of tightening the
existing rule-set and applying it to both internal functions and new
user-land code is viable, and does not cause the sky to come down
falling.
Preliminary tests suggest it actually finds real potential problems.Nobody I think will argue that tightening the rules will not detect more
problems
and may lead to better code written by developers.The question is if this a BC break that is acceptable or not, one that our
users can stand behind and say "Yes I am fine with being forced to invest
time updating my application so that it still runs on PHP 8".More tomorrow.
Zeev
I've added the link to the patch
Thanks!
First, the necessary PHPUnit changes (dev-master) to avoid errors:
https://gist.github.com/beberlei/8a33ae940829f1186da2
- Doctrine DBAL testsuite: 8 failures
- Doctrine ORM: Crashes unrelated after half the tests, but has about
30-50% failures like Symfony2 - Symfony2 Testsuite: 6215 failures
- Twig: Tests: 1108, Assertions: 1616, Errors: 125.
Now probably many of the failures are related to few code paths that need
fixing, however I have to find out that and will be a lot of work. But this
is the good PHP code!
For untested or older PHP code (and yes there is alot out there) i now have
to collect the errors in production to find all the occurances. This is
nothing i can just do whenever I want, I need a process to collect and fix
this over time. Now every company needs this process for every project they
have out there. And the typical agency has hundrets/thousands of drupal,
typo3, wordpress installations out there.
Thanks. Dmitry.
On Fri, Feb 27, 2015 at 11:03 AM, Benjamin Eberlei kontakt@beberlei.de
wrote:Zeev,
All,
We've been working in the last few days to test and tune the Coercive
STH
patch. I think the results are quite nice, and surprisingly better than
one might have expected.Can we try the patch ourselves? I would love to run it against some
libraries as well.Before diving into the results, we did update the RFC
(wiki.php.net/rfc/coercive_sth) - with the most notable difference
being
allowing NULL->scalar conversions, for two reasons - it's not uncommon
for
code to use 'null' as a way to denote an empty optional parameter to for
internal functions, and many internal functions seem to rely on that
behavior themselves. It should be possible to alter this behavior in
the
future by changing the internal functions to explicitly handleNULL
inputs
as 'please use the default value' - but it's outside the scope of the
RFC.This RFC trying to simpliy and cleanup the coercison rules, having two
different conversion rules for NULL->scalar
depending on userland or internal is counter-productive and bad. The
behavior you describe as null being
empty value is wide-spread in PHP userland code as well.In addition, coercion from scalar to boolean is now limited only to
integer - which seems to be the most popular use case; Beforehand,
coercion was also allowed from float and string - but based on feedback
from Mike, we're reconsidering accepting strings again.Another change being considered and not yet in the RFC is re-allowing
leading and trailing spaces for numeric strings (sorry Paddy.)I agree with Pierre here, it would be super helpful if we had a log in the
RFC of the actual changes that will be happening.
As in Francois original patch this seems to be a game of having 20 changes
and then picking which ones to do and which not.Now to the tests we ran. The goal was to see what kind of effect the
changes to the internal function APIs had on real world apps with large
codebase. The test methodology was done by placing a debugger
breakpoint
on zend_error, to ensure no error gets lost in the ether of settings or
callbacks. Here are the results:Drupal homepage: One new
E_DEPRECATED
warning, which seems to catch a
real bug, or at least faulty looking code:
$path = trim($path, '/'); // raises E_DEPRECATED, as $path is boolean
false.
return $path;Drupal admin interface (across the all pages): One new
E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes()
operating
on a boolean.Magento homepage (w/ Magento's huge sample DB): One new
E_DEPRECATED
warning, again, seems to be catching a real bug of 'false' being fed as
argument 1 of injson_decode()
- which is expecting a string full of
json
there.WordPress homepage: One new
E_DEPRECATED
warning, again, seems to be
catching a real bug of 'false' being fed as argument 1 ofsubstr()
.Zend Framework 2 skeleton app: Zero new
E_DEPRECATED
warnings.Symfony ACME app: Zero new
E_DEPRECATED
warnings (across the app).I was expecting this, because the rule changes are mostly in regard to not
accepting
invalid input, so what you need to test is all the edge cases.Say I rely on a validation in
count()
somewhere in the code to implicitly
validate its an array:if (count($_GET['filters'])) {
echo "Filtering my query";
}This would work in the "happy path" case, because i have a filter set. But
maybe
there is some invalid state i can get into and only then theE_DEPRECATED
is produced.The homepages of popular systems being the essential "happy path" for a
project, I wouldnt expect many errors to occur.As I'm sure you know, the above tests invoke a huge number of lines of
code in total, handling filesystem ops, database ops and all sorts of
other things. This is much of the mini test suite that we use to test
PHP's performance and compatibility (e.g. phpng, now PHP 7). So while
this isn't proof that code in the wild isn't going to have more issues -
it's a pretty good initial indication regarding the level of 'breakage'
we
can expect. I'm putting breakage in quotes, as people would have
several
years to fix these few issues, before theE_DEPRECATED
becomes an error
(or an exception, if the engine exceptions RFC passes).In terms of the test suite (.phpts), the changes cause approximately 700
extra tests to fail out of 13,700, in comparison to w/o the patch.
However, even though I didn't have a chance to go over all of them, it
seems that the vast majority of the failed tests are tests that were
intentionally designed to cover the exact parameter passing behavior,
rather than real likely real world code pieces. A huge number of the
internal functions have this in their test suites:$variation = array(
'float 10.5' => 10.5,
'float -10.5' => -10.5,
'float 12.3456789000e10' => 12.3456789000e10,
'float -12.3456789000e10' => -12.3456789000e10,
'float .5' => .5,
);foreach ( $variation as $var ) {
var_dump(readgzfile( $filename, $var ) );
}Which clearly isn't a very likely input to the size argument of
readgzfile()
. All of these now triggerE_DEPRECATED
warnings since we
no
longer allow float->int conversions if there's no data loss. For some
reason (perhaps a good one, not sure), we have virtually identical
pieces
of code for dozens if not hundreds of internal functions that expect
integers - and these types of failures account for the (looks like vast)
majority of phpt failures. Quoting Andrea from a couple of days ago on
this very topic, "to be fair, most of PHP's test suite basically just
tests zpp's behavior", which appears to be absolutely true. The
real-world app tests are probably a much better indicator of the kind of
behaviors our users are going to see than our test suite is.The takeaway from this seems to be that the approach of tightening the
existing rule-set and applying it to both internal functions and new
user-land code is viable, and does not cause the sky to come down
falling.
Preliminary tests suggest it actually finds real potential problems.Nobody I think will argue that tightening the rules will not detect more
problems
and may lead to better code written by developers.The question is if this a BC break that is acceptable or not, one that our
users can stand behind and say "Yes I am fine with being forced to invest
time updating my application so that it still runs on PHP 8".More tomorrow.
Zeev
On Fri, Feb 27, 2015 at 12:18 PM, Benjamin Eberlei kontakt@beberlei.de
wrote:
I've added the link to the patch
Thanks!
First, the necessary PHPUnit changes (dev-master) to avoid errors:
https://gist.github.com/beberlei/8a33ae940829f1186da2
- Doctrine DBAL testsuite: 8 failures
- Doctrine ORM: Crashes unrelated after half the tests, but has about
30-50% failures like Symfony2- Symfony2 Testsuite: 6215 failures
- Twig: Tests: 1108, Assertions: 1616, Errors: 125.
Now probably many of the failures are related to few code paths that need
fixing, however I have to find out that and will be a lot of work. But this
is the good PHP code!For untested or older PHP code (and yes there is alot out there) i now
have to collect the errors in production to find all the occurances. This
is nothing i can just do whenever I want, I need a process to collect and
fix this over time. Now every company needs this process for every project
they have out there. And the typical agency has hundrets/thousands of
drupal, typo3, wordpress installations out there.
I used the following configure line and then /usr/local/php7/lib/php.ini
./configure --prefix=/usr/local/php7 --with-mysql=mysqlnd
--with-mysqli=mysqlnd --with-tidy=/usr --with-curl=/usr --with-curlwrappers
--with-zlib-dir=/usr --enable-mbstring --with-xpm-dir=/usr
--enable-pdo=shared --with-pdo-mysql=shared,mysqlnd --without-sqlite
--with-pdo-sqlite=shared,/usr --with-xsl=/usr --with-xmlrpc
--with-iconv-dir=/usr --with-snmp=/usr --enable-exif --enable-calendar
--with-bz2=/usr --with-mcrypt=/usr --with-gd --with-jpeg-dir=/usr
--with-png-dir=/usr --with-zlib-dir=/usr --with-freetype-dir=/usr
--enable-mbstring --enable-zip --with-pear --enable-soap --with-gd
--enable-intl --with-bz2=/usr --enable-sockets --with-sqlite3=shared
--enable-pcntl
max_execution_time=600
memory_limit=128M
error_reporting=-1
display_errors=1
log_errors=0
user_ini.filename=
realpath_cache_size=2M
cgi.check_shebang_line=0
zend_extension=opcache.so
opcache.enable_cli=1
opcache.save_comments=1
opcache.fast_shutdown=1
opcache.validate_timestamps=1
opcache.revalidate_freq=60
opcache.use_cwd=1
opcache.max_accelerated_files=100000
opcache.max_wasted_percentage=5
opcache.memory_consumption=128
opcache.consistency_checks=0
date.timezone=Europe/Berlin
extension=pdo.so
extension=pdo_sqlite.so
extension=pdo_mysql.so
extension=sqlite3.so
-----Original Message-----
From: Benjamin Eberlei [mailto:kontakt@beberlei.de]
Sent: Friday, February 27, 2015 1:19 PM
To: Dmitry Stogov
Cc: Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] Coercive STH - some real world tests and updated
RFCI've added the link to the patch
https://github.com/php/php-src/pull/1125/files
Thanks!
First, the necessary PHPUnit changes (dev-master) to avoid errors:
https://gist.github.com/beberlei/8a33ae940829f1186da2
- Doctrine DBAL testsuite: 8 failures
- Doctrine ORM: Crashes unrelated after half the tests, but has about
30-50%
failures like Symfony2- Symfony2 Testsuite: 6215 failures
- Twig: Tests: 1108, Assertions: 1616, Errors: 125.
Thanks for the tests! We'll look into those.
Now probably many of the failures are related to few code paths that need
fixing, however I have to find out that and will be a lot of work. But
this is the
good PHP code!
Judging by the fact WordPress, Magento and Drupal appear a lot cleaner - you
have to wonder..? :)
But on a more serious note, I don't think we should assume just about
anything from the test suite as reflecting on real world app behavior. I'm
not familiar with the Symfony test suite (yet), but I do know that the
Symfony skeleton app ran cleanly without a single warning. Is it possible
it's testing a lot of things with garbage input, as you would in unit tests,
as the our own PHP test suite does? Those account for most of the failures.
Also take into account that a project with that many tests (and
specifically, failures) is probably not going to have just one or two people
responsible for it, but more. It's not that you personally would have to
fix Symfony, there are at least a dozen if not dozen people that would help
out.
For untested or older PHP code (and yes there is alot out there) i now
have to
collect the errors in production to find all the occurances. This is
nothing i
can just do whenever I want, I need a process to collect and fix this over
time. Now every company needs this process for every project they have out
there.
I wish that were true, but in reality, migration is painfully slow and
judging by the numbers, doesn't happen - more often than happening. The
'good' companies have ~5yrs after PHP 7 comes out and before PHP 7.x will no
longer be supported (and that's according to my hope for a fast PHP 8
timeline; it may be longer as you suggested). Would all projects migrate
by that timeline? Let's make a simpler question - would all projects
migrate to PHP 7 by 2017, the year 5.6 theoretically stops being supported?
We all know the answer to that.
The good companies, the security conscious ones that do keep up with new
versions, will likely be willing to invest the effort of fixing the code (or
hiring someone to do it for them), especially as it'll help them make their
apps more robust/secure. And especially as - at least so far - it seems
that real world apps have not even a handful of issues per page.
All that said, I think we must understand the huge gap between the very high
degree of test failures you're seeing in test suites, and the almost none
we're seeing in real world apps. If it's similar to the PHP test suite,
it's a non-issue, and even if it would take some work to fix the unit tests,
it's well worth it. If it's legit, but as you're guessing, may just be
several common paths of code that will do away with most of the failures -
it's still worth it. Investing several days over the course of several
years is great ROI for the value it brings. If, however, the test suite
does find a multitude of many different issues - it may bring me to
reconsider the RFC, as I told Anthony a few days ago - especially if the
signal to noise ratio would be bad. After the results we've seen with real
world apps I find it hard to believe, but it's certainly a possible
scenario.
At the end of the day, it boils down to how much real work do we think will
be required to update real world apps out there. Let's also not set
unreasonable bars for this change compared to other compatibility breakages
we've done (and are stil doing) over the years - again, most of which didn't
bring any tangible value to developers (which this change does, a lot).
And the typical agency has hundrets/thousands of drupal, typo3,
wordpress installations out there
Correct, but would you have to patch each one separately? You'd have to
install updated core for Drupal, Typo3, WP - which you commonly have to do
anyway if you care about security. Then there'd be some work to get your
common custom modules that you developed update - between the two of those,
that should account for most of the code. It's not as if you're going to
start with each deployment from scratch, is it?
Anyway, thanks again for running the tests - we'll definitely need to
understand that better.
Thanks,
Zeev
This RFC trying to simpliy and cleanup the coercison rules, having two
different conversion rules for NULL->scalar
depending on userland or internal is counter-productive and bad. The
behavior you describe as null being
empty value is wide-spread in PHP userland code as well.
I agree here with Benjamin. The thing I have wanted in user land for
years is to be able to build a user land function that worked the way
internal functions do in terms of type checking without having to build
my own type checking system. Having internal functions convert null to a
scalar and not having user land do the same creates an inconsistency.
For example, sometimes you want to wrap/extend an internal function (you
see it a lot with json* to do encoding checking). I can't reliably take
the same input to my user land wrapper that I can send to the internal
function. I would prefer to see the same rules apply to internal and
user land.
--
Brian.
De : Brian Moon [mailto:brian@moonspot.net]
I agree here with Benjamin. The thing I have wanted in user land for
years is to be able to build a user land function that worked the way
internal functions do in terms of type checking without having to build
my own type checking system. Having internal functions convert null to a
scalar and not having user land do the same creates an inconsistency.
For example, sometimes you want to wrap/extend an internal function (you
see it a lot with json* to do encoding checking). I can't reliably take
the same input to my user land wrapper that I can send to the internal
function. I would prefer to see the same rules apply to internal and
user land.
It would be better, yes. Unfortunately, accepting or refusing null to scalar in both is difficult. If we disable null to scalar, we potentially generate a lot of deprecation messages for existing code calling internal functions this way. If we enable null to scalar, we create massive future inconsistencies now and in the future. An example :
Function foo(): string {} // OK because returns null and null accepted as string.
My opinion, but that's only mine, is that I would favor deprecating null to scalar, even for internal functions. It would add deprecation messages but we'd end up with a cleaner code.
Regards
François
My opinion, but that's only mine, is that I would favor deprecating null to scalar, even for internal functions. It would add deprecation messages but we'd end up with a cleaner code.
null is a very specific 'state' for anybody working with databases. It
means that no value was found. Mapping that to a default value for
future processing is a job for the application rather than the data
packet as returned. It ONLY needs to overridden from a default "" or 0
in cases where the simple default is not required, but on the whole the
default 'case' is ALL that is needed. That null is also false in
situations where a bool is required is also natural. It's now having to
go through all the code and manually add these quite natural casts which
is the problem here. It's not a bug or mistake, it just flows in the
whole process.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi Zeev,
Drupal homepage: One new
E_DEPRECATED
warning, which seems to catch a
real bug, or at least faulty looking code:
$path = trim($path, '/'); // raises E_DEPRECATED, as $path is boolean
false.
return $path;Drupal admin interface (across the all pages): One new
E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes() operating
on a boolean.
All those are due to a bug in substr()
, that we see now only thanks to
proper type identification. There is no reason for substr()
to ever return
a boolean. It really needs to be fix to always return a string.
Damien
Hi Zeev,
Drupal homepage: One new
E_DEPRECATED
warning, which seems to catch a
real bug, or at least faulty looking code:
$path = trim($path, '/'); // raises E_DEPRECATED, as $path is boolean
false.
return $path;Drupal admin interface (across the all pages): One new
E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes() operating
on a boolean.All those are due to a bug in
substr()
, that we see now only thanks to
proper type identification. There is no reason forsubstr()
to ever return
a boolean. It really needs to be fix to always return a string.
Yes, weird behavior that substr("", 2, 2); for example returns false. But
changing that is just another evil BC break.
Damien
Drupal admin interface (across the all pages): One new
E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes() operating
on a boolean.All those are due to a bug in
substr()
, that we see now only thanks to
proper type identification. There is no reason forsubstr()
to ever return
a boolean. It really needs to be fix to always return a string.Yes, weird behavior that substr("", 2, 2); for example returns false. But
changing thatis just another evil BC break.
Now I don' think that 'weird' ... Although the correct return should
perhaps be 'null', but it's long been practice that s there is no result
we get 'false' so how any places will have a sanity check based on the
'false' return?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Drupal admin interface (across the all pages): One new
E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes()
operating
on a boolean.All those are due to a bug in
substr()
, that we see now only thanks to
proper type identification. There is no reason forsubstr()
to ever
return
a boolean. It really needs to be fix to always return a string.Yes, weird behavior that substr("", 2, 2); for example returns false. But
changing thatis just another evil BC break.Now I don' think that 'weird' ... Although the correct return should
perhaps be 'null', but it's long been practice that s there is no result
we get 'false' so how any places will have a sanity check based on the
'false' return?
As an example i ported PDepend to work with coercive typehints, this is the
quick and dirty patch:
https://gist.github.com/beberlei/6a5a6b65839d35bb27f0
In longer chains of string handling, i just cast everything to (string)
that could potentially return false here.
Interestingly functions like is_infinite()
, is_dir()
, is_file()
all expect
correct types like float or string here although from the naming convention
"is_something" i would expect it says just false on anything else (which
it almost does right now).
The patch also shows that this will lead to weird behavior when the PHP API
is actually wrong like DOMNode::cloneNode() expecting an integer and not a
boolean as argument, even though the PHP Documentation says it should be a
boolean.
I can literally forsee this is the way this kind of code will be fixed.
Imho the problem is that the return values of php internal functions being
string|false will lead to massive consecutive errors when passing this on
to other internal unctions.
--
Lester Caine - G8HFLContact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Imho the problem is that the return values of php internal functions being
string|false will lead to massive consecutive errors when passing this on
to other internal functions.
This is perhaps the crux of my objection to both types of 'error
checking' ... OK the return should be an empty string rather than false,
but certainly one does not want an exception when nothing is returned.
So much code IS based on doing one thing if there is a value and another
when there is not, so if that is the problem people are claiming needs
fixing perhaps that is what needs addressing? string|false is a core
element of most of PHP?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
De : Benjamin Eberlei [mailto:kontakt@beberlei.de]
Interestingly functions like
is_infinite()
,is_dir()
,is_file()
all expect
correct types like float or string here although from the naming convention
"is_something" i would expect it says just false on anything else (which
it almost does right now).
Agreed. Must be modified. Even for 7.0 if it can be approved before freeze. All is_xxx functions must be checked and made more permissive on input arg (most, if not all, should probably accept any zval, minor BC break).
Imho the problem is that the return values of php internal functions being
string|false will lead to massive consecutive errors when passing this on
to other internal unctions.
Right. A side effect of the RFC is to encourage for an explicit check of false. May I say that this won't exactly lead to 'errors', as we are just deprecating usage of the implicit cast in such case. So it may lead to massive 'deprecated' messages but, apart from additional messages, behavior won't change.
Regards
François
Hi Zeev,
Drupal homepage: One new
E_DEPRECATED
warning, which seems to catch a
real bug, or at least faulty looking code:
$path = trim($path, '/'); // raises E_DEPRECATED, as $path is boolean
false.
return $path;Drupal admin interface (across the all pages): One new
E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes() operating
on a boolean.All those are due to a bug in
substr()
, that we see now only thanks to
proper type identification. There is no reason forsubstr()
to ever return
a boolean. It really needs to be fix to always return a string.Damien
This kind of code here exhibits the failure:
https://github.com/pdepend/pdepend/blob/master/src/main/php/PDepend/Source/Language/PHP/PHPTokenizerInternal.php#L584
This is classic PHP style code where you rely on the implicit casting to
make the algorithm work for you.
if substr()
here returns false, then the error is:
PDepend\Source\Parser\UnstructuredCodeTest::testParserHandlesNonPhpCodeInFileProlog
strlen()
expects parameter 1 to be string, boolean given
/home/benny/code/php/workspace/pdepend/src/main/php/PDepend/Source/Language/PHP/PHPTokenizerInternal.php:586
The funny thing is that the fix for this is:
- substr($image, strrpos($image, "\n") + 1)
- (string)substr($image, strrpos($image, "\n") + 1)
Which is that sort of casting that is put forward as argument against the
dual/strict mode.
De : Benjamin Eberlei [mailto:kontakt@beberlei.de]
The funny thing is that the fix for this is:
- substr($image, strrpos($image, "\n") + 1)
- (string)substr($image, strrpos($image, "\n") + 1)
Which is that sort of casting that is put forward as argument against the
dual/strict mode.
It is just one possible fix, but definitely not the one I would suggest, because too permissive. No controversy please, because both are valid. I would just prefer adding a test like 'if ($result===false) $result='';'.
Regards
François
From: Damien Tournoud [mailto:damz@damz.org], Sent: Friday, February 27, 2015 2:38 PM
Hi Zeev,
Drupal homepage: One new
E_DEPRECATED
warning, which seems to catch a
real bug, or at least faulty looking code:
$path = trim($path, '/'); // raises E_DEPRECATED, as $path is boolean
false.
return $path;Drupal admin interface (across the all pages): One new
E_DEPRECATED
warning, which again seems to catch a real bug - stripslsahes() operating
on a boolean.All those are due to a bug in
substr()
, that we see now only thanks to
proper type identification. There is no reason forsubstr()
to ever return
a boolean. It really needs to be fix to always return a string.Damien
It is not a bug. FALSE
as a return value of substr()
is the identificator
for an error (e.g. invalid arguments), as it is stated in the documentation:
"Returns the extracted part of string; or FALSE
on failure, or an
empty string." [1]
This is an example which shows, that Zeevs RFC helps to find bugs in
applications. Because here are given invalid arguments to the method. In
other languages for example Java, you'll get an IndexOutOfBoundsException [2]
if you are trying the same ;-)
1: http://php.net/manual/en/function.substr.php
2: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#substring%28int,%20int%29
Cristian,
All those are due to a bug in
substr()
, that we see now only thanks to
proper type identification. There is no reason forsubstr()
to ever return
a boolean. It really needs to be fix to always return a string.Damien
It is not a bug.
FALSE
as a return value ofsubstr()
is the identificator
for an error (e.g. invalid arguments), as it is stated in the documentation:"Returns the extracted part of string; or
FALSE
on failure, or an
empty string." [1]This is an example which shows, that Zeevs RFC helps to find bugs in
applications. Because here are given invalid arguments to the method. In
other languages for example Java, you'll get an IndexOutOfBoundsException [2]
if you are trying the same ;-)
Well, it depends on your definition of bug.
If you look at PHPUnit's TestSuite::isTestMethod method, you'll see this:
$doc_comment = $method->getDocComment();
return strpos($doc_comment, '@test') !== false ||
strpos($doc_comment, '@scenario') !== false;
Now, with the coercive mode (and my strict mode), that will error if
the method does not have a docblock.
The reason is that $method->getDocComment()
returns false if one
does not exist.
Is that a bug? Well, semantically, maybe. But from the application
level, no. Passing the boolean result to strpos
directly ensures
that the result will be bool(false)
. So the overall method behaves
as expected returning true in the correct context (it's a test method)
and false in incorrect contexts (it's not).
So did the coercive mode find a bug? Not really. It found something
that may have worked in a way you didn't expect it to, but it wasn't a
bug (at least to the application).
Now, if you care about types and type changes, then yes, that is a
bug. And that's why the dual-mode RFC gives you the choice. You can
choose to care about types or not. Not some weird hybrid where you
have to care even if the code is practically correct...
Anthony
Hi Christian,
On Fri, Feb 27, 2015 at 3:38 PM, Christian Stoller stoller@leonex.de
wrote:
It is not a bug.
FALSE
as a return value ofsubstr()
is the identificator
for an error (e.g. invalid arguments), as it is stated in the
documentation:
[...]
"It is documented that way" and "it is not a bug" are two very different
things.
In that case, the semantics of substr()
are just wrong. It makes a lot
more sense for a sub-string function to silently allow reading before the
beginning and past the end of a string.
Moreover, the conditions under which substr()
returns FALSE
looks
completely arbitrary, see http://3v4l.org/gtFjk -- in a nutshell: so it is
OK to read past the beginning, it is OK to try to get more characters than
the string has, but it is NOT OK to try to start reading past the end?
This behavior is unpredictable, and as a consequence using substr()
properly would require a lot of painful and unnecessary up-front checks. It
traditionally did not matter, because of the fluid scalar casts. But if we
want now to introduce stricter casts, we are going to have to tackle this
problem, we cannot just dismiss it as a programming error.
Damien
From: Damien Tournoud [mailto:damz@damz.org], Sent: Friday, February 27, 2015 4:54 PM
Hi Christian,
It is not a bug.
FALSE
as a return value ofsubstr()
is the identificator
for an error (e.g. invalid arguments), as it is stated in the documentation:
[...]"It is documented that way" and "it is not a bug" are two very different things.
That’s not true. Quoting Wikipedia "A software bug is an error, flaw,
failure, or fault in a computer program or system that causes it
to produce an incorrect or unexpected result, or to behave in
unintended ways." [1]
In this case FALSE
is an expected result and it is intended. And as
I said other languages are going the same way.
In that case, the semantics of
substr()
are just wrong. It makes
a lot more sense for a sub-string function > to silently allow
reading before the beginning and past the end > of a string.
Just because it makes more sense for you, it does not imply that
it makes more sense for everybody. In my opinion it makes sense to
return an "error code" if a function is called with invalid
arguments.
This behavior is unpredictable, and as a consequence using
substr()
properly would require a lot of painful and unnecessary up-front
checks.
I agree with you, here. With the coercive patch this will lead to
More checks. But that's different from "it is a bug" ;-)
Hi Christian,
On Fri, Feb 27, 2015 at 5:15 PM, Christian Stoller stoller@leonex.de
wrote:
In this case
FALSE
is an expected result and it is intended. And as
I said other languages are going the same way.
Other languages are also doing something saner. It's not really an argument
either way.
Just because it makes more sense for you, it does not imply that
it makes more sense for everybody. In my opinion it makes sense to
return an "error code" if a function is called with invalid
arguments.
You probably haven't read the examples that I pasted. If you think that it
makes sense to return an error code if the function is called with invalid
arguments, it should be done consistently. The current behavior is just an
absurdly inconsistent, it is really hard to find an argument in favor of
it, other than the argument in favor of backward compatibility.
Damien
You probably haven't read the examples that I pasted. If you think that it
makes sense to return an error code if the function is called with invalid
arguments, it should be done consistently. The current behavior is just an
absurdly inconsistent, it is really hard to find an argument in favor of
it, other than the argument in favor of backward compatibility.
That some changes to the logic of substr have been made is a fact.
Negative values were added to read back from the end of the string a
function that has been played with over time,
It is NOT OK to start reading past the end of the string, no, no. Bad
programmer.
If there are no characters left ... the string is empty.
But it is OK to ask for a bigger slice than the length of the string.
Yes ... length is the MOST it will return, if less are available then
that is all you can get.
It is also OK to read past the begining of the string, why not.
If the reading point does not cut characters then yes this is
consistent. If the length requested was '2', then I would expect 'a' and
false which I think was what the 'bug fix' for PHP5.2.2 was supposed to
do, but it was reverted later in 5.2.7
On the other hand, it is NOT OK to stop reading before the end of the
string. Bad programmer.
Take 2 characters off the string ... empty string.
This may not be what YOU want substr to do and it would perhaps be
useful to ADD additional checks so that 'false' is returned when it
can't created a string because of the 'invalid arguments', but type
hints makes no difference to part of the jigsaw. What happens is exactly
what one would expect ... nothing left in the string -> an empty string.
Having then to check every time for a '0' string length got shortened to
simply being able to check 'is there a string' yes/no, and if no you can
do something else. There is nothing inconsistent ...
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi Lester,
This may not be what YOU want substr to do and it would perhaps be
useful to ADD additional checks so that 'false' is returned when it
can't created a string because of the 'invalid arguments', but type
hints makes no difference to part of the jigsaw. What happens is exactly
what one would expect ... nothing left in the string -> an empty string.
Having then to check every time for a '0' string length got shortened to
simply being able to check 'is there a string' yes/no, and if no you can
do something else. There is nothing inconsistent ...
Please, read the examples again, the current behavior is nothing but
inconsistent:
substr("a", 1) => FALSE
substr("a", -300) => ""
But anyway, that's not even the point. The point is that return values that
"worked" in a world of transparent type-jungling will not work in a world
of stricter typing checks. Which means that we need:
(1) To accept that for now casting FALSE
to the empty string is the right
thing to do;
until / unless:
(2) We fixed all the common functions in the standard library so that they
stop returning inconsistent types, in particular in the cases where it
should not even be a debate what is the right thing to do (for substr, if
the arguments point to an empty slice, return the empty string).
Damien
Hi Lester,
On Fri, Feb 27, 2015 at 6:51 PM, Lester Caine <lester@lsces.co.uk
mailto:lester@lsces.co.uk> wrote:This may not be what YOU want substr to do and it would perhaps be useful to ADD additional checks so that 'false' is returned when it can't created a string because of the 'invalid arguments', but type hints makes no difference to part of the jigsaw. What happens is exactly what one would expect ... nothing left in the string -> an empty string. Having then to check every time for a '0' string length got shortened to simply being able to check 'is there a string' yes/no, and if no you can do something else. There is nothing inconsistent ...
Please, read the examples again, the current behavior is nothing but
inconsistent:substr("a", 1) =>
FALSE
substr("a", -300) => ""
? That was the case prior to PHP5.2.1
The fixes in 5.2.2 were not commonly accepted but give false for both,
but 5.2.7 and later give false and "a" which is what was the preferred
result at the time. Unless we are seeing something different, I'm only
seeing "a" or false for a current output of all your examples.
But anyway, that's not even the point. The point is that return values
that "worked" in a world of transparent type-jungling will not work in a
world of stricter typing checks. Which means that we need:(1) To accept that for now casting
FALSE
to the empty string is the
right thing to do;until / unless:
(2) We fixed all the common functions in the standard library so that
they stop returning inconsistent types, in particular in the cases where
it should not even be a debate what is the right thing to do (for
substr, if the arguments point to an empty slice, return the empty string).
That side I totally agree with. The 'fixes' posted so far such as adding
(string) or replacing false by 0 should be enough to prevent adoption,
and personally I find the strict option just as problematic.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi Lester,
Please, read the examples again, the current behavior is nothing but
inconsistent:substr("a", 1) =>
FALSE
substr("a", -300) => ""? That was the case prior to PHP5.2.1
The fixes in 5.2.2 were not commonly accepted but give false for both,
but 5.2.7 and later give false and "a" which is what was the preferred
result at the time. Unless we are seeing something different, I'm only
seeing "a" or false for a current output of all your examples.
I meant "a", but you are right, it's a bit less inconsistent than I
thought. The current behavior could be defined as "if the resulting slice
is so that (start index <= end index) and (either start index or end index
is in bound) return the slice, else return FALSE", which is not very useful
but not so bad anyway.
The only real annoyance is that the check is strict on the right bound, so
that:
substr("abcd", 5) => `FALSE`
while:
substr("abcd", -10, -4) => ""
That explains one of the E_DEPRECATED
triggered by Drupal 7, because when
you want to remove a prefix from a string, you often do:
if (substr($str, 0, strlen($prefix)) {
$str = substr($str, strlen($prefix))
}
But this currently returns FALSE
when $str == $prefix.
(Obviously, the most useful behavior would be to return a string in all
cases, like for example, Python.)
Damien
Hi Lester,
On Fri, Feb 27, 2015 at 9:53 PM, Lester Caine <lester@lsces.co.uk
mailto:lester@lsces.co.uk> wrote:> Please, read the examples again, the current behavior is nothing but > inconsistent: > > substr("a", 1) => `FALSE` > substr("a", -300) => "" ? That was the case prior to PHP5.2.1 The fixes in 5.2.2 were not commonly accepted but give false for both, but 5.2.7 and later give false and "a" which is what was the preferred result at the time. Unless we are seeing something different, I'm only seeing "a" or false for a current output of all your examples.
I meant "a", but you are right, it's a bit less inconsistent than I
thought. The current behavior could be defined as "if the resulting
slice is so that (start index <= end index) and (either start index or
end index is in bound) return the slice, else return FALSE", which is
not very useful but not so bad anyway.The only real annoyance is that the check is strict on the right bound,
so that:substr("abcd", 5) => `FALSE`
while:
substr("abcd", -10, -4) => ""
This is the where the -ve logic stuff was changed. If you try -5 rather
than -4 things swap around versions wise. The 'problem' is where the
actual buffer length goes to 0 and then -ve. There still seem to be a
few edge cases that are not correctly caught, but then currently "" and
false default to false anyway so there is ONLY a problem if one DOES
remove the weak casting.
That explains one of the
E_DEPRECATED
triggered by Drupal 7, because
when you want to remove a prefix from a string, you often do:if (substr($str, 0, strlen($prefix)) { $str = substr($str, strlen($prefix)) }
But this currently returns
FALSE
when $str == $prefix.
(Obviously, the most useful behavior would be to return a string in all
cases, like for example, Python.)
If you only want strings then OK, but in that case how would you rework
the drupal problem anyway? Although I'd not start from that if statement
anyway if there is a chance that the $str and $prefix ARE the same
string? The current if statement is what is wrong not the second substr.
if ( strlen($str) > strlen($prefix) ) {
$str = substr($str, strlen($prefix))
} else { // nothing left }
But how do you know that the string contains the prefix? Yes the legacy
code base needs tidying up and is full of bugs, but we often just need
to tidy up the logic ...
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
De : Damien Tournoud [mailto:damz@damz.org]
But anyway, that's not even the point. The point is that return values that
"worked" in a world of transparent type-jungling will not work in a world
of stricter typing checks. Which means that we need:(1) To accept that for now casting
FALSE
to the empty string is the right
thing to do;
No. Add an explicit for false. We want programs to be cleaner, not encourage casting.
until / unless:
(2) We fixed all the common functions in the standard library so that they
stop returning inconsistent types, in particular in the cases where it
should not even be a debate what is the right thing to do (for substr, if
the arguments point to an empty slice, return the empty string).
That's history. Propose an RFC to fix substr()
behavior (good luck :)
Yes, we propose to deprecate potentially-problematic behaviors. That's not because we are running behind Anthony's RFC, as some may believe, but we sincerely think, at the end, it will bring more good than bad to PHP overall. You will have at least 5 years to fix your code. PHP is evolving. You like it or not. If you don't, you vote against the feature. That's democracy, which is not always the case for other languages. Can I say more ?
François
From: Damien Tournoud [mailto:damz@damz.org], Sent: Friday, February 27, 2015 4:54 PM
Hi Christian,
It is not a bug.
FALSE
as a return value ofsubstr()
is the identificator
for an error (e.g. invalid arguments), as it is stated in the documentation:
[...]"It is documented that way" and "it is not a bug" are two very different things.
That’s not true. Quoting Wikipedia "A software bug is an error, flaw,
failure, or fault in a computer program or system that causes it
to produce an incorrect or unexpected result, or to behave in
unintended ways." [1]In this case
FALSE
is an expected result and it is intended. And as
I said other languages are going the same way.
Q: How many MicroSoft engineers does it take to change a light bulb?
A: None. Bill Gates will just redefine Darkness(TM) as the new industry
standard.
--Larry Garfield
(Sorry, it was just such an obvious opening...)
De : Damien Tournoud [mailto:damz@damz.org]
All those are due to a bug in
substr()
, that we see now only thanks to
proper type identification. There is no reason forsubstr()
to ever return
a boolean. It really needs to be fix to always return a string.
Sorry, not a bug. Documentation is clear. You get 'string|false' from substr()
and send it to a function expecting 'string'. Most languages will fail on this. It worked in PHP because of implicit casting from bool to string. We just decided to deprecate this implicit cast.
Now, we can discuss about substr()
, whether it should always return string or not. But that's another subject.
Regards
François
Sorry, not a bug. Documentation is clear. You get 'string|false' from
substr()
and send it to a function expecting 'string'. Most languages will fail on this. It worked in PHP because of implicit casting from bool to string. We just decided to deprecate this implicit cast.Now, we can discuss about
substr()
, whether it should always return string or not. But that's another subject.
Isn't this the whole crux of the problem?
The strict/coercive world fix for 'is there a string left' is to create
an error which has to be handled, when workflow wise simply branching on
the empty string is the correct action. I can see that some people need
the empty string rather than the 'false' but THAT is the sort of thing
that 'Coercive' should be handling rather than dictating that it is now
an error.
This is fundamental to the nature of PHP and dismissing a basic premiss
'another subject' is exactly what both 'camps' are currently doing. I
want to maintain the ability to write code that runs in sequence, that
is what a script language should do. It is NOT designed to be compiled,
but processed and having to compile sections to create new functions
such as exception returns which may never be used is wrong, just as
trying to optimise something that may need the now optimized out element
next time a bit of code is used is equally wrong. If you want the
ultimate fastest performance just use C or one of the other fully
compiled languages an live with the 'delay when you first run it'.
Ultimately PHP works because it returns the type of object you need at
the time ... fixed single types are not what PHP is ... and that is the
type of PHP I want but I seem to be in an obsolete minority? I only
need the current build of PHP7 because it puts back speed lost due to
other changes but it is most definitely heading somewhere I don't want
to be.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hey Zeev,
Another change being considered and not yet in the RFC is re-allowing
leading and trailing spaces for numeric strings (sorry Paddy.)
I think that rejecting leading and trailing spaces for stringy ints is for the best.
If I only want to accept an integer (in either int or string form), then by enabling
spaces in that input, it would destroy the overall usability of it.
One scenario I have in mind for this is validating $_GET information for a RESTful web
service. Having potentially an infinite number of URIs that all point to the same resource
isn't good:
/users/1
/users/%201
/users/1%20
/users/%201%20
/users/%201%20%20
etc.
In this case, I don't want to accept any leading or trailing spaces for the user ID. So I
would therefore not be able to use an int
type hint because its acceptance rules would
be too lax.
If spaces are not accepted in stringy ints, and I want to pass a stringy int that may have
leading or trailing spaces in it, then I know I can simply apply a trim()
to it before passing
it into a function that's expecting only an int. This way, the usability of integer-only inputs
(as string or ints) isn't compromised by the rules being too weak.
The takeaway from this seems to be that the approach of tightening the
existing rule-set and applying it to both internal functions and new
user-land code is viable, and does not cause the sky to come down falling.
Preliminary tests suggest it actually finds real potential problems.More tomorrow.
Overall, a big +1 on this though! The type coercion rules at the moment are far
too lax to be considered useful in most situations IMO.
Zeev
Thanks,
Tom
-----Original Message-----
From: Thomas Punt [mailto:tpunt@hotmail.co.uk]
Sent: Sunday, March 01, 2015 4:03 PM
To: Zeev Suraski; PHP internals
Subject: RE: [PHP-DEV] Coercive STH - some real world tests and updated
RFCHey Zeev,
Another change being considered and not yet in the RFC is re-allowing
leading and trailing spaces for numeric strings (sorry Paddy.)I think that rejecting leading and trailing spaces for stringy ints is
for the best.
If I only want to accept an integer (in either int or string form), then
by
enabling spaces in that input, it would destroy the overall usability of
it.One scenario I have in mind for this is validating $_GET information for
a
RESTful web service. Having potentially an infinite number of URIs that
all
point to the same resource isn't good:/users/1
/users/%201
/users/1%20
/users/%201%20
/users/%201%20%20
etc.In this case, I don't want to accept any leading or trailing spaces for
the user
ID. So I would therefore not be able to use anint
type hint because
its
acceptance rules would be too lax.If spaces are not accepted in stringy ints, and I want to pass a stringy
int that
may have leading or trailing spaces in it, then I know I can simply
apply a
trim()
to it before passing it into a function that's expecting only an
int. This
way, the usability of integer-only inputs (as string or ints) isn't
compromised
by the rules being too weak.
Tom,
First of all thanks for the feedback! I think that with leading/trailing
spaces we could go either way. It boils down to the question of whether
we want spaces to be explicitly trim()
'd and have the rule-set more
restrictive, or have the rule-set more permissive - preventing some use
cases from using it and having to do manual validation instead. Based on
the fact this has been a source of back & forth changes
(twitter.com/AndreaFaulds/status/570979956349665281), there's not going to
be a one size fits all rule-set here. I think that the use cases where it
will be harmless to ignore leading/trailing spaces would be more common
than those where it's risky or undesired...
The takeaway from this seems to be that the approach of tightening the
existing rule-set and applying it to both internal functions and new
user-land code is viable, and does not cause the sky to come down
falling.
Preliminary tests suggest it actually finds real potential problems.More tomorrow.
Overall, a big +1 on this though! The type coercion rules at the moment
are
far too lax to be considered useful in most situations IMO.
Thanks for the feedback!
Zeev
One scenario I have in mind for this is validating $_GET information for a
RESTful web service. Having potentially an infinite number of URIs that all
point to the same resource isn't good:/users/1
/users/%201
/users/1%20
/users/%201%20
/users/%201%20%20
etc.In this case, I don't want to accept any leading or trailing spaces for the user
ID. So I would therefore not be able to use anint
type hint because its
acceptance rules would be too lax.
If spaces are not accepted in stringy ints, and I want to pass a stringy int that
may have leading or trailing spaces in it, then I know I can simply apply a
trim()
to it before passing it into a function that's expecting only an int. This
way, the usability of integer-only inputs (as string or ints) isn't compromised
by the rules being too weak.
Tom,First of all thanks for the feedback! I think that with leading/trailing
spaces we could go either way. It boils down to the question of whether
we want spaces to be explicitlytrim()
'd and have the rule-set more
restrictive, or have the rule-set more permissive - preventing some use
cases from using it and having to do manual validation instead. Based on
the fact this has been a source of back & forth changes
(twitter.com/AndreaFaulds/status/570979956349665281), there's not going to
be a one size fits all rule-set here. I think that the use cases where it
will be harmless to ignore leading/trailing spaces would be more common
than those where it's risky or undesired...
Andrea's post highlights the fact that we did try a fix when PHP5 came
out. What it fails to add is perhaps why the change was reverted? Still
today there are people pressuring to have it restored? As this thread
has shown. At the end of the day, this fine tuning of casting action has
very little to do with the type hinting debate? Now IS the time to
discuss the rules but not as part of some other RFC? It deserves it's
own independent discussion as it SHOULD apply what ever happens over
type hinting.
Thomas's example in my book is where a number of things come into play?
My first thought would be just where is this actually processed and so
where is it trimmed? Additionally how about '001'? However it does
highlight that a single 'int' hint is not going to satisfy everybody
anyway? The thing perhaps to point out here is that we are looking in
this case at a source that may well be using unicode and wonder if THAT
may not be more of a problem here?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
-----Original Message-----
From: Lester Caine [mailto:lester@lsces.co.uk]
Sent: Monday, March 02, 2015 11:31 AM
To: internals@lists.php.net
Subject: Re: [PHP-DEV] Coercive STH - some real world tests and updated
RFCAndrea's post highlights the fact that we did try a fix when PHP5 came
out.
What it fails to add is perhaps why the change was reverted? Still today
there
are people pressuring to have it restored? As this thread has shown. At
the
end of the day, this fine tuning of casting action has very little to do
with the
type hinting debate? Now IS the time to discuss the rules but not as part
of
some other RFC? It deserves it's own independent discussion as it SHOULD
apply what ever happens over type hinting.
The whole discussion is presently limited to the area of type hints, and
this is unlikely to change for 7.0. I think it's good to think about post
7.0 changes - perhaps bringing the rule-set to other areas of PHP (namely
implicit casting), but no, it's not the time to seriously discuss this IMHO.
Thomas's example in my book is where a number of things come into play?
My first thought would be just where is this actually processed and so
where
is it trimmed? Additionally how about '001'? However it does highlight
that a
single 'int' hint is not going to satisfy everybody anyway? The thing
perhaps
to point out here is that we are looking in this case at a source that may
well
be using unicode and wonder if THAT may not be more of a problem here?
Whether we have weak, strict, coercive or yellow-red type hints added to
PHP, they're absolutely not supposed to satisfy everyone. They're supposed
to satisfy most use cases, but not all - for the remaining X percent, you'd
have to write a bit of custom code, which is fine (either do away with the
hint and do custom validation, or use the hint - but do a bit of
preprocessing in the calling code). It's not as if everything we deal with
in scalars is going to fit into the various scalar type hint buckets anyway.
Functions that accept URLs, or emails, or other types of
scalar-but-specifically-formatted data have to be validated with custom code
anyway. It's fine for other cases to require a bit of custom validation -
as long as the key use cases are covered.
I'm obviously biased but I believe that the coercive rules actually cover a
lot more ground than the weak+strict sets. In other words, with
weak+strict, overall, you're going to have to add custom code in a lot more
cases than with the coercive rule set (either because weak is too weak and
strict will very frequently be too strict, while coercive provides a
rule-set that accepts sensible values and reject non-sensible ones, making
it work out of the box in most cases). But even with coercive, it's
definitely not a one size fits all solution, and it's not supposed to be
either. You still have custom code available to you.
Zeev
I'm obviously biased but I believe that the coercive rules actually cover a
lot more ground than the weak+strict sets. In other words, with
weak+strict, overall, you're going to have to add custom code in a lot more
cases than with the coercive rule set (either because weak is too weak and
strict will very frequently be too strict, while coercive provides a
rule-set that accepts sensible values and reject non-sensible ones, making
it work out of the box in most cases). But even with coercive, it's
definitely not a one size fits all solution, and it's not supposed to be
either. You still have custom code available to you.
But since the horse is now out of the stable, we have to live with
strict anyway, so where will coercive rules fit in if they get accepted
as well? All of this is just creating different rules for sub-sets of
users :(
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
-----Original Message-----
From: Lester Caine [mailto:lester@lsces.co.uk]
Sent: Monday, March 02, 2015 12:10 PM
To: Zeev Suraski
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Coercive STH - some real world tests and updated
RFCI'm obviously biased but I believe that the coercive rules actually
cover a lot more ground than the weak+strict sets. In other words,
with
weak+strict, overall, you're going to have to add custom code in a lot
weak+more
cases than with the coercive rule set (either because weak is too weak
and strict will very frequently be too strict, while coercive provides
a rule-set that accepts sensible values and reject non-sensible ones,
making it work out of the box in most cases). But even with coercive,
it's definitely not a one size fits all solution, and it's not
supposed to be either. You still have custom code available to you.But since the horse is now out of the stable, we have to live with strict
anyway, so where will coercive rules fit in if they get accepted as well?
All of
this is just creating different rules for sub-sets of users :(
As Anthony proposed if both pass, we're going to have a vote to decide
between them, as they do contradict each other. Note that we're not going
to know the status of either vote with certainty until March 13th.
Zeev