Hi,
I've done a bit of refactoring work around code using "zend_result",
but I keep wondering why it even exists.
It was added in 1999 by commit 573b46022c46 in a huge 16k line commit
(as macros, converted to enum in 2012 by commit e3ef84c59bf).
In 1999, C99 was brand new, and the "bool" type had just been
introduced to the C language - yes, C was 18 years old when a native
boolean type was added - but PHP didn't switch to C99 for another 19
years later (b51a99ae358b).
C had a long history of abusing "int" as boolean type, wasting 2 or 4
bytes for something that could have easily fit in 1 byte. As if that
wasn't obscure enough, POSIX used 0 for success and -1 for error (and
missed the chance to return error codes, and instead added a global
variable for that which caused more headaches). (And don't get me
started on the horrible strcmp()
return value.) Returning errors in C
is a huge obscure and inconsistent mess; every function does it
differently.
This is PHP's original definition:
#define SUCCESS 0
#define FAILURE -1 /* this MUST stay a negative number, or it may effect functions! */
This appears to follow the POSIX school of bad error return values.
There's a comment which makes the thing even more obscure.
Really, why does it need to be negative?
Why does it imitate POSIX instead of using a boolean? (i.e. "int" and
non-zero for success for old-schoolers)
The obvious way to indicate success or failure (without giving details
about the nature of the failure) would be to just return "bool". With
C99, any discussion on "0 or 1" vs "-1 or 0" is obsolete - there is
now a canonical boolean type that should be used.
Of course, I know already that getting rid of "zend_result" in favor
of "bool" would be a major API breakage that requires careful
refactoring of almost all extensions.
I just want to understand why "zend_result" was ever invented and why
it uses those surprising integer values. The commit message and that
code comment doesn't explain it.
Rephrased: do you consider it a worthy goal to eventually get rid of
"zend_result", or do you believe it's good API design that should stay
forever?
(Yet again I wish PHP was fully C++ - unlike C, C++ has strongly-typed
enums which cannot be casted implicitly to "int" or "bool"; that makes
refactoring a lot easier and safer.)
Max
Hi,
I've done a bit of refactoring work around code using "zend_result",
but I keep wondering why it even exists.It was added in 1999 by commit 573b46022c46 in a huge 16k line commit
(as macros, converted to enum in 2012 by commit e3ef84c59bf).In 1999, C99 was brand new, and the "bool" type had just been
introduced to the C language - yes, C was 18 years old when a native
boolean type was added - but PHP didn't switch to C99 for another 19
years later (b51a99ae358b).C had a long history of abusing "int" as boolean type, wasting 2 or 4
bytes for something that could have easily fit in 1 byte. As if that
wasn't obscure enough, POSIX used 0 for success and -1 for error (and
missed the chance to return error codes, and instead added a global
variable for that which caused more headaches). (And don't get me
started on the horriblestrcmp()
return value.) Returning errors in C
is a huge obscure and inconsistent mess; every function does it
differently.This is PHP's original definition:
#define SUCCESS 0
#define FAILURE -1 /* this MUST stay a negative number, or it may effect functions! */This appears to follow the POSIX school of bad error return values.
There's a comment which makes the thing even more obscure.Really, why does it need to be negative?
Why does it imitate POSIX instead of using a boolean? (i.e. "int" and
non-zero for success for old-schoolers)The obvious way to indicate success or failure (without giving details
about the nature of the failure) would be to just return "bool". With
C99, any discussion on "0 or 1" vs "-1 or 0" is obsolete - there is
now a canonical boolean type that should be used.Of course, I know already that getting rid of "zend_result" in favor
of "bool" would be a major API breakage that requires careful
refactoring of almost all extensions.I just want to understand why "zend_result" was ever invented and why
it uses those surprising integer values. The commit message and that
code comment doesn't explain it.Rephrased: do you consider it a worthy goal to eventually get rid of
"zend_result", or do you believe it's good API design that should stay
forever?(Yet again I wish PHP was fully C++ - unlike C, C++ has strongly-typed
enums which cannot be casted implicitly to "int" or "bool"; that makes
refactoring a lot easier and safer.)Max
Any type that has only two values is isomorphic to a boolean. However, for us humans, not all two-valued types are semantically equivalent.
If you have a function like zend_hash_exists(), true and false are directly meaningful values.
If you have a function like zend_stream_open_function(), SUCCESS and FAILURE are directly meaningful values.
Now, if you make zend_stream_open_function() return a boolean instead, it's no longer clear what the return value means. Does a true return value mean that the stream was opened successfully, or that it failed?
For a PHP programmer, that might sound silly -- of course true means success. However, in C the common error reporting convention is actually the other way around: Non-zero return values indicate failure. This means that false indicates success and true indicates failure. (I'm not kidding you -- I'm literally working on a project that uses boolean return values with this convention right now.)
The current guideline for use of bool and zend_result in php-src is that bool is an appropriate return value for "is" or "has" style functions, which return a yes/no answer. zend_result is an appropriate return value for functions that perform some operation that may succeed or fail.
I think that's a pretty reasonable state of things, and don't think there is a need to change it.
Regards,
Nikita
If you have a function like zend_stream_open_function(), SUCCESS and FAILURE are directly meaningful values.
Agree, but that doesn't explain why FAILURE needs to be negative.
The current guideline for use of bool and zend_result in php-src is that bool is an appropriate return value for "is" or "has" style functions, which return a yes/no answer. zend_result is an appropriate return value for functions that perform some operation that may succeed or fail.
What does the return value of these functions mean?
- zend_make_printable_zval()
- zend_make_callable()
- zend_parse_arg_bool()
- zend_fiber_init_context()
- zend_observer_remove_begin_handler()
- php_execute_script()1
If I understand the guideline correctly, then those examples (and
countless others) are defective and should be fixed, correct?
Max
If you have a function like zend_stream_open_function(), SUCCESS and FAILURE are directly meaningful values.
Agree, but that doesn't explain why FAILURE needs to be negative.
I expect that there are two main reasons for that:
- There are probably some places that return a (non-negative) value or FAILURE.
- There are probably some places that check for success/failure using >= 0 and < 0. Another POSIX-ism.
I don't think we endorse such usage, but it likely exists.
Let me turn the question around: Is there some reason to change the value of FAILURE at this point?
The current guideline for use of bool and zend_result in php-src is that bool is an appropriate return value for "is" or "has" style functions, which return a yes/no answer. zend_result is an appropriate return value for functions that perform some operation that may succeed or fail.
What does the return value of these functions mean?
- zend_make_printable_zval()
- zend_make_callable()
- zend_parse_arg_bool()
- zend_fiber_init_context()
- zend_observer_remove_begin_handler()
- php_execute_script()1
If I understand the guideline correctly, then those examples (and
countless others) are defective and should be fixed, correct?
At least in principle, yes. Of course, actually doing a change may not always be worthwhile, especially as this might result in a silent behavior change for extensions (as putting the return value into an "if" would invert behavior now).
I believe Girgias has done extensive work on making the int vs bool vs zend_result situation more consistent, so you might want to coordinate with him.
Regards,
Nikita
I expect that there are two main reasons for that:
- There are probably some places that return a (non-negative) value or FAILURE.
- There are probably some places that check for success/failure using >= 0 and < 0. Another POSIX-ism.
I don't think we endorse such usage, but it likely exists.
Oh. That's .... bad.
(Again, with C++'s strictly-typed enums, this would be easy to find:
if it compiles, it's correct.)
Let me turn the question around: Is there some reason to change the value of FAILURE at this point?
Right now, I'm trying to understand this choice of number and the
obscure code comment NOT explaining it. A comprehensive understanding
of a design is an important first step for any other decision.
And indeed there would be a good practical reason to change this. For
example, on x86, the check for -1 compiles to:
83 f8 ff cmp $0xffffffff,%eax
While the check for false (or 0) compiles to:
85 c0 test %eax,%eax
On Aarch64, check for -1 is:
3100041f cmn w0, #0x1
54000060 b.eq 1c <Foo>
While false/0 compile to just one conditional branch:
35000060 cbnz w0, 3c <Foo>
For such a basic definition used everywhere, this does make a
difference.
It was rather clumsy to choose -1 for FAILURE because it leads to
less-than-optimal machine code. Larger machine code means slower
execution. For one central basic definition as zend_result that gets
used everywhere, it can make a measurable difference.
(Note that using "bool" would still be faster than using an enum with
0 and 1, but everything's better than -1.)
If I understand the guideline correctly, then those examples (and
countless others) are defective and should be fixed, correct?At least in principle, yes. Of course, actually doing a change may not always be worthwhile, especially as this might result in a silent behavior change for extensions (as putting the return value into an "if" would invert behavior now).
(... and yet again, C++ would have saved us from this pain by not
allowing implicit bool casts - sorry for annoying you with my C++
fanboyism, but any pain that stems from choosing C is self-inflicted.)
The zend_fibers thing is a fairly recent addition, yet nobody spotted
this API mistake befor merging it. I've submitted a PR to fix it:
https://github.com/php/php-src/pull/10622
Surprisingly, CODING_STANDARDS.md doesn't mention this API convention.
Maybe it should?
Max
I expect that there are two main reasons for that:
- There are probably some places that return a (non-negative) value or FAILURE.
- There are probably some places that check for success/failure using >= 0 and < 0. Another POSIX-ism.
I don't think we endorse such usage, but it likely exists.
Oh. That's .... bad.
(Again, with C++'s strictly-typed enums, this would be easy to find:
if it compiles, it's correct.)Let me turn the question around: Is there some reason to change the value of FAILURE at this point?
Right now, I'm trying to understand this choice of number and the
obscure code comment NOT explaining it. A comprehensive understanding
of a design is an important first step for any other decision.And indeed there would be a good practical reason to change this. For
example, on x86, the check for -1 compiles to:83 f8 ff cmp $0xffffffff,%eax
While the check for false (or 0) compiles to:
85 c0 test %eax,%eax
On Aarch64, check for -1 is:
3100041f cmn w0, #0x1
54000060 b.eq 1c <Foo>While false/0 compile to just one conditional branch:
35000060 cbnz w0, 3c <Foo>
For such a basic definition used everywhere, this does make a
difference
Just chiming in real quickly here for two small notes...
We could work around this issue by checking if (function() != SUCCESS)
instead of if (function() == FAILURE). Not sure if it gives any measurable
improvement though.
It's also worth noting that there's a couple of places where there's
just a check for if (function()) { failure code }. i.e. there is no
check for == FAILURE, it's just "implied". So if there would be changes
to their values, this needs to be taken into account or fixed. Although
there might be 3rd party extensions doing this kind of code as well.
It was rather clumsy to choose -1 for FAILURE because it leads to
less-than-optimal machine code. Larger machine code means slower
execution. For one central basic definition as zend_result that gets
used everywhere, it can make a measurable difference.(Note that using "bool" would still be faster than using an enum with
0 and 1, but everything's better than -1.)If I understand the guideline correctly, then those examples (and
countless others) are defective and should be fixed, correct?At least in principle, yes. Of course, actually doing a change may not always be worthwhile, especially as this might result in a silent behavior change for extensions (as putting the return value into an "if" would invert behavior now).
(... and yet again, C++ would have saved us from this pain by not
allowing implicit bool casts - sorry for annoying you with my C++
fanboyism, but any pain that stems from choosing C is self-inflicted.)The zend_fibers thing is a fairly recent addition, yet nobody spotted
this API mistake befor merging it. I've submitted a PR to fix it:
https://github.com/php/php-src/pull/10622Surprisingly, CODING_STANDARDS.md doesn't mention this API convention.
Maybe it should?Max
It's also worth noting that there's a couple of places where there's
just a check for if (function()) { failure code }. i.e. there is no
check for == FAILURE, it's just "implied".
Ouch. That's not only fragile, but also very obscure.
(Yadda yadda, C++ wouldn't even allow this, etc., you know)
If you have a function like zend_stream_open_function(), SUCCESS and
FAILURE are directly meaningful values.Agree, but that doesn't explain why FAILURE needs to be negative.
I expect that there are two main reasons for that:
- There are probably some places that return a (non-negative) value or
FAILURE.- There are probably some places that check for success/failure using >=
0 and < 0. Another POSIX-ism.I don't think we endorse such usage, but it likely exists.
Let me turn the question around: Is there some reason to change the value
of FAILURE at this point?The current guideline for use of bool and zend_result in php-src is
that bool is an appropriate return value for "is" or "has" style functions,
which return a yes/no answer. zend_result is an appropriate return value
for functions that perform some operation that may succeed or fail.What does the return value of these functions mean?
- zend_make_printable_zval()
- zend_make_callable()
- zend_parse_arg_bool()
- zend_fiber_init_context()
- zend_observer_remove_begin_handler()
- php_execute_script()1
If I understand the guideline correctly, then those examples (and
countless others) are defective and should be fixed, correct?At least in principle, yes. Of course, actually doing a change may not
always be worthwhile, especially as this might result in a silent behavior
change for extensions (as putting the return value into an "if" would
invert behavior now).I believe Girgias has done extensive work on making the int vs bool vs
zend_result situation more consistent, so you might want to coordinate with
him.Regards,
Nikita
Yeah, I spent a lot of time around the release of PHP 8.0 and a bit in
between releases to convert various int types to bool or zend_result just
to make the API clearer.
One of the main cases I missed was to change the return value of object
handlers, as those, for the most part, are meant to return zend_result.
I didn't push forward with that one as when I realized we were already in
beta releases or maybe RC and even changing it for a minor version did feel
a bit disruptive.
However, it may make sense to tackle this sooner than later if we are doing
some other object handler changes.
There are however definitely still cases where FAILURE is assumed to be -1
either because this is what the function is expected to return on failure
or checking the value of a POSIX like API.
Ideally, all cases where zend_result is used would actually indicate this
to make it possible to maybe convert SUCCESS to true and FAILURE to false
and typedef zend_result to bool.
Best,
George P. Banyard