Hello internals,
The implementation for the RFC still hasn't landed, due to some helpful
remarks made by Tyson Andre.
The issues lie not with the core functionality itself but with how to amend
extensions which have a notion of numeric string literals. I have added
test cases for these extensions, but some of the behaviour is rather
surprising. I'll try to detail it below; hopefully all of it is covered by
a test case in my PR. [1]
GMP:
According to the GMP extension the following strings are valid numbers:
var_dump(gmp_init('0x'));
var_dump(gmp_init('0X'));
var_dump(gmp_init('0b'));
var_dump(gmp_init('0B'));
all evaluate to 0, but
var_dump(gmp_init(''));
is not considered a valid number and throws a TypeError.
Filter:
According to the filter extension, the following strings are valid numbers:
var_dump(filter_var('0x', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX));
var_dump(filter_var('0X', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX));
and evaluate to 0, but the following octal strings
var_dump(filter_var('O', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_OCTAL));
var_dump(filter_var('', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_OCTAL));
are invalid and evaluate to false. Whether the case '0' should be considered
a valid integer is debatable, but the following case is also invalid
according to the filter extension:
var_dump(filter_var("010", FILTER_VALIDATE_INT));
as it is interpreted as an octal number rather than a decimal one.
Base conversion functions in the standard math library:
We'll be looking at base_convert(), as it exhibits the same behaviour as
bindec(), octdec() and hexdec() (except for one case, which will be covered
later):
// Binary to decimal:
var_dump(base_convert('0b', 2, 10));
var_dump(base_convert('0B', 2, 10));
var_dump(base_convert('', 2, 10));
// Octal to decimal:
var_dump(base_convert('0o', 8, 10));
var_dump(base_convert('0O', 8, 10));
var_dump(base_convert('', 8, 10));
// Hexadecimal to decimal:
var_dump(base_convert('0x', 16, 10));
var_dump(base_convert('0X', 16, 10));
var_dump(base_convert('', 16, 10));
These all evaluate to 0 (base_convert() returns the string "0", while the
dedicated functions return an integer).
Now, onto the weird special case, which looks like a bug in the
implementation of base_convert(), as
var_dump(base_convert('O', 8, 10));
will emit the following deprecation warning (but only if the starting base
is 8):
Deprecated: Invalid characters passed for attempted conversion, these have
been ignored in %s on line %d
string(1) "0"
whereas octdec() does not.
As you can see, the behaviour is rather suboptimal and inconsistent. I can
see a few ways to handle this:
- Make the octal prefix behave according to the respective extension:
  no BC break, but more surprising results.
- Make the octal prefix behaviour of the GMP and filter extensions sane (as
  base_convert() and co. already support it) and leave the current
  behaviour for hex (and binary, for GMP).
- Same as the second option, but warn for these edge cases so that we can
  error out in PHP 9.
- Make a BC break in PHP 8.1 and remove all the edge cases for GMP and the
  filter extension; what to do with base_convert() and co. would still be
  up for debate.
- Something else?
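For illustration only, here is a minimal PHP sketch of what "removing the edge cases" could look like, assuming it means that a base prefix with no digits after it (and the empty string) is rejected instead of evaluating to 0. parse_prefixed_int() is a hypothetical helper, not part of PHP or of the PR, and it ignores signs and integer overflow.

```php
<?php

// Hypothetical helper (not part of PHP or the PR): parse a string with an
// explicit base prefix (0x/0X, 0o/0O, 0b/0B) and return null when no valid
// digits follow the prefix, instead of silently evaluating to 0.
// Signs and integer overflow are deliberately not handled.
function parse_prefixed_int(string $s): ?int
{
    $patterns = [
        16 => '/\A0[xX]([0-9a-fA-F]+)\z/',
        8  => '/\A0[oO]([0-7]+)\z/',
        2  => '/\A0[bB]([01]+)\z/',
    ];
    foreach ($patterns as $base => $pattern) {
        if (preg_match($pattern, $s, $m) === 1) {
            // Convert only the digits after the prefix, in the matched base.
            return intval($m[1], $base);
        }
    }
    // Bare prefix ("0x", "0o", "0b"), empty string or invalid digits.
    return null;
}

var_dump(parse_prefixed_int('0o17')); // int(15)
var_dump(parse_prefixed_int('0x1F')); // int(31)
var_dump(parse_prefixed_int('0b'));   // NULL
var_dump(parse_prefixed_int(''));     // NULL
```

The regexes anchor with \A and \z rather than ^ and $ so that a trailing newline is not silently accepted.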
On top of this, I wonder whether we should add the following flags to the
filter extension: FILTER_FLAG_ALLOW_EXPLICIT_OCTAL and
FILTER_FLAG_ALLOW_IMPLICIT_OCTAL, to remove the ambiguity, and later
deprecate using FILTER_FLAG_ALLOW_OCTAL without at least one of these two
flags. (One could also add a flag to allow a leading 0 for decimal numbers.)
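A minimal sketch of how the two proposed flags could remove the ambiguity between "0o17" (explicit octal) and "017" (implicit octal). Since FILTER_FLAG_ALLOW_EXPLICIT_OCTAL and FILTER_FLAG_ALLOW_IMPLICIT_OCTAL do not exist, they are modelled here as two boolean parameters; validate_octal_int() is a hypothetical helper, not the filter extension's implementation, and it only handles non-negative input.

```php
<?php

// Hypothetical helper: $allowExplicit and $allowImplicit stand in for the
// proposed (not existing) FILTER_FLAG_ALLOW_EXPLICIT_OCTAL and
// FILTER_FLAG_ALLOW_IMPLICIT_OCTAL flags.
// Only non-negative integers are handled; overflow is not checked.
function validate_octal_int(string $s, bool $allowExplicit, bool $allowImplicit): int|false
{
    // Explicit octal: "0o17" / "0O17"; a bare "0o" is rejected.
    if ($allowExplicit && preg_match('/\A0[oO]([0-7]+)\z/', $s, $m) === 1) {
        return intval($m[1], 8);
    }
    // Implicit octal: a leading zero followed by octal digits, e.g. "017".
    if ($allowImplicit && preg_match('/\A0([0-7]+)\z/', $s, $m) === 1) {
        return intval($m[1], 8);
    }
    // Plain decimal without a leading zero (a lone "0" is accepted).
    if (preg_match('/\A(0|[1-9][0-9]*)\z/', $s) === 1) {
        return intval($s, 10);
    }
    return false;
}

var_dump(validate_octal_int('0o17', true, false)); // int(15)
var_dump(validate_octal_int('017', true, false));  // bool(false)
var_dump(validate_octal_int('017', false, true));  // int(15)
var_dump(validate_octal_int('0o', true, true));    // bool(false)
```

Keeping the two forms behind separate switches means accepting "0o17" never implies accepting "017", which is the ambiguity FILTER_FLAG_ALLOW_OCTAL currently bundles together.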
Hope to hear your thoughts about this.
Best regards,
George P. Banyard
Correction regarding the base_convert() special case described above: it
does not exist. I was mistakenly using 'O' (a capital letter o instead of a
zero) in the test case; with the intended input, no deprecation notice is
emitted.