Hi internals!
The internal is_numeric_string [1] function is used to check whether a
string contains a number (and to extract that number).
Currently is_numeric_string also accepts hexadecimal strings [2]
(apart from the normal decimal integers and doubles).
This can cause some quite odd behavior at times. E.g. string
comparisons also use is_numeric_string, resulting in the behavior:
var_dump('123' == '0x7b'); // true
In all other parts of the engine hexadecimal strings are not recognized [3]:
var_dump((int) '0x7b'); // int(0)
This also causes minor problems in other parts of the engine where
is_numeric_string is used. E.g.
$string = 'abc';
var_dump($string['0xabc']); // string(1) "a"
// 0xabc is first accepted as a number by is_numeric_string, but then
// cast to 0 by convert_to_long
But:
$string = 'abc';
var_dump($string['0abc']);
// outputs (as expected):
Notice: A non well formed numeric value encountered in /code/8KXrYZ on line 9
NULL
In my eyes accepting hex strings in is_numeric_string leads to a quite
big WTF effect and causes problems and as such should be dropped.
I don't think this has much BC impact, so it should be possible to change it.
Nikita
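For code that has to compare strings today, here is a minimal sketch of sidestepping the numeric-string comparison altogether. It is an illustration only and not part of the proposal: the identity operator and strcmp() compare the two strings as strings and never attempt a numeric interpretation.

```php
<?php
// Illustration only: === on two strings compares them as strings,
// so is_numeric_string-style conversion never comes into play.
var_dump('123' === '0x7b');            // bool(false)
var_dump('123' === '123');             // bool(true)
var_dump(strcmp('123', '0x7b') === 0); // bool(false)
```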
hi!
[3]: http://www.php.net/manual/en/language.types.string.php#language.types.string.conversion
From the manual:
"If the string starts with valid numeric data, this will be the value
used. Otherwise, the value will be 0 (zero). Valid numeric data is an
optional sign, followed by one or more digits (optionally containing a
decimal point), followed by an optional exponent. The exponent is an
'e' or 'E' followed by one or more digits. "
So, no problem from my side with changing this confusing behavior.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
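The grammar quoted from the manual above can be sketched as a regular expression. This is a rough userland approximation, not the engine's implementation: the function name is hypothetical, it matches the whole string rather than just a leading prefix, and the optional sign inside the exponent is an assumption that the quoted sentence does not spell out.

```php
<?php
// Hypothetical helper (not a PHP built-in): true if the WHOLE string is
// an optional sign, digits with an optional decimal point, and an
// optional exponent. Hexadecimal notation is rejected by construction.
function is_decimal_numeric_string(string $s): bool
{
    // Assumption: the exponent may carry its own sign ([+-]?).
    $pattern = '/^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?\z/';
    return preg_match($pattern, $s) === 1;
}

var_dump(is_decimal_numeric_string('123'));   // bool(true)
var_dump(is_decimal_numeric_string('1.5e3')); // bool(true)
var_dump(is_decimal_numeric_string('0x7b'));  // bool(false)
var_dump(is_decimal_numeric_string('abc'));   // bool(false)
```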
2012/4/17 Nikita Popov nikita.ppv@googlemail.com
var_dump('123' == '0x7b'); // true
In all other parts of the engine hexadecimal strings are not recognized [3]:
var_dump((int) '0x7b'); // int(0)
Hi, Nikita
I personally would rather change the type conversion from string to
integer so that it recognizes such strings, at least for an explicit
type cast (in other words, when you force PHP to get a usable integer
out of that string) ...
Bye
Simon
On Tue, 17 Apr 2012 13:35:48 +0200, Simon Schick
simonsimcity@googlemail.com wrote:
2012/4/17 Nikita Popov nikita.ppv@googlemail.com
var_dump('123' == '0x7b'); // true
In all other parts of the engine hexadecimal strings are not recognized [3]:
var_dump((int) '0x7b'); // int(0)
I personally would rather change the type conversion from string to
integer so that it recognizes such strings, at least for an explicit
type cast (in other words, when you force PHP to get a usable integer
out of that string) ...
I think that would be an error. As was mentioned a few months ago when 0b
was introduced, no other number format has this behavior. You can't do
"123" == "0b10" or "123" == "0876". Extending this hexadecimal oddity
instead of eliminating it is inconsistent with the treatment given to
those other formats.
--
Gustavo Lopes
2012/4/17 Gustavo Lopes glopes@nebm.ist.utl.pt:
I think that would be an error. As was mentioned a few months ago when 0b
was introduced, no other number format has this behavior. You can't do "123"
== "0b10" or "123" == "0876". Extending this hexadecimal oddity instead of
eliminating it is inconsistent with the treatment given to those other
formats.
--
Gustavo Lopes
Hi, Gustavo
That's something I didn't know of ... if we're doing that, it should,
of course, also be done for the binary system.
The only thing I wonder about is the code examples you're giving ...
I would expect this to work if we start to change something here:
var_dump((int) '0x7b'); // int(123)
var_dump((int) '0b1111011'); // int(123)
var_dump((int) '0123'); // int(123)
The last example was not mentioned here before, but as you used it in
an example, I included it here as well ...
Bye
Simon
2012/4/17 Simon Schick simonsimcity@googlemail.com
Hi, Gustavo
That's something I didn't know of ... if we're doing that, it should,
of course, also be done for the binary system.
The only thing I wonder about is the code examples you're giving ...
I would expect this to work if we start to change something here:
var_dump((int) '0x7b'); // int(123)
var_dump((int) '0b1111011'); // int(123)
var_dump((int) '0123'); // int(123)
The last example was not mentioned here before, but as you used it in
an example, I included it here as well ...
Bye
Simon
Hi, all
As I just saw in another thread, I forgot the octal number system,
which takes 0 as its prefix ... and this would change my last example to:
var_dump((int) '0173'); // int(123)
This makes me quite unsure whether this should be done the way I proposed ...
In this case I would not expect it to behave like that.
Bye
Simon
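The ambiguity around the leading zero can be made concrete with a small sketch in which the caller states the base explicitly instead of having it guessed from a prefix. The helper name is hypothetical and the set of supported bases is an assumption chosen to match the formats discussed in this thread.

```php
<?php
// Hypothetical helper (not a PHP built-in): parse $digits in an explicitly
// given base. Prefixes such as "0x" or "0b" are NOT accepted; the string
// must consist only of digits that are valid in that base.
function parse_int_in_base(string $digits, int $base): ?int
{
    $patterns = [
        2  => '/^[01]+\z/',
        8  => '/^[0-7]+\z/',
        10 => '/^[0-9]+\z/',
        16 => '/^[0-9a-fA-F]+\z/',
    ];
    if (!isset($patterns[$base]) || preg_match($patterns[$base], $digits) !== 1) {
        return null; // unsupported base or invalid digits
    }
    return intval($digits, $base);
}

var_dump(parse_int_in_base('7b', 16));     // int(123)
var_dump(parse_int_in_base('1111011', 2)); // int(123)
var_dump(parse_int_in_base('0123', 10));   // int(123)
var_dump(parse_int_in_base('0123', 8));    // int(83)
var_dump(parse_int_in_base('0173', 8));    // int(123)
var_dump(parse_int_in_base('0x7b', 16));   // NULL
```

The same string '0123' yields 123 or 83 depending on the base the caller asks for, which is exactly the surprise a prefix-guessing cast would introduce.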
On Tue, 17 Apr 2012 13:20:33 +0200, Nikita Popov
nikita.ppv@googlemail.com wrote:
The internal is_numeric_string [1] function is used to check whether a
string contains a number (and to extract that number).
Currently is_numeric_string also accepts hexadecimal strings [2]
(apart from the normal decimal integers and doubles).
[...]
In my eyes accepting hex strings in is_numeric_string leads to a quite
big WTF effect and causes problems and as such should be dropped.
I don't think this has much BC impact, so it should be possible to
change it.
I think this definitely has a larger BC impact than you're portraying; I can
see some people making comparisons against '0xA' instead of 0xA.
Besides, this is part of the Zend API. It's already used in many
extensions (though possibly some of these should be using a stricter
function) and changing its behavior in a stable branch is not wise:
http://lxr.php.net/opengrok/search?q=&project=PHP_TRUNK&defs=&refs=is_numeric_string
But in any case, if there are no graver BC impacts, +1 for master.
--
Gustavo Lopes
On Tue, 17 Apr 2012 13:20:33 +0200, Nikita Popov nikita.ppv@googlemail.com
wrote:
The internal is_numeric_string [1] function is used to check whether a
string contains a number (and to extract that number).
Currently is_numeric_string also accepts hexadecimal strings [2]
(apart from the normal decimal integers and doubles).
[...]
In my eyes accepting hex strings in is_numeric_string leads to a quite
big WTF effect and causes problems and as such should be dropped.
I don't think this has much BC impact, so it should be possible to change
it.
I think this definitely has a larger BC impact than you're portraying; I can see
some people making comparisons against '0xA' instead of 0xA.
Yes, this definitely does have BC impact, but I don't think it is
particularly large.
The affected areas mainly would be:
- String comparisons using ==
- Strings passed to internal functions which accept the value through
  an "l" zend_parse_parameters specifier (functions doing manual type
  handling via Z_TYPE and convert_to_long already do not accept hex)
- The userland function is_numeric
The first two would mainly be a problem if somebody - as you already
mention - has written '0xA' == $foo style comparisons or did stuff
like round($number, '0xA'). Both cases - in my eyes - aren't
particularly probable, as anyone who knows what a hex number is
probably also knows the difference between a string literal and a
number literal.
The last one is more problematic. It is explicitly documented as
accepting hexadecimal numbers. In my eyes it too should not accept
them, but I could imagine that people rely on this.
Besides, this is part of the Zend API. It's already used in many extensions
(though possibly some of these should be using a stricter function) and
changing its behavior in a stable branch is not wise:
http://lxr.php.net/opengrok/search?q=&project=PHP_TRUNK&defs=&refs=is_numeric_string
I've already looked at some of these and in most (all?) cases the
intended behavior seems to be to not allow hex (passing hex in those
situations actually creates some kind of broken behavior).
Nikita
I don't think this has much BC impact, so it should be possible to change it.
Same here, I never even knew that this worked in a string context
until recently. Autocast/comparison rules are already complicated
enough as they are documented now, and I failed to find anything
in the manual that would actually say that hex in a string
context is supposed to work at all ...
I can't really judge the BC implications though, so the best way would
be to start throwing E_DEPRECATED warnings for now ... or maybe go the
X11 way of "deliberately break obscure feature and see how many
complaints we get" ;)
--
hartmut
Same here, I never even knew that this worked in a string context
until recently. Autocast/comparison rules are already complicated
enough as they are documented now, and I failed to find anything
in the manual that would actually say that hex in a string
context is supposed to work at all ...
Would this end up changing the behavior of the user land is_numeric()
function? The behavior actually is documented under that function:
"Finds whether the given variable is numeric. Numeric strings consist of [...]. Hexadecimal notation (0xFF) is allowed too but only without sign, decimal and exponential part."
If so, although this does technically break BC in that case, I for one will not miss it. The only effect this will have on our code is to make validation of numeric input much easier and less error-prone.
--
Bob Williams
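For validating numeric input without depending on how is_numeric() treats hex, one option that already exists is the filter extension. The sketch below assumes the filter extension is available (it is enabled by default); the wrapper name is hypothetical. FILTER_VALIDATE_INT rejects hexadecimal notation unless FILTER_FLAG_ALLOW_HEX is passed explicitly.

```php
<?php
// Hypothetical wrapper (not a PHP built-in): returns the integer value,
// or null if $input is not a plain decimal integer string.
function parse_decimal_int(string $input): ?int
{
    $value = filter_var($input, FILTER_VALIDATE_INT);
    return $value === false ? null : $value;
}

var_dump(parse_decimal_int('123'));  // int(123)
var_dump(parse_decimal_int('0x7b')); // NULL
var_dump(parse_decimal_int('abc'));  // NULL

// Hex is only accepted when the caller opts in:
var_dump(filter_var('0x7b', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX)); // int(123)
```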