Hi Matt,
Hello Pierre,
Thanks for your reply. :-)
----- Original Message -----
From: "Pierre"
Sent: Friday, August 11, 2006
Hello,
Note that I also answer your previous mail here :)
On Fri, 11 Aug 2006 06:18:13 -0500
php_lists@realplain.com ("Matt W") wrote:
Hello again,
I discovered a couple more things is_numeric... is causing problems
with (leading whitespace). I doubt any of the examples I've given
make sense to regular users who don't know what's happening behind
the scenes. Add these to the "wrong" list:
is_numeric(' .123') // bool(false)
this one should return true.
' .123' + 0 // int(0)
' 0.123' is casted to 0, 0+0. But if the ' .123' is allowed, it should
then result in 0.123+0, which is the correct behavior.
I may be misunderstanding you, but ' 0.123'+0 results in the correct 0.123.
Just without the leading 0 that it becomes wrong. :-)
I think we should consider '.123' as 0.123. This expression is then correct.
One more thing I was curious about as far as keeping things
consistent is with is_numeric... (and therefore
convert_scalar_to_number()), hex strings are allowed/work, but not
with convert_to_long|double.
I did not check convert_* while fixing/enhancing filters, but I think
there is a higher risk of breakages if you change these functions. We
should first have a clear view of what is used where and how the
changes affect end user scripts and extensions. It sounds like an
impossible task (except for 6.0).
I was just wondering if convert_to_[long|double] should check for and allow
hex strings like convert_scalar_to_number does (because it uses
is_numeric...).
I suggest you to take a look at the ext/filter code and what we accept.
I spend a far amount of times to ask and listen to users to see what
they expext. I'm quite happy with the current state and for what I
hear, the users too.
You can check the FILTER_VALIDATE_* mode, they do the same operations
that we are discussing here. The sanitize mode only checks for
unexpected chars.
Have to admit that I'm not really familiar with any of the filter stuff. :-/
I'll keep that in mind though.
Please take a look, it really solves many of these issues.
So a few PHP functions properly
accept hex strings, but most will convert one to 0. Should anything
be done about this difference? I have an idea about allowing hex
strings in to_[long|double] using the new is_numeric... functions I
will propose.
Few things about the current is_numeric... and hex strings, which I
think I'll change in my proposal unless I hear opinions otherwise:
*) Leading whitespace isn't allowed
They should be allowed (leading/ending).
Not sure if you're talking about currently, or agreeing with me. ;-) For
these 3 points, I was only referring to hex strings. Leading space is
allowed with non-hex.
I would like to allow them for all types (float, integer or hex), it
is what I did in ext/filter.
*) A sign (±) isn't allowed
It is allowed except for in the hexadecimal notation (see the manual
page of is_numeric), so if you talk only about is_numeric and the hex
notation, it is a bug fix.
Again, only referring to hex. Ah yes, I see the manual page for
is_numeric()
. Hmm OK, not sure if you'd want that to be changed then...
The internal function would be fine ($n = -0xABC works in the parser), I
guess, but maybe sign & hex returning true with PHP's is_numeric()
is
undesired.
+/- are not allowed for hex. I think we should make the difference
between a string conversion and the parser.
(a) $a = - 0xFF
(b) $a = " - 0xFF; "
(a) is a perfectly valid expression within a script (just like
$a=-2;), however (b) is a string and will require a cast to INT. The
two cases should not be processed the same way.
(a) can be read as $a = -1; $a *= 0xFF;
(b) is only a string assignement, it will be casted to INT when
required and failed (int(0)).
*) Hex doubles don't work. I think they should (for whole numbers
only obviously, no "."). So '0xFFFFFFFFFF' + 0 for example, works on
a 32-bit system.
They should not, an hexadecimal notation represents an integer (long),
not a double. A double could be the result of a cast when it is out of
the integer range.
Well, I think of the hex notation as just a whole number (non-floating) of
whatever range/size. About the cast, yeah, I see that's what is done now in
the parser if the hex number is between LONG_MAX and ULONG_MAX -- results in
a double. hexdec()
, etc. will also return a double if needed.
Right now, since hex doubles don't work, you also have (on 32-bit):
I don't understand what you mean by hex double :) Do you mean that we
should convert out of range HEX to double in any case?
is_numeric('0x7FFFFFFF') // bool(true)
is_numeric('0x80000000') // bool(false)
If that last one can be changed, it also should be in the language
parser of course (you know, for $n = 0xFFFFFFFFFF;).
It is the endless problem about 32/64bits issues, also I don't think
you are considering to use double in a for loop? :)
In a for loop of a script? No :-), but someone may want to specify a number
in hex larger than ULONG_MAX (and it may work if they're 64-bit, then break
on 32-bit). I've not had a need for it (in parser), but I would like the
larger hex strings to work as I have code like ('0x' . $hexstr) + 0 where
$hexstr comes from a packed/binary number (after bin2hex()
) that may be any
size. I don't want to use hexdec()
because it's slower (and this is
speed-critical, in a loop) and usually the value WILL fit in a long.
That's two different things, parser and cast operations. But I agree,
it is a bit tricky to keep this difference in mind while coding. But
you should do:
$a = 0+ ('0x'.$hexstr);
Some benchmarks (amd64):
$hexstr="FFFFFF";
$iter = 1000000;
$s1 = microtime(true);
for($i=0;$i<$iter;$i++) $a=hexdec("0x".$hexstr);
$s2 = microtime(true);
echo "hexdec: " . ($s2 - $s1) . "\n";
$s1 = microtime(true);
for($i=0;$i<$iter;$i++) $a=0+("0x".$hexstr);
$s2 = microtime(true);
echo "cast: " . ($s2 - $s1) . "\n";
hexdec: 2.6401779651642
cast: 1.4510979652405
That reminds me, a "(number)" typecast would be nice to have. :-)
number is a human thing, I'm not sure it fits our needs :)
You get the idea. That's because is_numeric_string() ignores the
value from zend_strtod() if errno==ERANGE. I don't think that's
right, and it doesn't happen when convert_to_double() uses
zend_strtod():
I have to check the sources :)
Yes, it ignores the INF/ERANGE, for whatever reason. :-)
In my opinion, these issues can be considered as bugs and should be
fixed easily done without rewriting everything).
I will send a new example function to the list in a few days, after doing
some tests, etc. Its behavior should be the same, except for these bugs.
And currently, strto[l|d] is sometimes called unnecessarily -- for example,
I think '123 foo' would result in zend_strtod() after strtol() -- pretty
sure that can be avoided. You'll see soon...
Again, take a look at ext/filter, I rewrite both float and integer
(not sure if I commited "int" yet, I will check later :), without
anything of these functions. However I consider that we should first
determine what to change and where. Like separate the parser from the
cast operations (string_to_*). Also we are missing way too many tests
to valid the changes. But as I said, it is necessary to bring
consistency in this area :)
print_r(array_count_values(array(1, ' 1', ' 1 ')))
Array
(
[1] => 2
[ 1 ] => 1
)
This is typically an example of why we cannot not change the behaviors
in php5, but I definitively like to do it for php 6.x.
I e-mailed Andrei about array_count_values()
since I think it's incorrect.
If so, it should be very simple to fix -- just eliminate the use of
is_numeric_string.
The fix is certainly easy (leading spaces management), but it will
break things out there. That's what we have to avoid, imho.
P.S. I forgot to add before that I noticed a comment Derick added a few
days ago for is_numeric_string -- only in 5.2
(http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?r1=1.94.2.4.2.2&
r2=1.94.2.4.2.3&view=patch). It says it returns IS_DOUBLE if the number
didn't fit in the integer range, but that's wrong if it's INF. :-)
INF
is per definition not out of range, it is out of everything (-INF too) ;-)
Cheers,
--Pierre