Something for the weekend (flamewar)
http://www.reddit.com/r/programming/comments/s6477/9223372036854775807_9223372036854775808/
http://news.ycombinator.com/item?id=3832069
hi
No thanks, but read:
https://bugs.php.net/bug.php?id=54547
everything is in there.
We don't need more rant posts on this list :)
Cheers,
--
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
On Fri, 13 Apr 2012 11:48:10 +0200, Pierre Joye pierre.php@gmail.com
wrote:
> On Fri, Apr 13, 2012 at 11:40 AM, marius adrian popa mapopa@gmail.com
> wrote:
>> Something for the weekend (flamewar)
>> http://www.reddit.com/r/programming/comments/s6477/9223372036854775807_9223372036854775808/
>> http://news.ycombinator.com/item?id=3832069
This conversation has become quite muddled with the usual controversies
about the nature of PHP. Instead of engaging in that sort of discussion,
I'll explain the situation.
Currently, when two strings are compared, PHP tries to do a numerical
comparison first. Therefore:
var_dump("010" == "10"); //true
The current implementation, however, gives up when the numbers are too
large or too small.
Zend/zend_operators.c:
/* Both values overflowed and have the same sign,
 * so a numeric comparison would be inaccurate */
var_dump("1.7976931348623157e308" ==
"1.79769313486231571e308"); //true
var_dump(1.7976931348623157e400); //INF
var_dump("1.7976931348623157e400" ==
"1.79769313486231571e400"); //false
In this case, a string comparison is done instead (memcmp).
There are other situations where the result of the comparison may be
"inaccurate" -- in the sense that two strings may be constructed as
representing different numbers, but they compare equal.
- Comparing two different real numbers that map to the same double
  precision number:
  var_dump("1.9999999999999999" == "2"); //true
- Comparing two integers that are not representable as ints and that map
  to the same double precision number:
  var_dump("9223372036854775810" == "9223372036854775811"); //true
- Comparing an integer that's representable as an int with another that's
  not. That is what the bug is about.
  var_dump(PHP_INT_MAX); //9223372036854775807
  var_dump("9223372036854775807" == "9223372036854775808"); //true
  In this case, the integer is also converted to a double; if long is 64
  bits wide, this results in a loss of precision, and the operands now
  compare equal.
The last two cases are easy to detect; I wrote a patch that does that
and forces a plain string comparison.
However, taking the last case as an example, this is the same thing that
happens if you compare:
var_dump((int)"9223372036854775807" == (double)"9223372036854775808");
//true
So in that respect the current behavior is consistent with the comparison
that would take place if the numeric strings were converted to ints
or doubles (following the usual rules to choose between the two, i.e., if
there's no decimal point or exponent, take an int unless it's too large in
absolute value).
Whether the current behavior is correct depends on whether you consider
(string)"9223372036854775808" to represent an integer value that, not
being representable as int in PHP, should preclude a numeric comparison,
or whether you consider (string)"9223372036854775808" to be the same as
(double)"9223372036854775808" for numeric comparison purposes. And I'm not
sure which.
--
Gustavo Lopes
Hi!
> There are other situations where the result of the comparison may be
> "inaccurate" -- in the sense that two strings may be constructed as
> representing different numbers, but they compare equal.
> - Comparing two different real numbers that map to the same double
>   precision number:
>   var_dump("1.9999999999999999" == "2"); //true
For floats there's no accurate comparison anyway; that is a known fact.
However, for edge cases like the one mentioned in the subject, I think it
may make sense to make an exception, since it is indeed kind of obscure why
it works this way and I do not see why this behavior - while having its
(obscure) reasons - would be useful to anybody.
> However, taking the last case as an example, this is the same thing that
> happens if you compare:
> var_dump((int)"9223372036854775807" == (double)"9223372036854775808");
> //true
This, however, is a different case, since you explicitly coerce the types
and you must know that both conversions are lossy. It's like doing
substr($a, 0, 1) == substr($b, 0, 1) - of course it can return true even
if $a and $b are different. When you convert a bigger type (string) to a
smaller type (int) you must accept the potential loss or check for it if
it's important.
However I think it would make sense not to use this conversion in string
comparisons when we know it's lossy - it seems to be outside of the use
case for such comparisons and it seems apparent by now that it is hard
for people to understand why it works this way.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
> However I think it would make sense not to use this conversion in string
> comparisons when we know it's lossy - it seems to be outside of the use
> case for such comparisons and it seems apparent by now that it is hard
> for people to understand why it works this way.
Yup, I agree, if the type juggling loses data we should skip the juggle
and do a direct comparison which I think is what Gustavo's patch does.
If people explicitly cast things then the cast, even if it is lossy,
should happen.
-Rasmus
>> However I think it would make sense not to use this conversion in string
>> comparisons when we know it's lossy - it seems to be outside of the use
>> case for such comparisons and it seems apparent by now that it is hard
>> for people to understand why it works this way.
> Yup, I agree, if the type juggling loses data we should skip the juggle
> and do a direct comparison which I think is what Gustavo's patch does.
> If people explicitly cast things then the cast, even if it is lossy,
> should happen.
> -Rasmus
Sadly when I proposed this many months ago my post was ignored. :-(
Just chiming in to say +1 for this type of operation. This problem was
extremely frustrating and took a while to figure out exactly what was
going on, as I had wrongly assumed casting to string at the comparison
would cause a strcmp.
-Matt
Hi!
> Sadly when I proposed this many months ago my post was ignored. :-(
I've looked up your post
(http://marc.info/?l=php-internals&m=130348253124215&w=2 if anybody's
interested) and unfortunately I think it did not get due attention
because it mixed two issues: the converting comparison that == does (which
is OK), and information loss in border cases of such comparisons (which
is not OK). Of course, it would have been better if we had understood that
the latter was the root of your problems, but apparently nobody understood
that then.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
On Fri, 13 Apr 2012 18:09:24 +0200, Stas Malyshev smalyshev@sugarcrm.com
wrote:
>> There are other situations where the result of the comparison may be
>> "inaccurate" -- in the sense that two strings may be constructed as
>> representing different numbers, but they compare equal.
>> - Comparing two different real numbers that map to the same double
>>   precision number:
>>   var_dump("1.9999999999999999" == "2"); //true
> For floats there's no accurate comparison anyway; that is a known fact.
However, you are not comparing floats; you're comparing strings. As I
showed, floats are already treated differently depending on whether
they're in string form or not: 1.7976931348623157e400 ==
1.79769313486231571e400 is true, since both literals overflow to INF,
whereas the same comparison between the string forms is false. What's under
discussion is once again whether to distinguish a proper integer from an
integer in string form.
[...]
>> However, taking the last case as an example, this is the same thing that
>> happens if you compare:
>> var_dump((int)"9223372036854775807" == (double)"9223372036854775808");
>> //true
> This, however, is a different case, since you explicitly coerce the types
> and you must know that both conversions are lossy. It's like doing
> substr($a, 0, 1) == substr($b, 0, 1) - of course it can return true even
> if $a and $b are different. When you convert a bigger type (string) to a
> smaller type (int) you must accept the potential loss or check for it if
> it's important.
> However I think it would make sense not to use this conversion in string
> comparisons when we know it's lossy - it seems to be outside of the use
> case for such comparisons and it seems apparent by now that it is hard
> for people to understand why it works this way.
First, I don't think this discussion gets any clearer by using ambiguous
terms such as "lossy" and saying "lossy is bad". Is (int) " 02" a lossy
conversion -- you lose the space and the 0? What about even (float) "1" --
the float 1. is mapped from an infinite number of real numbers due to
rounding error, and you have no way to know which one was the original? And
in any case, I don't think you mean that (int)"9223372036854775807" is a
lossy conversion, as it results in 9223372036854775807 (depending on the
width of long, of course).
(By the way, these are rhetorical questions; I don't care about
establishing a definition of "lossy" in this thread.)
In any case, your selective quoting destroyed the main point of my e-mail
-- that is, this problem raises these questions: is
"9223372036854775808" different from 9223372036854775808? Is
"9223372036854775808" still deemed to represent an integer, even though we
cannot represent it as an integer type?
I think most people can agree that this behavior is correct:
var_dump(9223372036854775807 == 9223372036854775808); //true
therefore, we need some -- principled -- distinction to treat the case
"9223372036854775807" == "9223372036854775808" differently. The
distinction I propose is answering "yes" to the questions above -- they
represent different entities, and when the integer string cannot be
converted to the integer type we should fall back to memcmp(). This is
what is already done with the overflowing "1e400". I don't find it
particularly convincing, though.
--
Gustavo Lopes
Hi!
> In any case, your selective quoting destroyed the main point of my
> e-mail -- that is, this problem raises these questions: is
> "9223372036854775808" different from 9223372036854775808? Is
> "9223372036854775808" still deemed to represent an integer, even
> though we cannot represent it as an integer type?
Well, judging from usage patterns, it is different. You can't get the int
9223372036854775808 from a database or a web form, but you very well can
get the string "9223372036854775808".
> I think most people can agree that this behavior is correct:
> var_dump(9223372036854775807 == 9223372036854775808); //true
I would say, yes, this is fine.
> therefore, we need some -- principled -- distinction to treat the case
> "9223372036854775807" == "9223372036854775808" differently. The
> distinction I propose is answering "yes" to the questions above --
> they represent different entities, and when the integer string cannot
> be converted to the integer type we should fall back to memcmp(). This
> is what is already done with the overflowing "1e400". I don't find it
> particularly convincing, though.
I think this is the way to go, unless somebody proposes a better way to
handle it.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
> Currently, when two strings are compared, PHP tries to do a numerical
> comparison first. Therefore:
> var_dump("010" == "10"); //true
I'm replying to this old mail now because of the current discussion
about hex numbers. The phrase "numerical comparison" caught my eye
because it seems imprecise, for example:
var_dump(010 == 10); // false
Don't forget octal.
Chris