double val to long val conversion issue

12 years ago by Remi Collet — view source

unread

Le 09/02/2013 16:10, Remi Collet a écrit :

http://git.php.net/?p=php-src.git;a=commitdiff;h=79956330fe17cfd5f60de456497541b21a89bddf

Also see https://bugs.php.net/bug.php?id=64142

12 years ago by Pierre Joye — view source

unread

hi Remi

About
http://git.php.net/?p=php-src.git;a=commitdiff;h=79956330fe17cfd5f60de456497541b21a89bddf
(For now, I have reverted this fix)

Here some explanations.

LONG_MAX is 9223372036854775807 (0x7fffffffffffffff)
double representation of LONG_MAX is 9223372036854775808

(d > LONG_MAX) is evaluated in double space.
So is false for double which have the same value than (double)LONG_MAX.

So, for (double)LONG_MAX the cast used is
(long)d

9223372036854775807 on ppc64
9223372036854775808 on x86_64 (gcc without optimization)
9223372036854775807 on x86_64 (gcc -O2)

PHP expected value is 9223372036854775808
(Btw, I don't understand why PHP, build on x86_64, with -O2, gives the
good result, some environment mystery)

Obviously, we could have different result on different platform,
compiler, architecture.

I will be very interested by result on other platform (mac, windows),
compiler (Visual C), architecture.

If we switch to the unsigned cast:
(long)(unsigned long)d;

Any comments ?

IIRC, on windows/visualC, no matter if it is x86 or x64, long is
always 32bits, so it won't change the size of long.

Cheers,

Pierre

@pierrejoye

12 years ago by Terry Ellison — view source

unread

hi Remi

About
http://git.php.net/?p=php-src.git;a=commitdiff;h=79956330fe17cfd5f60de456497541b21a89bddf
(For now, I have reverted this fix)

Here some explanations.

LONG_MAX is 9223372036854775807 (0x7fffffffffffffff)
double representation of LONG_MAX is 9223372036854775808

(d > LONG_MAX) is evaluated in double space.
So is false for double which have the same value than (double)LONG_MAX.

So, for (double)LONG_MAX the cast used is
(long)d

9223372036854775807 on ppc64
9223372036854775808 on x86_64 (gcc without optimization)
9223372036854775807 on x86_64 (gcc -O2)

PHP expected value is 9223372036854775808
(Btw, I don't understand why PHP, build on x86_64, with -O2, gives the
good result, some environment mystery)

Obviously, we could have different result on different platform,
compiler, architecture.

I will be very interested by result on other platform (mac, windows),
compiler (Visual C), architecture.

If we switch to the unsigned cast:
(long)(unsigned long)d;
Any comments ?
IIRC, on windows/visualC, no matter if it is x86 or x64, long is
always 32bits, so it won't change the size of long.
See http://en.wikipedia.org/wiki/LLP64#64-bit_data_models for a good
description of this mess. AFAIK many packages that target both 32 and
64 bit environments MS and *nix, define explicitly adopt XXX_int32,
XXX_uint32, XXX_int64, XXX_uint64, ... datatypes and use wrappers to map
these onto the appropriate visualC / gcc types. As far as I can see,
PHP doesn't and seems to use long and int almost interchangeably which
causes problems as LP64/I32LP64 and LLP64/IL32P64 are very different.
This is one reason for 64-bit support on Windows being problematic.

It would be good for PHP to have a road map to removed data
model-specific potholes, say by 5.6 or 5.7.

//Terry

12 years ago by Stas Malyshev — view source

unread

Hi!

these onto the appropriate visualC / gcc types. As far as I can see,
PHP doesn't and seems to use long and int almost interchangeably which

PHP indeed does not use fixed-size types in zvals, etc. but it
definitely does not "use long and int almost interchangeably". In almost
any place where int is used instead of long or vice versa (unless it is
a specific small value that is nowhere near limits of either int or long
and used in very restricted context) - it is a bug and should be fixed.
If you know of such places, please name them or even better, submit a
bug report pointing them out.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Terry Ellison — view source

unread

these onto the appropriate visualC / gcc types. As far as I can see,
PHP doesn't and seems to use long and int almost interchangeably which
PHP indeed does not use fixed-size types in zvals, etc. but it
definitely does not "use long and int almost interchangeably". In almost
any place where int is used instead of long or vice versa (unless it is
a specific small value that is nowhere near limits of either int or long
and used in very restricted context) - it is a bug and should be fixed.
If you know of such places, please name them or even better, submit a
bug report pointing them out.
Stan, you are right to correct me. Sorry. However, I still feel that
the implicit assumption is that sizeof(long) == 2*sizeof(int) and this
isn't the case with visualC, and PHP internal data structures compiled
with visualC and gcc are significantly different; for example hash keys
are 32 bits long on Windows and 64bits on *nix. Why aren't they 32bits,
say, on both? (as there is no performance benefit is having 64bit hash
keys when the maximum size of a hash table is an int).

12 years ago by Stas Malyshev — view source

unread

Hi!

Stan, you are right to correct me. Sorry. However, I still feel that
the implicit assumption is that sizeof(long) == 2*sizeof(int) and this

I'm not sure where this assumption exists. Could you be more specific?
PHP uses long for most integer values, and int for some internal things
and sometimes those intersect, but usually there is either conversion or
we can be pretty sure the value, while being nominally long, is in range
of int (such as when it is an option with fixed set of values, etc.).
However, there were certainly bugs in the past in this area, and may be
still some, so eliminating them of course would be good. But right know
I don't know of any code that makes this specific assumption.

isn't the case with visualC, and PHP internal data structures compiled
with visualC and gcc are significantly different; for example hash keys
are 32 bits long on Windows and 64bits on *nix. Why aren't they 32bits,

Yes, they are different, because long size is different, and PHP uses
long (more specific, ulong) to store hash values. This is because
numeric values are long, and it's easier to use the same type for both
than bother with converting back and forth. As you noted, the difference
for hashing is minimal since the value is anyway brought to hash size,
and hashes of size more than 32-bit LONG_MAX aren't very practical.
However, it matters in other parts of the code.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Terry Ellison — view source

unread

isn't the case with visualC, and PHP internal data structures compiled
with visualC and gcc are significantly different; for example hash keys
are 32 bits long on Windows and 64bits on *nix. Why aren't they 32bits,
Yes, they are different, because long size is different,
This my point: programmers from the *nix world tend to assume that longs
are longer than ints. In the MS world they are synonymous. This is one
reason why code that has been developed in the *nix can be difficult to
get working reliably in the MS world.
and PHP uses
long (more specific, ulong) to store hash values. This is because
numeric values are long,
I don't follow this reasoning. "Numeric value" in this context is a PHP
(application space) concept. Hashes are internal to the Zend EE and are
never exposed to the PHP programmer, so this is a case of comparing
apples and pears. All that having a 64-bit hash does (when used %
nTableSize which smaller than maxint) is to add 8 bytes to every Bucket
entry stored by the EE for no practical benefit.

(It's 8 bytes because ulong h; uint nKeyLength; <long boundary> takes 16
bytes but uint32 h; uint nKeyLength; <long boundary> takes 8 because of
data alignment.)

and it's easier to use the same type for both
than bother with converting back and forth. As you noted, the difference
for hashing is minimal since the value is anyway brought to hash size,
and hashes of size more than 32-bit LONG_MAX aren't very practical.
However, it matters in other parts of the code.

12 years ago by Pierre Joye — view source

unread

hi Terry.

hi Remi

About
http://git.php.net/?p=php-src.git;a=commitdiff;h=79956330fe17cfd5f60de456497541b21a89bddf
(For now, I have reverted this fix)

Here some explanations.

LONG_MAX is 9223372036854775807 (0x7fffffffffffffff)
double representation of LONG_MAX is 9223372036854775808

(d > LONG_MAX) is evaluated in double space.
So is false for double which have the same value than (double)LONG_MAX.

So, for (double)LONG_MAX the cast used is
(long)d

9223372036854775807 on ppc64
9223372036854775808 on x86_64 (gcc without optimization)
9223372036854775807 on x86_64 (gcc -O2)

PHP expected value is 9223372036854775808
(Btw, I don't understand why PHP, build on x86_64, with -O2, gives the
good result, some environment mystery)

Obviously, we could have different result on different platform,
compiler, architecture.

I will be very interested by result on other platform (mac, windows),
compiler (Visual C), architecture.

If we switch to the unsigned cast:
(long)(unsigned long)d;

Any comments ?

IIRC, on windows/visualC, no matter if it is x86 or x64, long is
always 32bits, so it won't change the size of long.

Yes, that's the case, see my other reply in this thread.

It would be good for PHP to have a road map to removed data model-specific
potholes, say by 5.6 or 5.7.

I do not think this change is small enough to be done in 5.x but
definitively in the next major. We have to keep in mind the impacts of
such a change on the code base and all external projects, be
extensions or libraries used by PHP.

In addition, buffers size should use (u)size_t and not int32/64.

Cheers,

Pierre

@pierrejoye

12 years ago by Stas Malyshev — view source

unread

Hi!

About
http://git.php.net/?p=php-src.git;a=commitdiff;h=79956330fe17cfd5f60de456497541b21a89bddf
(For now, I have reverted this fix)

Here some explanations.

LONG_MAX is 9223372036854775807 (0x7fffffffffffffff)
double representation of LONG_MAX is 9223372036854775808

I see what's going on here. We have precision loss due to the fact that
the range of double (53 bits) is smaller than the range of long (63
bits) on 64-bit system. When it's converted back to long, I guess it has
to be expanded back to 63 bits.

9223372036854775807 on ppc64
9223372036854775808 on x86_64 (gcc without optimization)
9223372036854775807 on x86_64 (gcc -O2)

PHP expected value is 9223372036854775808

Not sure how value of long can be expected to be 9223372036854775808 -
9223372036854775808 is not representable in signed long. If you do
(long)(unsigned long), you'd get -9223372036854775808.

(Btw, I don't understand why PHP, build on x86_64, with -O2, gives the
good result, some environment mystery)

With -O2, gcc probably pre-calculates the result of the operation if you
use simple test program. So, if I write:

double b = LONG_MAX;
printf("%ld", (long)b);

Then without -O2, I'm getting:

    movsd   -16(%rbp), %xmm0
    cvttsd2siq      %xmm0, %rsi

But with -O2, it's just:

    movabsq $9223372036854775807, %rsi

So the difference in results may stem from the fact that the
calculations go in different way here.

Obviously, we could have different result on different platform,
compiler, architecture.

I will be very interested by result on other platform (mac, windows),
compiler (Visual C), architecture.

If we switch to the unsigned cast:
(long)(unsigned long)d;

The result is always 9223372036854775808 (which is expected)

Wait, long can not be 9223372036854775808, how the result of long
conversion can be that?

From my tests, this doesn't change anything on x86_64, and improves

Well, for the value of (double)LONG_MAX the code actually changes
substantially, and if I test this code:

double b = LONG_MAX;
printf("%ld %ld", (long)(unsigned long)b, (long)b);

Then with -O2 I get different results:

-9223372036854775808 9223372036854775807

But that may be an optimization artifact. With real PHP, the result of
converting 9223372036854775807 to double and then back to int seems to
always be -9223372036854775808 on Intel, with or without the patch. I
don't have access to ppc64 machine so no idea what's going on there.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Remi Collet — view source

unread

Le 10/02/2013 05:54, Stas Malyshev a écrit :

Hi!

About
http://git.php.net/?p=php-src.git;a=commitdiff;h=79956330fe17cfd5f60de456497541b21a89bddf
(For now, I have reverted this fix)

Here some explanations.

LONG_MAX is 9223372036854775807 (0x7fffffffffffffff)
double representation of LONG_MAX is 9223372036854775808

I see what's going on here. We have precision loss due to the fact that
the range of double (53 bits) is smaller than the range of long (63
bits) on 64-bit system. When it's converted back to long, I guess it has
to be expanded back to 63 bits.

9223372036854775807 on ppc64
9223372036854775808 on x86_64 (gcc without optimization)
9223372036854775807 on x86_64 (gcc -O2)

PHP expected value is 9223372036854775808

Not sure how value of long can be expected to be 9223372036854775808 -
9223372036854775808 is not representable in signed long. If you do
(long)(unsigned long), you'd get -9223372036854775808.

Yes; I mean -9223372036854775808 (0x8000000000000000)

(sorry, I have use %lu format instead of %ld in my tests)

(Btw, I don't understand why PHP, build on x86_64, with -O2, gives the
good result, some environment mystery)

With -O2, gcc probably pre-calculates the result of the operation if you
use simple test program. So, if I write:

double b = LONG_MAX;
printf("%ld", (long)b);

Then without -O2, I'm getting:
    movsd   -16(%rbp), %xmm0
    cvttsd2siq      %xmm0, %rsi
But with -O2, it's just:
    movabsq $9223372036854775807, %rsi
So the difference in results may stem from the fact that the
calculations go in different way here.

Obviously, we could have different result on different platform,
compiler, architecture.

I will be very interested by result on other platform (mac, windows),
compiler (Visual C), architecture.

If we switch to the unsigned cast:
(long)(unsigned long)d;

The result is always 9223372036854775808 (which is expected)

Wait, long can not be 9223372036854775808, how the result of long
conversion can be that?

Yes, -9223372036854775808 (0x8000000000000000)

From my tests, this doesn't change anything on x86_64, and improves

Well, for the value of (double)LONG_MAX the code actually changes
substantially, and if I test this code:

double b = LONG_MAX;
printf("%ld %ld", (long)(unsigned long)b, (long)b);

Then with -O2 I get different results:

-9223372036854775808 9223372036854775807

But that may be an optimization artifact.

Yes I also think.
But is this couldn't occurs in real PHP, with another compiler or
another gcc version ?

With real PHP, the result of
converting 9223372036854775807 to double and then back to int seems to
always be -9223372036854775808 on Intel, with or without the patch. I
don't have access to ppc64 machine so no idea what's going on there.

That's exactly the problem.
On Intel, we always get -9223372036854775808 (when 9223372036854775807
could be expected, with -O2, from you previous test)

On ppc64, we always get 9223372036854775807.

This is exactly what I'd like to fix.

And, as you said, the patch doesn't change result on Intel.

I could make this change conditionnal on PPC, but I think it will also
protect us from any compiler optimization change.

Remi

12 years ago by Remi Collet — view source

unread

One more test (to get rid of compiler optimization)

int main (int argc, char *argv[]) {
double d;

if (argc>1 && sscanf(argv[1], "%lg", &d)) {
	printf("  double=%.20lg\n", d);		
	printf("  signed=%lx\n", (long)d);
	printf("unsigned=%lx\n", (unsigned long)d);
}
return 0;

}

On x86_64 (so, with or without -O2)

$ gcc -O2 -Wall math2.c -o math2 && ./math2 9223372036854775807
double=9223372036854775808
signed=8000000000000000
unsigned=8000000000000000

On ppc64

$ gcc -O2 -Wall math2.c -o math2 && ./math2 9223372036854775807
double=9223372036854775808
signed=0x7fffffffffffff
unsigned=8000000000000000

Remi.

12 years ago by Stas Malyshev — view source

unread

Hi!

On x86_64 (so, with or without -O2)

$ gcc -O2 -Wall math2.c -o math2 && ./math2 9223372036854775807
double=9223372036854775808
signed=8000000000000000
unsigned=8000000000000000

On ppc64

$ gcc -O2 -Wall math2.c -o math2 && ./math2 9223372036854775807
double=9223372036854775808
signed=0x7fffffffffffff
unsigned=8000000000000000

So, looks like on ppc64 double->long conversion works differently than
on x86_64. Maybe best solution would be to make the patch with ifdef for
ppc64 so that it would do the same as Intel... Though I'm not sure if
somebody on this platform would expect PHP behave as C does, or as Intel
counterpart does. I'd say I personally probably would prefer
Intel-compatible behavior since most software would be expected to be
tested on Intel. But formally I'm not sure it's even wrong behavior...

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Gustavo Lopes — view source

unread

On Sun, 10 Feb 2013 09:35:41 +0100, Stas Malyshev smalyshev@sugarcrm.com
wrote:

So, looks like on ppc64 double->long conversion works differently than
on x86_64. Maybe best solution would be to make the patch with ifdef for
ppc64 so that it would do the same as Intel... Though I'm not sure if
somebody on this platform would expect PHP behave as C does, or as Intel
counterpart does. I'd say I personally probably would prefer
Intel-compatible behavior since most software would be expected to be
tested on Intel. But formally I'm not sure it's even wrong behavior...

I don't think the ifdef is needed at all. Like I said in the bug report,
I'm pretty sure that the patch is correct.

The idea of casting first to unsigned long when d > (double)LONG_MAX is to
use the extra range of unsigned long; if the double is within unsigned
long, the result is well-defined. But that test has a hidden assumption:
if d == (double)LONG_MAX, then we can cast to (long) because d is within
the long range. This assumption is wrong because (double)LONG_MAX is
larger than LONG_MAX -- the value 9223372036854775808.0 is the closest
double to 9223372036854775807. Therefore, changing the test to >= seems to
be the correct course of action.

My concern is that we may also have different behavior when we go outside
the unsigned long range.

As to whether ppc64 or x86_64 is correct when casting (double)LONG_MAX to
a signed long, the answer is this that since the double is outside long
range, the behavior is undefined. If we want more predictability, we have
to do the conversion ourselves and probably take a small performance hit.

--
Gustavo Lopes

12 years ago by Gustavo Lopes — view source

unread

On Sun, 10 Feb 2013 12:00:01 +0100, Gustavo Lopes glopes@nebm.ist.utl.pt
wrote:

My concern is that we may also have different behavior when we go
outside the unsigned long range.

As to whether ppc64 or x86_64 is correct when casting (double)LONG_MAX
to a signed long, the answer is this that since the double is outside
long range, the behavior is undefined. If we want more predictability,
we have to do the conversion ourselves and probably take a small
performance hit.

If we want to eliminate the undefined behavior, we should do (on archs
with 64-bit longs):

(long)(unsigned long)(fmod(d, pow(2., 64.)))

Can you test this program on ppc64:

#include <stdio.h>
#include <math.h>

long convert(double d)
{
return (long)(unsigned long)(fmod(d, pow(2., 64.)));
}

int main(int argc, char *argv[])
{
double d;

 if (argc > 1 && sscanf(argv[1], "%lg", &d)) {
     printf("%ld %ld\n", convert(d), (long)(unsigned long)d);
 }
 return 0;

}

I get this:
$ gcc -O3 -lm conv.c && ./a.out 9223372036854775808
-9223372036854775808 -9223372036854775808
$ gcc -O3 -lm conv.c && ./a.out 4e21
-2943463994972700672 0
$ gcc -O3 -lm conv.c && ./a.out 4e19
3106511852580896768 0

Which seems correct when verified with mathematica:

In[9]:= c =
Function[o,
o // N // SetPrecision[#, [Infinity]] & // Mod[#, 2^64] & //
Piecewise[{{#, # < 2^63}}, -2^64 + #] &];

In[10]:= c /@ {2^63, 4*^21, 4*^19}

Out[10]= {-9223372036854775808, -2943463994972700672,
3106511852580896768}

The cast to long from unsigned long is still implementation-defined
behavior, but all twos complement archs should work like this (i.e. don't
change the representation). In any case, it's not undefined behavior.

--
Gustavo Lopes

12 years ago by Remi Collet — view source

unread

Le 10/02/2013 15:58, Gustavo Lopes a écrit :

If we want to eliminate the undefined behavior, we should do (on archs
with 64-bit longs):

(long)(unsigned long)(fmod(d, pow(2., 64.)))

Can you test this program on ppc64:

$ gcc -O3 -lm conv.c && ./a.out 9223372036854775808
-9223372036854775808 -9223372036854775808

$ gcc -O3 -lm conv.c && ./a.out 4e21
-2943463994972700672 -1

$ gcc -O3 -lm conv.c && ./a.out 4e19
3106511852580896768 -1

I get this:
$ gcc -O3 -lm conv.c && ./a.out 9223372036854775808
-9223372036854775808 -9223372036854775808
$ gcc -O3 -lm conv.c && ./a.out 4e21
-2943463994972700672 0
$ gcc -O3 -lm conv.c && ./a.out 4e19
3106511852580896768 0

Remi.

12 years ago by Gustavo Lopes — view source

unread

On Sun, 10 Feb 2013 16:19:46 +0100, Remi Collet remi@fedoraproject.org
wrote:

Le 10/02/2013 15:58, Gustavo Lopes a écrit :

Can you test this program on ppc64:

$ gcc -O3 -lm conv.c && ./a.out 9223372036854775808
-9223372036854775808 -9223372036854775808

$ gcc -O3 -lm conv.c && ./a.out 4e21
-2943463994972700672 -1

$ gcc -O3 -lm conv.c && ./a.out 4e19
3106511852580896768 -1

I get this:
$ gcc -O3 -lm conv.c && ./a.out 9223372036854775808
-9223372036854775808 -9223372036854775808
$ gcc -O3 -lm conv.c && ./a.out 4e21
-2943463994972700672 0
$ gcc -O3 -lm conv.c && ./a.out 4e19
3106511852580896768 0

This was I was afraid. That bug was just the tip of the iceberg. I suggest
we do change the the > operator to >= like you proposed, but also that we
add the fmod call. The code I gave earlier had a bug btw, as fmod can give
a negative number. I changed it to this:

long convert(double d)
{
double dmod = fmod(d, pow(2., 64.));
if (dmod < 0) {
dmod += pow(2., 64.);
}
return (long)(unsigned long)dmod;
}

I tested the doubles around -4e21 and it worked fine:

$ ./a.out -4000000000000001048576
2943463994971652096 -9223372036854775808
$ ./a.out -4000000000000000524288
2943463994972176384 -9223372036854775808
$ ./a.out -4000000000000000000000
2943463994972700672 -9223372036854775808
$ ./a.out -3999999999999999475712
2943463994973224960 -9223372036854775808
$ ./a.out -3999999999999998951424
2943463994973749248 -9223372036854775808

In[36]:= c /@ Table[-4.^21 + iUlp[-4.*^21], {i, -2, 2}]

Out[36]= {2943463994971652096, 2943463994972176384,
2943463994972700672, 2943463994973224960, 2943463994973749248}

Any reservations?

--
Gustavo Lopes

12 years ago by Remi Collet — view source

unread

Le 10/02/2013 20:27, Gustavo Lopes a écrit :

This was I was afraid. That bug was just the tip of the iceberg. I
suggest we do change the the > operator to >= like you proposed,

I will do it later today.

but also that we add the fmod call. The code I gave earlier had a bug btw,
as fmod can give a negative number. I changed it to this:

long convert(double d)
{
double dmod = fmod(d, pow(2., 64.));
if (dmod < 0) {
dmod += pow(2., 64.);
}
return (long)(unsigned long)dmod;
}

I tested the doubles around -4e21 and it worked fine:

On ppc64,

$ gcc -O2 -Wall conv.c -o conv -lm

$ ./conv -4000000000000001048576
2943463994971652096 -9223372036854775808
$ ./conv -4000000000000000524288
2943463994972176384 -9223372036854775808
$ ./conv -4000000000000000000000
2943463994972700672 -9223372036854775808
$ ./conv -3999999999999999475712
2943463994973224960 -9223372036854775808
$ ./conv -3999999999999998951424
2943463994973749248 -9223372036854775808

$ ./conv 9223372036854775808
-9223372036854775808 -9223372036854775808
$ ./conv 4e21
-2943463994972700672 -1
$ ./conv 4e19
3106511852580896768 -1

$ ./a.out -4000000000000001048576
2943463994971652096 -9223372036854775808
$ ./a.out -4000000000000000524288
2943463994972176384 -9223372036854775808
$ ./a.out -4000000000000000000000
2943463994972700672 -9223372036854775808
$ ./a.out -3999999999999999475712
2943463994973224960 -9223372036854775808
$ ./a.out -3999999999999998951424
2943463994973749248 -9223372036854775808

In[36]:= c /@ Table[-4.^21 + iUlp[-4.*^21], {i, -2, 2}]

Out[36]= {2943463994971652096, 2943463994972176384,
2943463994972700672, 2943463994973224960, 2943463994973749248}

Any reservations?

For which values ? Outside LONG_MIN .. ULONG_MAX ?

Remi.

12 years ago by Stas Malyshev — view source

unread

Hi!

This was I was afraid. That bug was just the tip of the iceberg. I
suggest we do change the the > operator to >= like you proposed,

I will do it later today.

= may be fine, but I am concerned about fmod change for 5.4. If we do
it, I'm afraid we may change some scenario, so I'd prefer to make it 5.5
only.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Gustavo Lopes — view source

unread

On Mon, 11 Feb 2013 08:05:43 +0100, Stas Malyshev smalyshev@sugarcrm.com
wrote:

This was I was afraid. That bug was just the tip of the iceberg. I
suggest we do change the the > operator to >= like you proposed,

I will do it later today.

= may be fine, but I am concerned about fmod change for 5.4. If we do
it, I'm afraid we may change some scenario, so I'd prefer to make it 5.5
only.

If no one objects, I'll merge this into 5.5 and master:

https://github.com/cataphract/php-src/compare/dval_to_lval

--
Gustavo Lopes

12 years ago by Stas Malyshev — view source

unread

Hi!

If no one objects, I'll merge this into 5.5 and master:

https://github.com/cataphract/php-src/compare/dval_to_lval

Maybe add UNEXPECTED around the if condition? Since it's marked as
zend_always_inline I imagine it's supposed to be performance-sensitive...

Also, skip comments in tests seem to be wrong:

+if (PHP_INT_SIZE != 4)
6

die("skip for machines with 32-bit longs");

But actually it skips for longs NOT being 32-bit. Same with other test.

Also, I'm not sure I understand what 64-bit test is supposed to return
and why - why negative number is converted to positive one? What exactly
this change is supposed to improve?

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Gustavo Lopes — view source

unread

On Tue, 19 Feb 2013 03:07:37 +0100, Stas Malyshev smalyshev@sugarcrm.com
wrote:

If no one objects, I'll merge this into 5.5 and master:

https://github.com/cataphract/php-src/compare/dval_to_lval

Maybe add UNEXPECTED around the if condition? Since it's marked as
zend_always_inline I imagine it's supposed to be performance-sensitive...

I can add that.

Also, skip comments in tests seem to be wrong:

+if (PHP_INT_SIZE != 4)
6

die("skip for machines with 32-bit longs");

But actually it skips for longs NOT being 32-bit. Same with other test.

run-tests.php strips off the skip and shows only "for machines with
32-bit". The "skip" is necessary for the test to be skipped.

Also, I'm not sure I understand what 64-bit test is supposed to return
and why - why negative number is converted to positive one? What exactly
this change is supposed to improve?

The least significant bytes are preserved. Take int(-2056257536) and
int(2943463994971652096):

In[6]:= BitAnd[2^32 - 1, {-2056257536, 2943463994971652096}]

Out[6]= {2238709760, 2238709760}

--
Gustavo Lopes

12 years ago by Stas Malyshev — view source

unread

Hi!

Also, I'm not sure I understand what 64-bit test is supposed to return
and why - why negative number is converted to positive one? What exactly
this change is supposed to improve?

The least significant bytes are preserved. Take int(-2056257536) and
int(2943463994971652096):

In[6]:= BitAnd[2^32 - 1, {-2056257536, 2943463994971652096}]

Out[6]= {2238709760, 2238709760}

I'm not sure I understand, what this syntax means and how it explains
why we're doing these conversions? I.e. what requires this change and
how it is beneficial? I understand that it preserves least significant
bits, but why we should preserve them and discard sign, for example?

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Gustavo Lopes — view source

unread

Em 2013-02-20 9:15, Stas Malyshev escreveu:

Hi!

Also, I'm not sure I understand what 64-bit test is supposed to
return
and why - why negative number is converted to positive one? What
exactly
this change is supposed to improve?

The least significant bytes are preserved. Take int(-2056257536) and
int(2943463994971652096):

In[6]:= BitAnd[2^32 - 1, {-2056257536, 2943463994971652096}]

Out[6]= {2238709760, 2238709760}

I'm not sure I understand, what this syntax means and how it explains
why we're doing these conversions?

It doesn't, of course. I was just showing why there are changes in
sign.

I.e. what requires this change and
how it is beneficial? I understand that it preserves least
significant
bits, but why we should preserve them and discard sign, for example?

First of all, there are no change for the ]LONG_MAX, ULONG_MAX] range.
Doubles in that range (notice they're positive numbers) already got
converted to negative numbers.

The problem was that for numbers outside the [LONG_MIN, ULONG_MAX]
range, the code relied on undefined behavior that did vary between
platforms. This change does away with the undefined behavior and
provides a uniform treatment of the finite double domain. (int)$dval is
now round*($dval) mod 2^(PHP_INT_SIZE8) converted to a signed integer
of size PHP_INT_SIZE8 bits assuming a two's complement representation,
for all finite $dval double values.

You could argue that another behavior like clipping at
LONG_MAX/LONG_MIN is preferable. But we can't do it for BC reasons. And
especially for 32-bit long systems, the current behavior is very useful.
For instance let's say you had some network protocol that used an
unsigned 32-bit int on which you needed to do some bitwise operations.
You can use a double to represent the range 2^31..2^32-1 that you got
from, from instance, adding two large PHP integers, cast it to int and
do your bitwise operation.

Actually, I just realized the code probably needs a small adjustment
to make the double always round towards zero. This is only important in
32-bit systems though; for the others, the doubles outside the long
range have no fractional part.

--
Gustavo Lopes

12 years ago by Remi Collet — view source

unread

Le 10/02/2013 20:27, Gustavo Lopes a écrit :

Any reservations?

Do we really want/need to change the current behavior.

Cast from double, outside of integer range, result in a fix value.

The only problem is that the value is not the same on all arch.

x86_64 always return 0x8000...

ppc64 returns
LONG_MAX (0x7fff...) for signed value >= LONG_MAX
LONG_MIN (0x8000...) for signed value <= LONG_MIN
ULONG_MAX (0xffff...) for unsigned value >= ULONG_MAX

NB. The current fix applied only fix LONG_MAX value.
Cast for >= ULONG_MAX is not fixed.

Note (my "very" personal opinion) I think ppc64 behavior is better as it
preserve the sign of the result... ;)

But Intel seems the "standard"...

Remi.

12 years ago by Gustavo Lopes — view source

unread

On Mon, 11 Feb 2013 09:43:36 +0100, Remi Collet remi@fedoraproject.org
wrote:

Le 10/02/2013 20:27, Gustavo Lopes a écrit :

Any reservations?

For which values ? Outside LONG_MIN .. ULONG_MAX ?

It should be valid for the whole domain, but for performance reasons it
would be better to limit it to values at least outside the (LONG_MIN,
LONG_MAX) interval. Testing also against LONG_MIN has a measurable
performance hit in my tests (+40%), but that should not matter (on my old
laptop, you can still run the inlined function 1e9 times per second for
values in the long range, i.e., it takes around 1 ns).

Do we really want/need to change the current behavior.

Cast from double, outside of integer range, result in a fix value.

The only problem is that the value is not the same on all arch.

Sure, that is also an option, but since we already have a wrapping
behavior in the LONG_MAX..ULONG_MAX range, it would make sense to keep a
similar behavior throughout the double domain, I think. It would be
inconsistent to have it wrap on LONG_MAX, but for values smaller than
LONG_MIN or larger than ULONG_MAX give a fixed value.

--
Gustavo Lopes

12 years ago by Remi Collet — view source

unread

If you find some interests on looking at the remaining ppc64 issues,
I have uploaded the 16 failed tests results

http://blog.famillecollet.com/public/reports/failures-5.4.11-ppc64.tgz
(this is 5.4.11, but with ppc64 patches applied, 46 failures before)

Remi.

double val to long val conversion issue

Cheers,

Cheers,

But that may be an optimization artifact. With real PHP, the result of converting 9223372036854775807 to double and then back to int seems to always be -9223372036854775808 on Intel, with or without the patch. I don't have access to ppc64 machine so no idea what's going on there.

Also, I'm not sure I understand what 64-bit test is supposed to return and why - why negative number is converted to positive one? What exactly this change is supposed to improve?

But that may be an optimization artifact. With real PHP, the result of
converting 9223372036854775807 to double and then back to int seems to
always be -9223372036854775808 on Intel, with or without the patch. I
don't have access to ppc64 machine so no idea what's going on there.

Also, I'm not sure I understand what 64-bit test is supposed to return
and why - why negative number is converted to positive one? What exactly
this change is supposed to improve?