The default serialize precision is currently [1] set at 100. A little code
inspection shows precision, in this case, takes the usual meaning of
number of significant digits.
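For illustration, here is what that looks like on a typical build (a quick sketch, not part of any patch):

ini_set('serialize_precision', 100);
echo serialize(0.1), "\n"; // d:0.1000000000000000055511151231257827021181583404541015625;
ini_set('serialize_precision', 17);
echo serialize(0.1), "\n"; // d:0.10000000000000001;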
Given that the implicit precision of a (normal) IEEE 754 double precision
number is slightly less than 16 digits [2], this is serious overkill.
Put another way, while the mantissa is composed of 52 bits plus 1 implicit
bit, 100 decimal digits can carry up to 100*log2(10) =~ 332 bits of
information, around 6 times more.
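Those two figures are easy to reproduce (plain arithmetic, nothing PHP-specific about it):

printf("%.2f decimal digits in a 53-bit mantissa\n", 53 * log(2, 10));  // ~15.95
printf("%.2f bits in 100 decimal digits\n", 100 * log(10, 2));          // ~332.19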
Given this, I propose changing the default precision to 17 (while the
precision is slightly less than 16, a 17th digit is necessary because the
first decimal digit carries little information when it is low).
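As a minimal illustration of why 17 digits suffice (a sketch that is independent of the serializer itself; it just formats a double with 17 significant digits, re-parses the string and compares the raw bytes):

$x = M_PI / 10;                            // an arbitrary double
$s = sprintf('%.17G', $x);                 // 17 significant decimal digits
$y = (float) $s;                           // parse the decimal string back
var_dump(pack('d', $x) === pack('d', $y)); // bool(true): bit-for-bit identical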
From my tests, this makes serialization and unserialization of doubles
around 3 times faster (counting the function calls to
serialize/unserialize, plus a loop variable increment and stop condition
check). It also makes the serialized data smaller.
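For anyone who wants to reproduce a rough comparison, something like the following micro-benchmark sketch (hypothetical, not the measurement quoted above) can be run once with -d serialize_precision=100 and once with -d serialize_precision=17:

// Round-trip one double many times and report the elapsed time for the
// currently configured serialize_precision.
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    unserialize(serialize(M_PI));
}
printf("serialize_precision=%s: %.3fs\n",
    ini_get('serialize_precision'), microtime(true) - $start);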
Crucially, from my tests, the condition that the variable stays the same
before and after serialization+unserialization still holds. The test
included below, for little-endian machines, verifies this.
If no one objects, I'll change the default precision to 17.
// run with php -d serialize_precision=17
$numbers = array(
    "0000000000000000", // 0
    "2d431cebe2362a3f", // .0002
    "2e431cebe2362a3f", // .0002 + 10^-Accuracy[.0002]*1.01
    "0000000000001000", // 2^-1022. (minimum normal double)
    "0100000000001000", // 2^-1022. + 10^-Accuracy[2^-1022.]*1.01
    "ffffffffffffef7f", // 2^1024. (maximum normal double)
    "feffffffffffef7f", // 2^1024. - 10^-Accuracy[2^1024.]
    "0100000000000000", // minimum subnormal double
    "0200000000000000", // 2nd minimum subnormal double
    "ffffffffffff0f00", // maximum subnormal double
    "feffffffffff0f00", // 2nd maximum subnormal double
    "000000000000f07f", // +inf
    "000000000000f0ff", // -inf
);
foreach ($numbers as $ns) {
    $num = unpack("d", pack("H*", $ns)); $num = reset($num);
    echo "number: ", sprintf("%.17e", $num), "... ";
    $num2 = unserialize(serialize($num));
    $repr = unpack("H*", pack("d", $num2)); $repr = reset($repr);
    if ($repr == $ns)
        echo "OK\n";
    else
        echo "mismatch\n\twas: $ns\n\tbecame: $repr\n";
}
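Since the vectors above are the byte strings of little-endian doubles, one possible way to make the harness endian-agnostic would be something along these lines (a sketch only; it assumes float byte order matches integer byte order, and hex_to_double() is just an illustrative helper, not something in the patch):

// Detect the machine byte order once, then flip the vector bytes on
// big-endian builds before unpacking them as a double.
$littleEndian = pack('S', 1) === "\x01\x00";
function hex_to_double($hex, $littleEndian) {
    $bytes = pack('H*', $hex);
    if (!$littleEndian) {
        $bytes = strrev($bytes); // vectors are stored least significant byte first
    }
    $num = unpack('d', $bytes);
    return reset($num);
}

(The comparison at the end of the loop would need the mirror-image conversion as well.)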
[1]: http://lxr.php.net/opengrok/xref/PHP_TRUNK/main/main.c#451
[2]: http://en.wikipedia.org/wiki/Double_precision_floating-point_format
Gustavo Lopes
+1
I've always done this, and the default of 100 has been puzzling.
We have another problem. The counterpart of serialize precision, the display
"precision" INI setting, is currently lower than 17 on most PHP installs, and
this leads to actual data loss in places where people least expect it.
All database drivers/APIs I've tested have escape/quote routines which are
affected by this supposedly "display only" precision.
The test below was performed on a 32-bit system (integers of 2^31 and above
become floats):
$x = new PDO('mysql:host=localhost', '...', '...');
ini_set('precision', 12); // typical for many PHP installs
echo $x->quote(pow(2,50)); // (string) '1.12589990684E+15'
ini_set('precision', 17);
echo $x->quote(pow(2,50)); // (string) '1125899906842624'
In the first example, with the E notation, the API sends 1125899906840000 to
the database when it was supposed to send 1125899906842624.
It's off by 2624.
The immediate fix for this is to set both precisions to 17; in the long term,
these routines should switch to serialize_precision and not use the display
precision.
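In the meantime, a possible application-level workaround (a sketch; $pdo stands for whatever connection object the application already has) is to format the float explicitly, so the implicit float-to-string conversion that honours the display 'precision' setting never runs:

$value = pow(2, 50);
echo $pdo->quote(sprintf('%.17G', $value)); // '1125899906842624' regardless of ini precision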
Stan Vass
ini_set('precision', 17);
After some testing, here is what I get:
<?php
ini_set('precision', 14);
echo 0.1; // 0.1
echo 0.2; // 0.2
echo 0.3; // 0.3
ini_set('precision', 17);
echo 0.1; // 0.10000000000000001
echo 0.2; // 0.20000000000000001
echo 0.3; // 0.29999999999999999
?>
The default precision of 14 (or 12) must have been chosen to avoid
this overlong string representation of many simple floats?
While I agree with you that any data loss must be avoided, couldn't
this also break existing code?
Would it be possible to display "a value based on the shortest
decimal fraction that rounds correctly back to the true binary value",
like Python 2.7 and 3.1 do?
(http://docs.python.org/tutorial/floatingpoint.html)
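For reference, the idea can be sketched in userland (shortest_repr() is only a hypothetical illustration, not a proposed API): keep adding significant digits until the string parses back to the exact same double.

function shortest_repr($x) {
    for ($p = 1; $p <= 17; $p++) {
        $s = sprintf('%.' . $p . 'G', $x);
        if ((float) $s === $x) {
            return $s; // shortest string that round-trips exactly
        }
    }
    return $s;         // 17 digits always suffice for a finite double
}
echo shortest_repr(0.1), "\n";       // 0.1
echo shortest_repr(0.1 + 0.2), "\n"; // 0.30000000000000004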
Just my 2cts :)
Nicolas
As I've shown in the previous post, "looks better" doesn't mean "is more
accurate". It's true that precision has been set to 12 or 14 in order to
look better with some small decimal fractions, but that can't be the only
concern when that same precision is used also for serialization during
database escaping.
It's hard to imagine how an echoed float being longer or less aesthetically
pleasing to the human eye could break code, but when real precision is lost
and sent to the database like that, we have a much more tangible problem.
In terms of formatting of floats for humans, we still have printf() and
number_format(), which allow control independent of the ini setting (which
people use when they need that explicit control).
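For example (illustrative only):

ini_set('precision', 17);
echo 0.1, "\n";                   // 0.10000000000000001
printf("%.2f\n", 0.1);            // 0.10
echo number_format(0.1, 2), "\n"; // 0.10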
Stan Vass
On Tue, 08 Feb 2011 20:05:05 -0000, Nicolas Grekas
<nicolas.grekas+php@gmail.com> wrote:
ini_set('precision', 17);
After some testing, here is what I get:
<?php
ini_set('precision', 14);
echo 0.1; // 0.1
echo 0.2; // 0.2
echo 0.3; // 0.3
ini_set('precision', 17);
echo 0.1; // 0.10000000000000001
echo 0.2; // 0.20000000000000001
echo 0.3; // 0.29999999999999999
The default precision of 14 (or 12) must have been chosen to avoid
this overlong string representation of many simple floats?
While I agree with you that any data loss must be avoided, couldn't
this also break existing code?
Yes, I think it's dangerous to change the default display precision lest
we have a ton of applications that currently show 0.2 showing
0.20000000000000001.
Would it be possible to display "a value based on the shortest
decimal fraction that rounds correctly back to the true binary value",
like Python 2.7 and 3.1 do?
(http://docs.python.org/tutorial/floatingpoint.html)
This may be a good idea for trunk, but I don't think it's feasible for 5.3
for the same reason. Showing "shortest decimal fraction that rounds
correctly back to the true binary value" works fine for numbers that are
directly input, where the only error is the normal rounding error (i.e.,
total uncertainty for x is x*2^-53). Once you start making calculations
with the numbers the errors start being propagated, so in these scenarios
you would still end up with a lot more "ugly" string representations than
you have today with the default display precision.
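A quick illustration of the propagation point (run with the display precision raised to 17): the sum of two "clean-looking" inputs already has no short decimal form that round-trips, so even a shortest-representation scheme would print the long form.

ini_set('precision', 17);
echo 0.1 + 0.2, "\n"; // 0.30000000000000004, not 0.3
echo 0.3, "\n";       // 0.29999999999999999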
I agree that the information loss in e.g. PDO must be fixed, but it seems
more appropriate to fix those problems by forcing another precision only
in those cases.
--
Gustavo Lopes
Yes, I think it's dangerous to change the default display precision
lest we have a ton of applications that currently show 0.2 showing
0.20000000000000001.
Exactly. And remember, PHP output is not necessarily just for web pages
for humans to read. Other apps may rely on parsing this data, etc., and
may break if precision changes unexpectedly.
Would it be possible to display "a value based on the shortest
decimal fraction that rounds correctly back to the true binary value",
like Python 2.7 and 3.1 do?
(http://docs.python.org/tutorial/floatingpoint.html)
This may be a good idea for trunk, but I don't think it's feasible for
5.3 for the same reason.
Announced as part of a major or point upgrade, I guess this would be OK.
But it still doesn't solve the core problem, which is that the display
precision is being used where it shouldn't be used, e.g. for DB
communications.
Showing "shortest decimal fraction that rounds correctly back to the
true binary value" works fine for numbers that are directly input, where the only
error is the normal rounding error (i.e., total uncertainty for x is x*2^-53).
Once you start making calculations with the numbers the errors start being
propagated, so in these scenarios you would still end up with a lot more "ugly"
string representations than you have today with the default display precision.
Yeah. For this reason, I think it would be more of a nuisance for the
average app than a help. A lower display precision is actually
desirable.
I agree that the information loss in e.g. PDO must be fixed, but it
seems more appropriate to fix those problems by forcing another
precision only in those cases.
A much better way to fix it.
Ben.
+1 for trunk (with all necessary documentation and upgrading changes).
Huge -1 for 5.3, just in case :)
Given this, I propose changing the default precision to 17 (while the
precision is slightly less than 16, a 17th digit is necessary because the
first decimal digit carries little information when it is low).
--
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org