Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:39138
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmx.net designates 213.165.64.20 as permitted sender)
Message-ID: <4884D7BF.909@gmx.net>
Date: Mon, 21 Jul 2008 20:38:55 +0200
User-Agent: Thunderbird 2.0.0.14 (X11/20080421)
MIME-Version: 1.0
To: php-dev List <internals@lists.php.net>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Rounding in PHP and floating point numbers in general
From: chris_se@gmx.net (Christian Seiler)

Hi,

With this posting I'd like to make another try in resolving the rounding
issues in PHP.

This is basically in response to <http://bugs.php.net/bug.php?id=42294>
(bug #42294) where I added a comment last year. Since then I have read
quite a bit on this subject so I feel that I should post an update to my
previous comment (which isn't completely correct, as are some other
things I reference in this mail, please keep that in mind while reading
them). And since the bug seems to have been forgotten, I think I should
try this on the mailing list.

A) General explanations
=======================

Rounding methods:

  - Round to negative infinity: Should be obvious, some examples:
          -0.5     ->  -1
           0.5     ->   0
           2.4     ->   2
          -3.6     ->  -4
           4.8     ->   4
    This is what the floor() function does.

  - Round to positive infinity: Obvious, some examples:
          -0.5     ->   0 [actually, -0]
           0.5     ->   1
           2.4     ->   3
          -3.6     ->  -3
           4.8     ->   5
    This is what the ceil() function does.

  - Round to zero: Some examples:
          -0.5     ->   0 [actually, -0]
           0.5     ->   0
           2.4     ->   2
          -3.6     ->  -3
           4.8     ->   4

  - Round away from zero: Some examples:
          -0.5     ->  -1
           0.5     ->   1
           2.4     ->   3
          -3.6     ->  -4
           4.8     ->   5

  - Round no nearest: Rounds to nearest integer. Examples not involving
    .5:
           2.4     ->   2
          -3.2     ->  -3
          -3.6     ->  -4
           4.8     ->   5
           2.501   ->   3
          -2.501   ->  -3
    [Note the last two examples: they are not exactly .5 so they do not
    have the equal distance from both integers so the rounding direction
    in this case is clear.]

    There are four main variants of this rounding method regarding the
    treatment of .5 (which is exactly in the middle of two integers):

      * Away from zero: .5 will always be rounded to the next integer
        away from zero:
          -1.5     ->  -2
           1.5     ->   2
          -2.5     ->  -3
           2.5     ->   3
        This is the traditional "arithmetic rounding" people learn in
        school.

      * Towards zero: .5 will always be rounded to the next integer
        towards zero:
          -1.5     ->  -1
           1.5     ->   1
          -2.5     ->  -2
           2.5     ->   2

      * Towards even: .5 will always be rounded to the next even integer:
          -1.5     ->  -2
           1.5     ->   2
          -2.5     ->  -2
           2.5     ->   2
        This is the so-called "banker's rouding".

      * Towards odd: .5 will always be rounded to the next odd integer:
          -1.5     ->  -1
           1.5     ->   1
          -2.5     ->  -3
           2.5     ->   3

In practice, only round to negative/positive infinity (floor/ceil),
arithmetic rounding (round to nearest, away from zero) and banker's
rouding (round to nearest, towards odd) are widely used.

B) Rounding occurs in
=====================

There are various places in PHP where rounding (explicit or implicit)
occurs:

Explicitely:

  * round()
  * number_format()
  * (sf)printf() with %3f etc.

Implicitely:

  * float to string conversion (the 'precision' ini setting)
  * (sf)printf() with %g

The problem is that different methods are used for this rounding.

C) History of the round() function in PHP
=========================================

First version in CVS (math.c):

  - No precision argument, always rounds to integer
  - Use rint() for rounding
        ISO C says: rint() uses the current rounding direction which on
        every machine known to me is round to even (so-called "banker's
        rounding"). But: Rounding direction may be changed by
        fesetround()!
  - On systems where rint() is not specified, rint() is emulated. But the
    emulation is done via an algorithm that does arithmetic rounding.

Version 1.22 (May 17, 2000):

  - Allow arbitrary precision rounding, now implement an algorithm that
    does arithmetic rounding.

Version 1.104 (Aug 8, 2003):

  - Add fuzz due to incorrect results (see below)

Version 1.106 (Aug 9, 2003):

  - Disable fuzz on Win32, make (useless) configure check on UNIX

July 23, 2004: (Slightly flawed) analysis of the FP and rounding
situation in PHP and proposal to fix that on the internals mailinglist:
<http://marc.info/?l=php-internals&m=109057070829170&w=2>
The linked C files are still available through the wayback machine via
<http://web.archive.org/> to get an impression what George actually
meant.

D) Problem analysis
===================

In this section I want to provide an analysis of the problems that occur
with PHPs rounding functionality.

First of all, rounding to integer is never a problem as long as the
number of digits before the decimal point is smaller than the floating
point precision. But does anyone really want to round something to 15
significant (!) digits precision and still use floats? I believe not.
So, if you want to do banker's rounding, simply use rint(), if you want
to do arithmetic rounding, simply do round() or if the C version doesn't
permit that then use the simple algorithm
rounded_val = val >= 0 ? floor (val + 0.5) : ceil (val - 0.5).

Problems arise when rounding to arbitrary precision. The basic problem
is that the only feasible way to do so is by multiplying the number with
pow(base, precision), then round to integer and divide the result with
the same factor.

Example:

round(0.142, 2)
    -> 0.142 * pow(10, 2) = 0.142 * 100 = 14.2
    -> round_to_integer(14.2) = 14
    -> 14/100 = 0.14
    -> result is 0.14

This works independently of the rounding algorithm chosen, i.e. it
doesn't matter if banker's rounding or arithmetic rounding is used.

But problems arise from floating point precision. Not every finite
decimal number can be represented by finite binary number, so for
floating point arithmetic, the floating point number closest to the
decimal number is used. For example, when using double precision, the
number 0.285 is actually stored as (approximately)
0.2849999999999999755750935. This is the closest number to 0.285 a
double representation (64 bits total, 52 bits for the fraction) can
provide. This is usually not a problem since implicit rounding to 10 to
15 significant places is done when printing the floating point number.

However, the problem arises when doing actual floating point
calculations: 0.285 * 100 = 28.4999999999999964472863212 whereas 28.5
can be displayed as an exact (!) floating point number. Therefore,
0.285 * 100 != 28.5. This is bad since arithmetic rounding fails for
this example. Banker's rounding on the other hand works by chance since
the digit in front of the 5 is an 8 and thus even. But there are
examples where banker's rounding fails for the same reason: The internal
representation of 1.255 as a floating point number is
1.2549999999999998934185896, multiplying that with 100 yields
125.4999999999999857891452848 instead of 125.5. Rounding that using both
banker's rounding and arithmetic rounding gives 125 and not 126 as would
be expected.

Also, it gets worse: There are usually three different floating point
types: single (32 bit), double (64 bit) and extended (80 bit). On 32 bit
x86 compatible CPUs all calculations usually take place with Extended
precision registers in the FPU. But when storing data in memory or
reading data from memory, the data is converted to the precision
available. When converting from single or double to extended, no loss
occurs (i.e. when loading memory into the FPU registers). But when
converting back, precision is lost. Also, a number that is nearest to a
decimal number in double precision is not necesaarily the nearest in
extended precision. Take, for example, 0.002877. The nearest double is
0.0028769999999999997832012. The nearest extended precision float is
0.0028770000000000000000416. That extended precision float truncated to
double is 0.0028770000000000002168821, which is in fact further away
from the original decimal number than the nearest double representation.
Now, the problem with arbitrary precision rounding is that you first
have to multiply the number with 10^places, then round it and in the end
divide the number by 10^places. If the last step, i.e. the division
through 10^places is done in extended floating point precision, then the
nearest number in extended precision will be chosed (this is due to the
fact that every integer can be represented exactly in both double and
extended precision and thus the result of floor() will be the same for
both, only the effect of the division will count). If that result is
then truncated to double, that may not be the nearest possible double
representation of that decimal number! Or, to put it that way:
floor (0.002877 * 1000000.0 + 0.5) / 1000000.0 == 0.002877 ONLY IF the
last division is done in double precision. This was pointed out in the
2004 mail.

Now, when does this double / extended precision thing occur? Apparently
only on x86 platforms with a GNU compiler where the default floating
point mode is extended, the Microsoft compiler on the other hand uses
essentially double precision (but with a larger exponent - that however
doesn't matter in this case). On amd64 platforms both the GNU compiler
and the Microsoft compiler always use the precision of the operhands.

E) Current situation
====================

The current approach is to implement arithmetic rounding. The algorithm
in math.c is essentially:

double round (double val, long places) {
   double tmp_val = val, f = pow (10.0, (double) places);
   tmp_val *= f;
   if (tmp_val >= 0.0) {
     tmp_val = floor (tmp_val + 0.5);
   } else {
     tmp_val = ceil (tmp_val - 0.5);
   }
   tmp_val /= f;
   return val;
}

This algorithm is correct - in theory. But as I already explained: Due
to precision problems, the algorithm will not always supply the correct
result.

Other than a simple NaN check, there is one thing I haven't mentioned
yet: In order to cope with the floating point precision problem, not 0.5
but rather 0.50000000001 is added or subtracted from the value. This
value is called PHP_ROUND_FUZZ.

What does this fuzz do? If you have a number such as 1.255 which yields
125.4999999999999857891452848 when multiplied by 100, adding an
additional 1e-11 will cause the rounding to work in this case. This tiny
fuzz elimitates the precision problems for these numbers.

The basic assumption behind the fuzz is that if one wants to round to a
certain level of significance, adding an 1e-11th of that magnitude will
not change the rounding result. This is true for most every-day floating
point numbers one calculates with (such as prices, e.g. adding VAT and
then rounding to 2 places precision) but will fail if one actually wants
to round numbers that contain quite a few 9s, take the examples
round(0.9499999999999,1) or round(144999999999999,-13) given by George
in his mail of 2004.

The real problem with the current PHP is that PHP does not always use
this fuzz. It is completely disabled on Windows platforms and on UNIX
platforms and on UNIX platforms, an autoconf check is done to see if the
fuzz is activated. The autoconf check consists of this algorithm:

#include <math.h>
   /* keep this out-of-line to prevent use of gcc inline floor() */
   double somefn(double n) {
     return floor(n*pow(10,2) + 0.5);
   }
   int main() {
     return somefn(0.045)/10.0 != 0.5;
   }

There are three problems with this check:

1) I can't find a platform where this check actually fails, somehow, for
that number, on every platform I tested (Linux x86 32 bit, Windows x86
32bit, Linux amd64 64 bit) ther return value of main() is zero.

2) For cross-compiling, the fuzz is always enabled.

3) It doesn't make sense! There are certainly numbers which have
different precision problems on different architectures (because of e.g.
the automatic internal expansion to extended precision) - but the basic
problem remains the same on ALL architectures: Rounding to a certain
precision causes problems due to the fact that you can't represent every
decimal number exactly as a floating point number.

[Oh, and by the way: The message "checking whether rounding works
properly" shows up at least ten times in the same console row when
running ./configure, so something doesn't work properly anyway...]

So, to sum it up, the whole fuzz stuff is currently broken because it is
always disabled on Windows, probably always disabled on UNIX unless on
some weird architecture but always enabled on UNIX if configure for some
reason thinks it's cross-compiling.

Oh, just for the record: Please note that the 2004 posting misses the
point of the fuzz, the author assumes that when switching to the double
FPU precision, the reason for implementing the fuzz disappears which
unfortunately is not the case.

F) Proposed fix
===============

1. The idea in George's 2004 posting of having a central function that
encapsules the rounding functionality is a good idea because that makes
sure PHPs behaviour is consistent with all functions that do rounding in
one or another way. On the other hand, I had a quick peek at the
spprintf/snprintf() source code and that itself seems to call
zend_dtoa() which implements quite a complex algorithm for printing
floating point numbers. I can't say how difficult it would be to adjust
that algorithm to a) either use that central function or b) at least
always show the same behaviour. But even if printf() isn't touched just
yet, it still seems like a good idea to expose a generic rounding
function as part of the PHP API for C.

2. Choose the rounding algorithm you want to use. Do you want to do
banker's rounding or arithmetic rounding? Or: How about adding an
optional parameter to the round() function to select the rounding
method?

3. Either ALWAYS use the fuzz (and accept errors in cases where one
wants to round numbers like 0.9499999999999 to 1 digit precision) or
NEVER use the fuzz (and accept errors that 1.255 is always rounded to
1.25 instead of 1.26 which would be correct, banker's and arithmetic).
Or add a parameter to round() or an ini setting to let the user choose.
This is perhaps the most important issue of all: Don't be inconsistent
Make PHP react the same way on all platforms!

4. Incorporate the proposed fix in the 2004 mail for the extended to
double truncation problem, i.e.

// before the rounding operation:
#include <fpu_control.h>
fpu_control_t cw, stored_cw;
_FPU_GETCW(stored_cw);
cw = (stored_cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
_FPU_SETCW(cw);

// after the rounding operation:
_FPU_SETCW(stored_cw);

Of course, with sufficient #ifdefs for the correct platform.

5. Document all the details with quite a few examples on the cases where
errors occur. The current documentation only states "rounds the number"
and so forth, it doesn't say according to which algorithm, it doesn't go
into the details of the edge cases where errors occur, etc. This is
probably there are so many contradictory (and often blatantly wrong!)
comments below the round() function.

The 2004 proposal also contains quite a few other additions: It also
does automatic rounding on simple operations such as $a = $b + $c, it
always rounds to the 15 significant digits precision. I'm not so sure
about changing floating point arithmetics completely, I don't think that
would be wise. The manual warns about floating point precision already
and changing that would be a major hickup.

round(), on the other hand, does more than the standard C functions do:
With the additional places argument, it allows to round not only to
integers but to arbitrary places. And this is precisely the thing that
causes all the problems. ;-)

G) Final thoughts
=================

As I have elaborated, rounding floating point numbers is highly
non-trivial. The current situation of PHP is a bit of a mess and should
definitely be cleaned up.

I am willing to provide a patch that implements the necessary changes as
soon as the desired behaviour has been discussed (i.e. whether to do
banker's rounding or arithmetic rounding, whether to use fuzz or not use
it or whether to make one or both of the two configurable).

Anyway, thanks for following me for this long. I'd appreciate comments
on this.

Regards,
Christian