Hi,
With this posting I'd like to make another try in resolving the rounding
issues in PHP.
This is basically in response to http://bugs.php.net/bug.php?id=42294
(bug #42294) where I added a comment last year. Since then I have read
quite a bit on this subject so I feel that I should post an update to my
previous comment (which isn't completely correct, as are some other
things I reference in this mail, please keep that in mind while reading
them). And since the bug seems to have been forgotten, I think I should
try this on the mailing list.
A) General explanations
Rounding methods:
-
Round to negative infinity: Should be obvious, some examples:
-0.5 -> -1
0.5 -> 0
2.4 -> 2
-3.6 -> -4
4.8 -> 4
This is what thefloor()
function does. -
Round to positive infinity: Obvious, some examples:
-0.5 -> 0 [actually, -0]
0.5 -> 1
2.4 -> 3
-3.6 -> -3
4.8 -> 5
This is what theceil()
function does. -
Round to zero: Some examples:
-0.5 -> 0 [actually, -0]
0.5 -> 0
2.4 -> 2
-3.6 -> -3
4.8 -> 4 -
Round away from zero: Some examples:
-0.5 -> -1
0.5 -> 1
2.4 -> 3
-3.6 -> -4
4.8 -> 5 -
Round no nearest: Rounds to nearest integer. Examples not involving
.5:
2.4 -> 2
-3.2 -> -3
-3.6 -> -4
4.8 -> 5
2.501 -> 3
-2.501 -> -3
[Note the last two examples: they are not exactly .5 so they do not
have the equal distance from both integers so the rounding direction
in this case is clear.]There are four main variants of this rounding method regarding the
treatment of .5 (which is exactly in the middle of two integers):-
Away from zero: .5 will always be rounded to the next integer
away from zero:
-1.5 -> -2
1.5 -> 2
-2.5 -> -3
2.5 -> 3
This is the traditional "arithmetic rounding" people learn in
school. -
Towards zero: .5 will always be rounded to the next integer
towards zero:
-1.5 -> -1
1.5 -> 1
-2.5 -> -2
2.5 -> 2 -
Towards even: .5 will always be rounded to the next even integer:
-1.5 -> -2
1.5 -> 2
-2.5 -> -2
2.5 -> 2
This is the so-called "banker's rouding". -
Towards odd: .5 will always be rounded to the next odd integer:
-1.5 -> -1
1.5 -> 1
-2.5 -> -3
2.5 -> 3
-
In practice, only round to negative/positive infinity (floor/ceil),
arithmetic rounding (round to nearest, away from zero) and banker's
rouding (round to nearest, towards odd) are widely used.
B) Rounding occurs in
There are various places in PHP where rounding (explicit or implicit)
occurs:
Explicitely:
-
round()
-
number_format()
- (sf)printf() with %3f etc.
Implicitely:
- float to string conversion (the 'precision' ini setting)
- (sf)printf() with %g
The problem is that different methods are used for this rounding.
C) History of the round()
function in PHP
First version in CVS (math.c):
- No precision argument, always rounds to integer
- Use rint() for rounding
ISO C says: rint() uses the current rounding direction which on
every machine known to me is round to even (so-called "banker's
rounding"). But: Rounding direction may be changed by
fesetround()! - On systems where rint() is not specified, rint() is emulated. But the
emulation is done via an algorithm that does arithmetic rounding.
Version 1.22 (May 17, 2000):
- Allow arbitrary precision rounding, now implement an algorithm that
does arithmetic rounding.
Version 1.104 (Aug 8, 2003):
- Add fuzz due to incorrect results (see below)
Version 1.106 (Aug 9, 2003):
- Disable fuzz on Win32, make (useless) configure check on UNIX
July 23, 2004: (Slightly flawed) analysis of the FP and rounding
situation in PHP and proposal to fix that on the internals mailinglist:
http://marc.info/?l=php-internals&m=109057070829170&w=2
The linked C files are still available through the wayback machine via
http://web.archive.org/ to get an impression what George actually
meant.
D) Problem analysis
In this section I want to provide an analysis of the problems that occur
with PHPs rounding functionality.
First of all, rounding to integer is never a problem as long as the
number of digits before the decimal point is smaller than the floating
point precision. But does anyone really want to round something to 15
significant (!) digits precision and still use floats? I believe not.
So, if you want to do banker's rounding, simply use rint(), if you want
to do arithmetic rounding, simply do round()
or if the C version doesn't
permit that then use the simple algorithm
rounded_val = val >= 0 ? floor (val + 0.5) : ceil (val - 0.5).
Problems arise when rounding to arbitrary precision. The basic problem
is that the only feasible way to do so is by multiplying the number with
pow(base, precision), then round to integer and divide the result with
the same factor.
Example:
round(0.142, 2)
-> 0.142 * pow(10, 2) = 0.142 * 100 = 14.2
-> round_to_integer(14.2) = 14
-> 14/100 = 0.14
-> result is 0.14
This works independently of the rounding algorithm chosen, i.e. it
doesn't matter if banker's rounding or arithmetic rounding is used.
But problems arise from floating point precision. Not every finite
decimal number can be represented by finite binary number, so for
floating point arithmetic, the floating point number closest to the
decimal number is used. For example, when using double precision, the
number 0.285 is actually stored as (approximately)
0.2849999999999999755750935. This is the closest number to 0.285 a
double representation (64 bits total, 52 bits for the fraction) can
provide. This is usually not a problem since implicit rounding to 10 to
15 significant places is done when printing the floating point number.
However, the problem arises when doing actual floating point
calculations: 0.285 * 100 = 28.4999999999999964472863212 whereas 28.5
can be displayed as an exact (!) floating point number. Therefore,
0.285 * 100 != 28.5. This is bad since arithmetic rounding fails for
this example. Banker's rounding on the other hand works by chance since
the digit in front of the 5 is an 8 and thus even. But there are
examples where banker's rounding fails for the same reason: The internal
representation of 1.255 as a floating point number is
1.2549999999999998934185896, multiplying that with 100 yields
125.4999999999999857891452848 instead of 125.5. Rounding that using both
banker's rounding and arithmetic rounding gives 125 and not 126 as would
be expected.
Also, it gets worse: There are usually three different floating point
types: single (32 bit), double (64 bit) and extended (80 bit). On 32 bit
x86 compatible CPUs all calculations usually take place with Extended
precision registers in the FPU. But when storing data in memory or
reading data from memory, the data is converted to the precision
available. When converting from single or double to extended, no loss
occurs (i.e. when loading memory into the FPU registers). But when
converting back, precision is lost. Also, a number that is nearest to a
decimal number in double precision is not necesaarily the nearest in
extended precision. Take, for example, 0.002877. The nearest double is
0.0028769999999999997832012. The nearest extended precision float is
0.0028770000000000000000416. That extended precision float truncated to
double is 0.0028770000000000002168821, which is in fact further away
from the original decimal number than the nearest double representation.
Now, the problem with arbitrary precision rounding is that you first
have to multiply the number with 10^places, then round it and in the end
divide the number by 10^places. If the last step, i.e. the division
through 10^places is done in extended floating point precision, then the
nearest number in extended precision will be chosed (this is due to the
fact that every integer can be represented exactly in both double and
extended precision and thus the result of floor()
will be the same for
both, only the effect of the division will count). If that result is
then truncated to double, that may not be the nearest possible double
representation of that decimal number! Or, to put it that way:
floor (0.002877 * 1000000.0 + 0.5) / 1000000.0 == 0.002877 ONLY IF the
last division is done in double precision. This was pointed out in the
2004 mail.
Now, when does this double / extended precision thing occur? Apparently
only on x86 platforms with a GNU compiler where the default floating
point mode is extended, the Microsoft compiler on the other hand uses
essentially double precision (but with a larger exponent - that however
doesn't matter in this case). On amd64 platforms both the GNU compiler
and the Microsoft compiler always use the precision of the operhands.
E) Current situation
The current approach is to implement arithmetic rounding. The algorithm
in math.c is essentially:
double round (double val, long places) {
double tmp_val = val, f = pow (10.0, (double) places);
tmp_val *= f;
if (tmp_val >= 0.0) {
tmp_val = floor (tmp_val + 0.5);
} else {
tmp_val = ceil (tmp_val - 0.5);
}
tmp_val /= f;
return val;
}
This algorithm is correct - in theory. But as I already explained: Due
to precision problems, the algorithm will not always supply the correct
result.
Other than a simple NaN check, there is one thing I haven't mentioned
yet: In order to cope with the floating point precision problem, not 0.5
but rather 0.50000000001 is added or subtracted from the value. This
value is called PHP_ROUND_FUZZ.
What does this fuzz do? If you have a number such as 1.255 which yields
125.4999999999999857891452848 when multiplied by 100, adding an
additional 1e-11 will cause the rounding to work in this case. This tiny
fuzz elimitates the precision problems for these numbers.
The basic assumption behind the fuzz is that if one wants to round to a
certain level of significance, adding an 1e-11th of that magnitude will
not change the rounding result. This is true for most every-day floating
point numbers one calculates with (such as prices, e.g. adding VAT and
then rounding to 2 places precision) but will fail if one actually wants
to round numbers that contain quite a few 9s, take the examples
round(0.9499999999999,1) or round(144999999999999,-13) given by George
in his mail of 2004.
The real problem with the current PHP is that PHP does not always use
this fuzz. It is completely disabled on Windows platforms and on UNIX
platforms and on UNIX platforms, an autoconf check is done to see if the
fuzz is activated. The autoconf check consists of this algorithm:
#include <math.h>
/* keep this out-of-line to prevent use of gcc inline floor()
/
double somefn(double n) {
return floor(npow(10,2) + 0.5);
}
int main() {
return somefn(0.045)/10.0 != 0.5;
}
There are three problems with this check:
-
I can't find a platform where this check actually fails, somehow, for
that number, on every platform I tested (Linux x86 32 bit, Windows x86
32bit, Linux amd64 64 bit) ther return value of main() is zero. -
For cross-compiling, the fuzz is always enabled.
-
It doesn't make sense! There are certainly numbers which have
different precision problems on different architectures (because of e.g.
the automatic internal expansion to extended precision) - but the basic
problem remains the same on ALL architectures: Rounding to a certain
precision causes problems due to the fact that you can't represent every
decimal number exactly as a floating point number.
[Oh, and by the way: The message "checking whether rounding works
properly" shows up at least ten times in the same console row when
running ./configure, so something doesn't work properly anyway...]
So, to sum it up, the whole fuzz stuff is currently broken because it is
always disabled on Windows, probably always disabled on UNIX unless on
some weird architecture but always enabled on UNIX if configure for some
reason thinks it's cross-compiling.
Oh, just for the record: Please note that the 2004 posting misses the
point of the fuzz, the author assumes that when switching to the double
FPU precision, the reason for implementing the fuzz disappears which
unfortunately is not the case.
F) Proposed fix
-
The idea in George's 2004 posting of having a central function that
encapsules the rounding functionality is a good idea because that makes
sure PHPs behaviour is consistent with all functions that do rounding in
one or another way. On the other hand, I had a quick peek at the
spprintf/snprintf() source code and that itself seems to call
zend_dtoa() which implements quite a complex algorithm for printing
floating point numbers. I can't say how difficult it would be to adjust
that algorithm to a) either use that central function or b) at least
always show the same behaviour. But even ifprintf()
isn't touched just
yet, it still seems like a good idea to expose a generic rounding
function as part of the PHP API for C. -
Choose the rounding algorithm you want to use. Do you want to do
banker's rounding or arithmetic rounding? Or: How about adding an
optional parameter to theround()
function to select the rounding
method? -
Either ALWAYS use the fuzz (and accept errors in cases where one
wants to round numbers like 0.9499999999999 to 1 digit precision) or
NEVER use the fuzz (and accept errors that 1.255 is always rounded to
1.25 instead of 1.26 which would be correct, banker's and arithmetic).
Or add a parameter toround()
or an ini setting to let the user choose.
This is perhaps the most important issue of all: Don't be inconsistent
Make PHP react the same way on all platforms! -
Incorporate the proposed fix in the 2004 mail for the extended to
double truncation problem, i.e.
// before the rounding operation:
#include <fpu_control.h>
fpu_control_t cw, stored_cw;
_FPU_GETCW(stored_cw);
cw = (stored_cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
_FPU_SETCW(cw);
// after the rounding operation:
_FPU_SETCW(stored_cw);
Of course, with sufficient #ifdefs for the correct platform.
- Document all the details with quite a few examples on the cases where
errors occur. The current documentation only states "rounds the number"
and so forth, it doesn't say according to which algorithm, it doesn't go
into the details of the edge cases where errors occur, etc. This is
probably there are so many contradictory (and often blatantly wrong!)
comments below theround()
function.
The 2004 proposal also contains quite a few other additions: It also
does automatic rounding on simple operations such as $a = $b + $c, it
always rounds to the 15 significant digits precision. I'm not so sure
about changing floating point arithmetics completely, I don't think that
would be wise. The manual warns about floating point precision already
and changing that would be a major hickup.
round()
, on the other hand, does more than the standard C functions do:
With the additional places argument, it allows to round not only to
integers but to arbitrary places. And this is precisely the thing that
causes all the problems. ;-)
G) Final thoughts
As I have elaborated, rounding floating point numbers is highly
non-trivial. The current situation of PHP is a bit of a mess and should
definitely be cleaned up.
I am willing to provide a patch that implements the necessary changes as
soon as the desired behaviour has been discussed (i.e. whether to do
banker's rounding or arithmetic rounding, whether to use fuzz or not use
it or whether to make one or both of the two configurable).
Anyway, thanks for following me for this long. I'd appreciate comments
on this.
Regards,
Christian
Posting to newsgroup php.internals, you wrote:
[...]
round()
, on the other hand, does more than the standard C functions do:
With the additional places argument, it allows to round not only to
integers but to arbitrary places. And this is precisely the thing that
causes all the problems. ;-)
Agree, all the problems you pointed out are due to the different representation
of the floating point numbers ("float" for short) against the decimal numbers
we humans are used to write.
A "float" is typically a 53 bits number (IEEE 754 double precision) that can
always be represented exactly as decimal.
But the vice-versa it not true, and also very simple decimal numbers do not
have a finite "float" rapresentation and must be trunked or rounded. So for
example the number 0.1(dec) takes infinite decimal binary gigit once
converted to float:
0.1(dec) = 0.00011001100110011001100110011001100110011...(bin)
Since IEEE 754 double-precision numbers hold 53 bits, this value has to be
rounded to
0.00011001100110011001100110011001100110011001100110011010(bit)
(note the rounding on the last 2 binary digits). This latter value is what will
be stored in the $f variable after a statement like this one:
$f = 0.1;
Note too that the value actually stored in $f differs from that we may expect
simply reading the source: the difference is very small, but it exists. "Float"
values can always be converted back in decimal base with exact precision, so
for example in our case the $f variable now contains exactly
$f == 0.1000000000000000055511151231257827021181583404541015625(dec)
No surprise on that: the "error" was introduced by the PHP parser itself
when the statement "$f = 0.1;" was encountered the first time.
Now going back to the round()
function, it is completely misleading to
analize statements like this one
echo round(0.285, 2);
because here we have actually two conversion problems:
1. The "0.285" decimal number must be converted to "float" at the parse
stage of the PHP interpreter, and that brings to "unexpected" results
due to the rounding performed in this conversion.
2. The `round()` function gets a number that already differs from that we
see in the source, then it perfoms its calculations thinking at the
number as it were a decimal number (multiplying by a power of 10,
adding a rounding term, and then trunking the result), and then
performs a division by a negative power of 10 that cannot be
represented exactly with a float and that must be rounded before the
division be performed, and then the result of the division must be
rounded again to finally give the result of the function.
All these dec-to-bin conversions and "float" roundings sometimes bring to
unexpected results as already pointed out in the 42294 bug: we (humans)
continue to think at numbers like 0.285 as it they were decimals, but these
numbers are actually handled in a completely different base inside the
computer.
The only real bug is the round()
function itself, and number_format()
or
printf()
should be used instead, because these function first convert the
binary float to decimal, and then perform the rounding over the decimal number.
But remember that also these functions operates on floats, i.e. numbers that
already may differ from those we see in the PHP source, so also with these
function we can see "strange" results.
If we really want a round()
function, this function should operate just like
number_format()
and printf()
already do, i.e. returning a string, not a double,
or the second argument of the function should be "deprecated" or dropped at
all.
A final note: programmers concerned with rounding problems and looking for
"exact" results while dealing with floating point numbers are usually
erroneusly using "float" values to store monetary values as if PHP were a desk
calculator, rather than use BCMath or other specialized high-precision library
suitable for business applications.
Just my 0.019999999999 Euros.
Regards,
/|\ Umberto Salsi
/_/ www.icosaedro.it
Hi!
Note too that the value actually stored in $f differs from that we may expect
simply reading the source: the difference is very small, but it exists. "Float"
values can always be converted back in decimal base with exact precision, so
for example in our case the $f variable now contains exactly$f == 0.1000000000000000055511151231257827021181583404541015625(dec)
No surprise on that: the "error" was introduced by the PHP parser itself
when the statement "$f = 0.1;" was encountered the first time.
Yes, sure. But IEEE 754 guarantees that the first 15 significant decimal
digits are correct (for double precision anyway), and in your example,
the first 17 significant decimal digits are in fact correct.
So, if you always round to the first 15 significant decimal digits when
outputting the floats, you always get correct results within that
decimal precision.
All these dec-to-bin conversions and "float" roundings sometimes bring to
unexpected results as already pointed out in the 42294 bug: we (humans)
continue to think at numbers like 0.285 as it they were decimals, but these
numbers are actually handled in a completely different base inside the
computer.
Look it from this perspective: If you do echo 0.285 in current PHP, it
will output 0.285. This is due to the 'precision' ini setting (default
is 14) that implicitly rounds the result. And within those 14 decimal
digits, the results ARE guaranteed to be exact by IEEE 754.
The only real bug is the
round()
function itself, andnumber_format()
or
printf()
should be used instead, because these function first convert the
binary float to decimal, and then perform the rounding over the decimal number.
Not entirely true. ;-) number_format()
uses round internally (math.c,
PHP 5.3 branch, line 944).
And, sprintf ("%.2f", 0.285) == "0.28" btw. - because even if you
convert the float back to a full decimal number, you get
0.284999999...something, so converting to string and rounding then
doesn't work either (at least not as a user would expect). This is the
reason why C for example doesn't bother to provide a round()
function
for arbitrary precision, it can only round to integer which always can
be represented in an exact way.
If we really want a
round()
function, this function should operate just like
number_format()
andprintf()
already do, i.e. returning a string, not a double,
The problem is not returning a double. For example, literal 0.285 in the
source and the result of the expression round (0.2846, 3) are the same
floats. They are not exactly 0.285 but they are represented in the same
way (assuming the FPU mode thing gets added, see my first mail).
The actual problem is that the double you put in to round()
is not
always what people think it is (as you already said). So letting round()
return a string() will not solve the underlying issues.
A final note: programmers concerned with rounding problems and looking for
"exact" results while dealing with floating point numbers are usually
erroneusly using "float" values to store monetary values as if PHP were a desk
calculator, rather than use BCMath or other specialized high-precision library
suitable for business applications.
The problem is that bcmath is not available everywhere - neither are
other such libraries. Also, using bcmath is quite cumberstone to read.
With floats you can do $a = $b + $c * $d + $e / $f - $g * $h / $j which
is very clear to read while with bcmath you have lots of nested function
calls which is really bad to read. And even if you have monetary
calculations: Normally, you are WELL within 14 decimal digits precision.
You also have to consider the following: If this were a proposal to
introduce a precision parameter into the round function, I would
completely agree that adding it will cause unecessary problems and
should not be done. But the fact is that it is already there in PHP and
you can't remove it with some serious BC breaks. And people are using
it. So may proposal is to make sure people get the best results
possible. (and since my proposal leaves some key issues open, what the
"best results possible" are is open to discussion, cf. fuzz vs. no-fuzz
etc.)
Anyway, I'll put my proposal into the wiki tomorrow with some additional
thoughts, perhaps my intentions are clearer then.
Regards,
Christian