Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:39138 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 99504 invoked from network); 21 Jul 2008 18:40:40 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 21 Jul 2008 18:40:40 -0000 Authentication-Results: pb1.pair.com smtp.mail=chris_se@gmx.net; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=chris_se@gmx.net; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmx.net designates 213.165.64.20 as permitted sender) X-PHP-List-Original-Sender: chris_se@gmx.net X-Host-Fingerprint: 213.165.64.20 mail.gmx.net Linux 2.5 (sometimes 2.4) (4) Received: from [213.165.64.20] ([213.165.64.20:47573] helo=mail.gmx.net) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 50/00-33868-328D4884 for ; Mon, 21 Jul 2008 14:40:38 -0400 Received: (qmail invoked by alias); 21 Jul 2008 18:40:17 -0000 Received: from p54A14FCE.dip.t-dialin.net (EHLO chris-se.dyndns.org) [84.161.79.206] by mail.gmx.net (mp027) with SMTP; 21 Jul 2008 20:40:17 +0200 X-Authenticated: #186999 X-Provags-ID: V01U2FsdGVkX1/qU7QI5Gfqe6YvmSI3XIiu+1rAMoNWvZGkHRN38i j5VfGKI7RLZ4ah Received: from [192.168.100.13] (cobalt.seiler.lan [192.168.100.13]) by chris-se.dyndns.org (Postfix) with ESMTP id 9B53D19446 for ; Mon, 21 Jul 2008 20:09:12 +0200 (CEST) Message-ID: <4884D7BF.909@gmx.net> Date: Mon, 21 Jul 2008 20:38:55 +0200 User-Agent: Thunderbird 2.0.0.14 (X11/20080421) MIME-Version: 1.0 To: php-dev List X-Enigmail-Version: 0.95.6 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Y-GMX-Trusted: 0 X-FuHaFi: 0.48 Subject: Rounding in PHP and floating point numbers in general From: chris_se@gmx.net (Christian Seiler) Hi, With this posting I'd like to make another try in resolving the rounding issues in PHP. This is basically in response to (bug #42294) where I added a comment last year. Since then I have read quite a bit on this subject so I feel that I should post an update to my previous comment (which isn't completely correct, as are some other things I reference in this mail, please keep that in mind while reading them). And since the bug seems to have been forgotten, I think I should try this on the mailing list. A) General explanations ======================= Rounding methods: - Round to negative infinity: Should be obvious, some examples: -0.5 -> -1 0.5 -> 0 2.4 -> 2 -3.6 -> -4 4.8 -> 4 This is what the floor() function does. - Round to positive infinity: Obvious, some examples: -0.5 -> 0 [actually, -0] 0.5 -> 1 2.4 -> 3 -3.6 -> -3 4.8 -> 5 This is what the ceil() function does. - Round to zero: Some examples: -0.5 -> 0 [actually, -0] 0.5 -> 0 2.4 -> 2 -3.6 -> -3 4.8 -> 4 - Round away from zero: Some examples: -0.5 -> -1 0.5 -> 1 2.4 -> 3 -3.6 -> -4 4.8 -> 5 - Round no nearest: Rounds to nearest integer. Examples not involving .5: 2.4 -> 2 -3.2 -> -3 -3.6 -> -4 4.8 -> 5 2.501 -> 3 -2.501 -> -3 [Note the last two examples: they are not exactly .5 so they do not have the equal distance from both integers so the rounding direction in this case is clear.] There are four main variants of this rounding method regarding the treatment of .5 (which is exactly in the middle of two integers): * Away from zero: .5 will always be rounded to the next integer away from zero: -1.5 -> -2 1.5 -> 2 -2.5 -> -3 2.5 -> 3 This is the traditional "arithmetic rounding" people learn in school. * Towards zero: .5 will always be rounded to the next integer towards zero: -1.5 -> -1 1.5 -> 1 -2.5 -> -2 2.5 -> 2 * Towards even: .5 will always be rounded to the next even integer: -1.5 -> -2 1.5 -> 2 -2.5 -> -2 2.5 -> 2 This is the so-called "banker's rouding". * Towards odd: .5 will always be rounded to the next odd integer: -1.5 -> -1 1.5 -> 1 -2.5 -> -3 2.5 -> 3 In practice, only round to negative/positive infinity (floor/ceil), arithmetic rounding (round to nearest, away from zero) and banker's rouding (round to nearest, towards odd) are widely used. B) Rounding occurs in ===================== There are various places in PHP where rounding (explicit or implicit) occurs: Explicitely: * round() * number_format() * (sf)printf() with %3f etc. Implicitely: * float to string conversion (the 'precision' ini setting) * (sf)printf() with %g The problem is that different methods are used for this rounding. C) History of the round() function in PHP ========================================= First version in CVS (math.c): - No precision argument, always rounds to integer - Use rint() for rounding ISO C says: rint() uses the current rounding direction which on every machine known to me is round to even (so-called "banker's rounding"). But: Rounding direction may be changed by fesetround()! - On systems where rint() is not specified, rint() is emulated. But the emulation is done via an algorithm that does arithmetic rounding. Version 1.22 (May 17, 2000): - Allow arbitrary precision rounding, now implement an algorithm that does arithmetic rounding. Version 1.104 (Aug 8, 2003): - Add fuzz due to incorrect results (see below) Version 1.106 (Aug 9, 2003): - Disable fuzz on Win32, make (useless) configure check on UNIX July 23, 2004: (Slightly flawed) analysis of the FP and rounding situation in PHP and proposal to fix that on the internals mailinglist: The linked C files are still available through the wayback machine via to get an impression what George actually meant. D) Problem analysis =================== In this section I want to provide an analysis of the problems that occur with PHPs rounding functionality. First of all, rounding to integer is never a problem as long as the number of digits before the decimal point is smaller than the floating point precision. But does anyone really want to round something to 15 significant (!) digits precision and still use floats? I believe not. So, if you want to do banker's rounding, simply use rint(), if you want to do arithmetic rounding, simply do round() or if the C version doesn't permit that then use the simple algorithm rounded_val = val >= 0 ? floor (val + 0.5) : ceil (val - 0.5). Problems arise when rounding to arbitrary precision. The basic problem is that the only feasible way to do so is by multiplying the number with pow(base, precision), then round to integer and divide the result with the same factor. Example: round(0.142, 2) -> 0.142 * pow(10, 2) = 0.142 * 100 = 14.2 -> round_to_integer(14.2) = 14 -> 14/100 = 0.14 -> result is 0.14 This works independently of the rounding algorithm chosen, i.e. it doesn't matter if banker's rounding or arithmetic rounding is used. But problems arise from floating point precision. Not every finite decimal number can be represented by finite binary number, so for floating point arithmetic, the floating point number closest to the decimal number is used. For example, when using double precision, the number 0.285 is actually stored as (approximately) 0.2849999999999999755750935. This is the closest number to 0.285 a double representation (64 bits total, 52 bits for the fraction) can provide. This is usually not a problem since implicit rounding to 10 to 15 significant places is done when printing the floating point number. However, the problem arises when doing actual floating point calculations: 0.285 * 100 = 28.4999999999999964472863212 whereas 28.5 can be displayed as an exact (!) floating point number. Therefore, 0.285 * 100 != 28.5. This is bad since arithmetic rounding fails for this example. Banker's rounding on the other hand works by chance since the digit in front of the 5 is an 8 and thus even. But there are examples where banker's rounding fails for the same reason: The internal representation of 1.255 as a floating point number is 1.2549999999999998934185896, multiplying that with 100 yields 125.4999999999999857891452848 instead of 125.5. Rounding that using both banker's rounding and arithmetic rounding gives 125 and not 126 as would be expected. Also, it gets worse: There are usually three different floating point types: single (32 bit), double (64 bit) and extended (80 bit). On 32 bit x86 compatible CPUs all calculations usually take place with Extended precision registers in the FPU. But when storing data in memory or reading data from memory, the data is converted to the precision available. When converting from single or double to extended, no loss occurs (i.e. when loading memory into the FPU registers). But when converting back, precision is lost. Also, a number that is nearest to a decimal number in double precision is not necesaarily the nearest in extended precision. Take, for example, 0.002877. The nearest double is 0.0028769999999999997832012. The nearest extended precision float is 0.0028770000000000000000416. That extended precision float truncated to double is 0.0028770000000000002168821, which is in fact further away from the original decimal number than the nearest double representation. Now, the problem with arbitrary precision rounding is that you first have to multiply the number with 10^places, then round it and in the end divide the number by 10^places. If the last step, i.e. the division through 10^places is done in extended floating point precision, then the nearest number in extended precision will be chosed (this is due to the fact that every integer can be represented exactly in both double and extended precision and thus the result of floor() will be the same for both, only the effect of the division will count). If that result is then truncated to double, that may not be the nearest possible double representation of that decimal number! Or, to put it that way: floor (0.002877 * 1000000.0 + 0.5) / 1000000.0 == 0.002877 ONLY IF the last division is done in double precision. This was pointed out in the 2004 mail. Now, when does this double / extended precision thing occur? Apparently only on x86 platforms with a GNU compiler where the default floating point mode is extended, the Microsoft compiler on the other hand uses essentially double precision (but with a larger exponent - that however doesn't matter in this case). On amd64 platforms both the GNU compiler and the Microsoft compiler always use the precision of the operhands. E) Current situation ==================== The current approach is to implement arithmetic rounding. The algorithm in math.c is essentially: double round (double val, long places) { double tmp_val = val, f = pow (10.0, (double) places); tmp_val *= f; if (tmp_val >= 0.0) { tmp_val = floor (tmp_val + 0.5); } else { tmp_val = ceil (tmp_val - 0.5); } tmp_val /= f; return val; } This algorithm is correct - in theory. But as I already explained: Due to precision problems, the algorithm will not always supply the correct result. Other than a simple NaN check, there is one thing I haven't mentioned yet: In order to cope with the floating point precision problem, not 0.5 but rather 0.50000000001 is added or subtracted from the value. This value is called PHP_ROUND_FUZZ. What does this fuzz do? If you have a number such as 1.255 which yields 125.4999999999999857891452848 when multiplied by 100, adding an additional 1e-11 will cause the rounding to work in this case. This tiny fuzz elimitates the precision problems for these numbers. The basic assumption behind the fuzz is that if one wants to round to a certain level of significance, adding an 1e-11th of that magnitude will not change the rounding result. This is true for most every-day floating point numbers one calculates with (such as prices, e.g. adding VAT and then rounding to 2 places precision) but will fail if one actually wants to round numbers that contain quite a few 9s, take the examples round(0.9499999999999,1) or round(144999999999999,-13) given by George in his mail of 2004. The real problem with the current PHP is that PHP does not always use this fuzz. It is completely disabled on Windows platforms and on UNIX platforms and on UNIX platforms, an autoconf check is done to see if the fuzz is activated. The autoconf check consists of this algorithm: #include /* keep this out-of-line to prevent use of gcc inline floor() */ double somefn(double n) { return floor(n*pow(10,2) + 0.5); } int main() { return somefn(0.045)/10.0 != 0.5; } There are three problems with this check: 1) I can't find a platform where this check actually fails, somehow, for that number, on every platform I tested (Linux x86 32 bit, Windows x86 32bit, Linux amd64 64 bit) ther return value of main() is zero. 2) For cross-compiling, the fuzz is always enabled. 3) It doesn't make sense! There are certainly numbers which have different precision problems on different architectures (because of e.g. the automatic internal expansion to extended precision) - but the basic problem remains the same on ALL architectures: Rounding to a certain precision causes problems due to the fact that you can't represent every decimal number exactly as a floating point number. [Oh, and by the way: The message "checking whether rounding works properly" shows up at least ten times in the same console row when running ./configure, so something doesn't work properly anyway...] So, to sum it up, the whole fuzz stuff is currently broken because it is always disabled on Windows, probably always disabled on UNIX unless on some weird architecture but always enabled on UNIX if configure for some reason thinks it's cross-compiling. Oh, just for the record: Please note that the 2004 posting misses the point of the fuzz, the author assumes that when switching to the double FPU precision, the reason for implementing the fuzz disappears which unfortunately is not the case. F) Proposed fix =============== 1. The idea in George's 2004 posting of having a central function that encapsules the rounding functionality is a good idea because that makes sure PHPs behaviour is consistent with all functions that do rounding in one or another way. On the other hand, I had a quick peek at the spprintf/snprintf() source code and that itself seems to call zend_dtoa() which implements quite a complex algorithm for printing floating point numbers. I can't say how difficult it would be to adjust that algorithm to a) either use that central function or b) at least always show the same behaviour. But even if printf() isn't touched just yet, it still seems like a good idea to expose a generic rounding function as part of the PHP API for C. 2. Choose the rounding algorithm you want to use. Do you want to do banker's rounding or arithmetic rounding? Or: How about adding an optional parameter to the round() function to select the rounding method? 3. Either ALWAYS use the fuzz (and accept errors in cases where one wants to round numbers like 0.9499999999999 to 1 digit precision) or NEVER use the fuzz (and accept errors that 1.255 is always rounded to 1.25 instead of 1.26 which would be correct, banker's and arithmetic). Or add a parameter to round() or an ini setting to let the user choose. This is perhaps the most important issue of all: Don't be inconsistent Make PHP react the same way on all platforms! 4. Incorporate the proposed fix in the 2004 mail for the extended to double truncation problem, i.e. // before the rounding operation: #include fpu_control_t cw, stored_cw; _FPU_GETCW(stored_cw); cw = (stored_cw & ~_FPU_EXTENDED) | _FPU_DOUBLE; _FPU_SETCW(cw); // after the rounding operation: _FPU_SETCW(stored_cw); Of course, with sufficient #ifdefs for the correct platform. 5. Document all the details with quite a few examples on the cases where errors occur. The current documentation only states "rounds the number" and so forth, it doesn't say according to which algorithm, it doesn't go into the details of the edge cases where errors occur, etc. This is probably there are so many contradictory (and often blatantly wrong!) comments below the round() function. The 2004 proposal also contains quite a few other additions: It also does automatic rounding on simple operations such as $a = $b + $c, it always rounds to the 15 significant digits precision. I'm not so sure about changing floating point arithmetics completely, I don't think that would be wise. The manual warns about floating point precision already and changing that would be a major hickup. round(), on the other hand, does more than the standard C functions do: With the additional places argument, it allows to round not only to integers but to arbitrary places. And this is precisely the thing that causes all the problems. ;-) G) Final thoughts ================= As I have elaborated, rounding floating point numbers is highly non-trivial. The current situation of PHP is a bit of a mess and should definitely be cleaned up. I am willing to provide a patch that implements the necessary changes as soon as the desired behaviour has been discussed (i.e. whether to do banker's rounding or arithmetic rounding, whether to use fuzz or not use it or whether to make one or both of the two configurable). Anyway, thanks for following me for this long. I'd appreciate comments on this. Regards, Christian