Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:11519 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 85929 invoked by uid 1010); 23 Jul 2004 17:30:21 -0000 Delivered-To: ezmlm-scan-internals@lists.php.net Delivered-To: ezmlm-internals@lists.php.net Received: (qmail 83782 invoked from network); 23 Jul 2004 17:29:51 -0000 Received: from unknown (HELO mail.zend.com) (80.74.107.235) by pb1.pair.com with SMTP; 23 Jul 2004 17:29:51 -0000 Received: (qmail 25519 invoked from network); 23 Jul 2004 17:29:46 -0000 Received: from localhost (HELO AndiNotebook.zend.com) (127.0.0.1) by localhost with SMTP; 23 Jul 2004 17:29:46 -0000 Message-ID: <5.1.0.14.2.20040723102540.034d9bb8@127.0.0.1> X-Sender: andi@127.0.0.1 X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Fri, 23 Jul 2004 10:29:43 -0700 To: George Whiffen ,internals@lists.php.net In-Reply-To: <20040723080319.74978.qmail@pb1.pair.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Subject: Re: [PHP-DEV] fp, rounding, decimal arithmetic - definitive account? From: andi@zend.com (Andi Gutmans) References: <20040723080319.74978.qmail@pb1.pair.com> Hi George, Although a bit long, your email was an interesting read. I am not quite sure how we should approach this problem as PHP's FP has gone through a zillion patches in the past few years. Adding another INI option is always something I try and keep away from when possible. As in general, I don't trust C floating point numbers, I do agree with Christian's approach of using BC Math. It might be because we don't know enough about it. I'd like to hear what other people here think. If it's worth messing with this code. By the way, can you please post your code changes as unified diffs? We can't go through the final code files and see what was changed. Thanks, Andi At 02:19 PM 7/23/2004 +0300, George Whiffen wrote: >For the record, herewith: >- definitive (?) list of php's fp/rounding/arithmetic issues >- analysis of the issues >- discussion of options for handling the issues >- description of a comprehensive and "painless" solution >- links to and discussion of proof-of-concept C code > >Apologies for the length of the post, but the subject matter is no more >suited to piece by piece discussion, than it is to piece >by piece resolution. > >N.B. Throughout this post "decimal digits" refers to "significant >decimal figures", and should not be confused with "decimal places". > >PHP FP/DECIMAL/ROUNDING ISSUES >============================== > >A. General Arithmetic Errors >---------------------------- > >A1: Simple Decimal Arithmetic Failure >0.8 == 0.7 + 0.1 ==> false > >A2: Composite Decimal Arithmetic Failure >print floor(8.2 - 0.2) ==> 7 > >A3: Difference Errors >print (0.5001 - 0.5) ==> 9.9999999999989E-05 > >A4: Cumulative Arithmetic Errors >for ($i=1;$i<=1000000;++$i){$sum += 0.1;}print $sum ==> 100000.00000133 > > >B. Intel (non-Win32) Specific Errors >------------------------------------ > >B1: Round (unfixed by fuzziness) >Intel GNU with FUZZ: round(12345678.0255,3) ==>12345678.025 > >B2: Exact Arithmetic Errors >Intel GNU: 0.002877 == 2877/1000000 ==> false > >B3: Round (fpu rounding twice) >Intel GNU with FUZZ: 0.002877 == round(0.002877,6) ==> false > >B4: Round - inexact double - created by fuzziness >Intel GNU with FUZZ: round(0.9499999999999,1) ==> 1.0 > >B5: Round - exact double - created by fuzziness >Intel GNU with FUZZ: round(144999999999999,-13) ==> 1.5e14 > > >C. High Exponent Round Algorithm Failure >---------------------------------------- > >C1: Cross-Platform Rounding >Most (All?) Platforms: 1e-24 == round(1e-24,24) ==> false > > >C2: Platform Specific Variations >Intel GNU: 1e-23 == round(1e-23,23) ==> false >Win32: 1e-23 == round(1e-23,23) ==> true >Intel GNU: 5e-23 == round(5e-23,23) ==> true >Win32: 5e-23 = round(5e-23,23) ==> false > > >D. Inconsistent Rounding >------------------------ > >round(0.255,2) ==> 0.26 >sprintf("%.2f",0.255) ==> 0.25 >ini_set('precision',2);print 0.255 ==> 0.26 > >round(0.235,2) ==> 0.24 >sprintf("%.2f",0.235) ==> 0.23 >ini_set('precision',2);print 0.235 ==> 0.23 > >C: printf("%.2f",0.245) ==> 0.24 >php: printf("%.2f",0.245) ==> 0.25 > > >E. Functional Limitations >------------------------- > >E1: No half-even (bankers) rounding option >See Rasmus Lerdorf's comments on Bug #24142 > >E2: No round to precision > > >ANALYSIS >======== > >A. Arithmetic Errors >-------------------- > >php implements its decimal arithmetic using C doubles, i.e. IEEE754 >64bit binary representations. While the IEEE standard guarantees exact >binary arithmetic results within range and exactly rounded outside of >range, php's decimal arithmetic results are neither exactly rounded nor >exact within range. > >C double representations of decimal numbers (e.g. 0.1) are typically >inexact. These "representation" errors, which only effect users of >C doubles for "other-base" arithmetic, can only be corrected by >rounding of the C double result in the appropriate base, base ten for >php. php does not do this, therefore arithmetic results may be in error. > >Often the incorrect results are masked by subsequent implicit rounding >e.g. at output/string conversion. However, comparisons for equality, >(A1), integer conversions (A2) and difference operations (A3) can >reveal them immediately. If the results are used for further >arithmetic operations the errors can accumulate ad infinitum (A4). > >The errors are strictly bounded. Even with maximal errors in >representation and an inexact final result, the returned result when >correctly rounded to 15 decimal digits will be the same as the true >decimal result, if it is of 15 digits or less. This is particularly >significant since 15 digits is the limit for consistent representation >of decimals by 64-bit binaries (C doubles). (Not all 16 decimal digit >numbers have distinct C double representations e.g. 0.9999999999999995 >and 0.9999999999999994 have the same C double representation). > >Therefore, with the addition of appropriate rounding, exact decimal >arithmetic to 15 significant figures is perfectly possible with php's >current fp implementation. > >What is not possible, is exactly rounded decimal arithmetic, (the basic >IEEE854 requirement) which requires the result be calculated to "infinite" >precision and rounded to the best approximation. Decimally rounded php >arithmetic can only guarantee that when the true result is >of 16 or more digits, the returned result after rounding may be either the >result immediately above or below the true result, but not necessarily the >closer of the two. Practically, exactly rounded decimal fp arithmetic >requires decimal fp hardware or complex and slow multi-precision arithmetic. > >For most php users, most of the time, exact arithmetic is quite good >enough. In particular, for money calculations, where exactness is >typically an overwhelming, and often legal, requirement, exactly rounded >results are largely irrelevant. Either calculations are simple >and guaranteed to be exact within 15 digits e.g. ledger arithmetic, or >they are very tightly specified to guarantee consistency and exactness >at each step, e.g. interest calculations. > >This is in marked contrast to typical C users, e.g. coding complex >scientific and mathematical algorithms. They need the full IEEE >functionality including access to the exact flag, rounding >mode, precision control as well as exactly rounded results, (and >appropriate numerical analysis) to validate their results. > >Unfortunately even just correct decimal rounding of C doubles is a >non-trivial matter as php's other rounding issues clearly illustrate. > > >B. Intel FPU Precision Problems >------------------------------- > >Although only specific to Intel platforms, fpu extended precision >errors in C double arithmetic, can easily confuse attempts to >correctly round decimals across platforms. > >By default Intel FPUs operate in extended precision mode. On Win32 >platforms the compiler forces them to double precision. >However, some *nix compilers e.g. GNU, leave the FPU at the extended >precision default. > >An FPU in extended precision mode creates two problems for decimal >arithetic using C doubles: > >Firstly, different results are returned depending on whether the >optimiser holds intermediate arithmetic results in extended precision >on the FPU registers or converts them to doubles on the way through >memory, (B1). > >Secondly, (and much more rarely), a single arithmetic operation with >an inexact result can be vulnerable to errors from rounding twice. >Rounding once to extended precision during the actual calculation >and then again to double precision on return of the result from >the FPU can introduce new errors, (B2 and B3). > >Both problems can be solved by a fast and thread-safe force/restore of >the FPU precision mode, (e.g. _FPU_GETCW(), _FPU_SETCW()) with no >impact on other code e.g. maths libraries. > >On the other hand, the current solution for problem B1 which has been >implemented in php_round, i.e. addition of a "fuzz factor", can itself >create new errors, (B4, B5), and does nothing to address B2 or B3. > > >C. Algorithmic Failures on Inexact Powers of Ten >------------------------------------------------ > >Normal double FPU arithmetic will typically return the correct >"nearest" C double representation of a decimal fp on division of a >rounded integer mantissa by an appropriate power of ten (i.e. the >algorithm used by php's round). However, this is only guaranteed to work >if the power of ten is itself exact i.e. between +/- 22 (1e-22 to 1e22). > >For decimals with exponents outside this range e.g. 1e-23, 5e24 etc., >php's rounding algorithm and all rounding algorithms which only use fp >arithmetic will sometimes fail to return the closest C double >representation. This becomes a problem, if, as is typically the case, >the platform's strtod() always returns the absolutely closest >representation e.g. by using multi-precision integer arithmetic. >The consequence is that 1e-24 is no longer equal to 1/1e24 or >round(1e-24,24). The simplest, but relatively expensive solution, is >to use the platform's own strtod(),sprintf() as the last part of the >rounding operation for decimal exponents greater than +/- 22 i.e. when >rounding operands with absolute value outside the range of 1e-8 to 1e23 >using the maximum sensible precision of 15 digits. This is much more >expensive in terms of performance than the pure fp algorithm which can >be safely used for numbers such as 0.0000000123456789012345 but not for >0.000000001. > > >D. Inconsistent Implicit Rounding >--------------------------------- > >Although the explicit rounding functions, i.e. php_round and >number_format, now share the same rounding macro, the implictly >rounding functions i.e. the output/string cast functions and >the sprintf/printf family do not. This leads to a host of >inconsistencies depending on which function is used and on which >platform. sprintf/printf are not even consistent with their C >counterparts. > >Much of the problem is related to the issue of whether to round >half-even (the fpu default/bankers rounding), or round half-up >(the common high-school and scientific rounding mode). Both >php's round function and the printf family attempt to round >half-up, the output/string converters stay with the fpu default >of round half-even. > >One way to comprehensively resolve this issue is to include >mode-sensitive rounding using a common rounding routine as a >first stage of the string cast/printf routines. In that case, >not only do all php's rounding functions, implicit and explicit, >return consistent results, but it becomes possible to allow full >user control over rounding mode. > > >E. Functional Limitations >------------------------- > >The absence of explicit control over rounding mode (F1) appears to have >led to much confusion even among php gurus. > >Both round-half-up and round-half-even (US banker's rounding), have >legitimate uses. The absence of more user demand for rounding mode >control probably owes more to the general fp problems forcing users >into hand-crafted solutions, e.g. application-level fixed point shift, >rather than a genuine lack of interest. > >The other obvious functional limitation with regard to rounding in >php is the requirement to code rounding to a specific precision >using php's round to places (F2), i.e. round(expression,precision) must >be coded as: > >round(expression,precision+1+floor(log10(expression))) > >This is not only clumsy but unnecessarily expensive in terms of >performance. > > >SOLUTIONS >========= > >1. User Workarounds >------------------- > >Taken altogether, the issues in php's fp implementation listed above >present the php user with a daunting task if they wish to be confident >of their arithmetic results. Practically they have at least three >alternatives: > >a) Do nothing >This is probably the most popular approach. Most of the time the >problems are succesfully masked by implicit or explicit output >conversions and problematic numbers such as 0.002877 and 0.235 >simply don't turn up. The most common problem they will come across >is in comparisons for equality about which the php documentation >has clear warnings. On average, they will "get away with it". > >The big problem is that they may not realise that their code is >intrinsically unsound and just waiting to meet a problem such as >A2 or A4. Unless they have are already aware of all the problems >testing is unlikely to be comprehensive. The danger is that problems >show up years later, perhaps because of scaling or platform changes. >Then diagnosis and correction of the code may have turned into >nightmares. It's easy to imagine a traumatised end-user when >they finally discover "that's how php arithmetic works" deciding to take >the "safe option" and go for a rewrite using .NET and its decimal classes. > >b) Avoid the fp numeric type >The experienced C/Open Source developer may well simply avoid php's >decimal arithmetic completely, preferring application-level fixed-point >shift i.e. integer arithmetic only. They will have to watch out for >php's automatic type-casting converting their integers back to floating >points, but since fp integer arithmetic is in any case exact, this is >not such a big problem. > >They are most likely to be vulnerable to the coding errors that the >extra complexity and clumsiness of fixed-point shifts necessarily >entails e.g. confusion over where the point is fixed in data and code. >Unfortunately, however experienced they may be with single fixed-point >coding, they are increasingly likely to be faced with variable fixed-point >processing to cope with increasingly complex applications e.g. >multi-currency, multi-sales-tax, multi-tier-commission e-commerce. > >True floating-point, (if results are exact), is much more suited to >generic, reusable coding. > >c) Use "string" arithmetic > >The php specialist may well prefer to keep their numbers "floating" but >to use a "string" arithmetic to ensure correct results. Whether they >opt for a string cast of fp arithmetic results or use of the bc >libraries the arithmetic should work fine. While bc is typically faster, >string casts are generally more flexible since no scale control is required. > >Of course, the problems of carrying out any required rounding on the >results remain, since bc truncates out-of-scale results and string casts >will try to round half-even. They also require greater code discipline >and certainly reduce code "readability". Nevertheless either >of these is probably a better user workaround than the other alternatives >currently available to the user. > > > >2. Developer Fixes >------------------ > >a) Replace php's fp numeric type > >This is the conventional position of most languages from Cobol in the >1960s to .NET's decimal classes. They have specific decimal types for >which decimal arithmetic works. They are typically semi-fixed, (much >like bc with its scale control) and may or may not have the advantage >of dedicated hardware. >Although php could take the same approach, it has special problems: >firstly, of course is the loose typing and automatic type conversion >e.g. context or value triggered integer/fp/string conversions; secondly >there is the issue of compatibility with extensions and external >libraries, such as the C math library; thirdly there is >the problem of ensuring backward compatibility, (a major php strength); >and finally performance, although php's arithmetic may be a couple of >orders of magnitude slower than C's, it would be sad not to take >advantage of the blindingly fast fp hardware on nearly every platform. > >In short, any change to move away from C doubles as the core number >type is fraught with problems. > >b) Problem-by-problem fixups > >This has historically been the php approach. Unfortunately the big >problem is that partial solutions, (such as the fuzzy fix for php_round), >can actually make things worse! The situation is already so complex for >the php user that even "good fixes" may simply confuse them further by >simply replacing one "gotcha" that they have coded around with the next >one on the list. > >c) A Comprehensive Solution retaining C doubles > >The good news is that the alternative of a comprehensive resolution >of all the issues is not only not impossible but also not much more work >than any one partial solution. It can also be implemented in >such a way as to not only retain full backward compatibility but to >greatly simplify the user documentation. >The key requirement is to have a safe, dependable but still fast >central rounding function. The rest is straightforward. It just needs >care over the precision used in rounding e.g. counting significant lead >zeros on differences. > >Such a solution is outlined below: > > >OUTLINE SOLUTION >================ > >The solution described here in outline and with links to proof-of-concept >code resolves all the issues listed with minimal confusion for the php >user. The only visible changes for the user are: > >a) New "exact_decimal_arithmetic" configuration variable > >When set to true, this enables fixes for all the fp issues, >i.e. arithmetic, correct rounding, rounding mode control and >consistent rounding. All the user needs to know is that while 15 >or less digit results will always be correct, 16 or more digit >results will be rounded to a 15 digit decimal immediately above or >below the true result. > >No coding changes are required to take advantage of the new feature, and >historically "unsafe" code becomes immediately "safe" when the variable is >set. It can be switched on and off at run-time at any point in a >calculation with no adverse consequences. > > >b) New "rounding_mode" configuration variable > >When exact_decimal_arithmetic is on, the value of rounding_mode, either >half_up, (the default), or half_even controls rounding mode across the >whole of php. It sets the behaviour of both explicit rounding such as >round and number_format and implicit rounding in string/output conversion, >print/sprintf. It too can be changed at any point during execution. > > >CODE CHANGES >------------ > >As of 5.0.0, implementation of a solution along these lines requires >changes to three sources: Zend/zend_operators.c, ext/standard/math.c >and formatted_print.c as follows: > >ext/standard/math.c >------------------- > >1. Replace PHP_ROUND_WITH_FUZZ macro by central_round() function >with the following features: >- non-Win32 Intel FPUs forced to double-precision >- alternate rounding depending on rounding_mode setting >- sprintf/strtod when abs(places) >= 23 >- pow() replaced by new getintpow10() function for performance > >2. New getintpow10() function (performance tweak) >- returns powers of 10 from 1 to 23 from compiler generated array > >3. New round_to_precision() function >- wrapper for central_round() >- sets "places" to match 15 digits of absolutely largest of up > to 3 non-zero double arguments > >4. New getintlog10() function (performance tweak) >- returns floor(log10(val)) >- uses fast, hard-coded "if" tree for val between 1e-8 and 1e23 > >5. Call to central_round replaces PHP_ROUND_WITH_FUZZ in: >- PHP_FUNCTION(round) >- PHPAPI _php_math_number_format > >ext/standard/formatted_print.c >------------------------------ > >1. Conditional call to round_central in: >- php_sprintf_appenddouble() > >Zend/zend_operators.c >--------------------- > >1. Conditional calls to round_central in: >- _convert_to_string() >- zend_locale_sprintf_double() > >2. Conditional calls to round_to_precision (one non-zero arg) in: >- mul_function() >- div_function() > >3. Conditional calls to round_to_precision (3 non-zero args) in: >- add_function() >- sub_function() >- adjusts for potential "catastrophic loss of precision" > > >Links to Sample Code and Sample Test >------------------------------------ >http://www.whiffen.net/phprounding/math.c_fixed >http://www.whiffen.net/phprounding/formatted_print.c_fixed >http://www.whiffen.net/phprounding/zend_operators.c_fixed >http://www.whiffen.net/phprounding/test.php >http://www.whiffen.net/phprounding/test_results_unfixed.txt >http://www.whiffen.net/phprounding/test_results_fixed.txt > >Notes on Code >------------- > >1. The code is strictly proof-of-concept. The best that >can be said about it is that it works. (I make no claims >to be a C programmer let alone a developer of php). It would >surely need extensive re-working, probably by a core developer, >before submission for consideration. > >2. It's incomplete. The major missing element is the setup >of the two new configuration variables. Code stumps are included to check >the configuration variables but in the sample code >exact_decimal_arithmetic is always true and rounding_mode is always half-up. > >3. Also omitted is the config.m4 code for establishing that >fpu switching is necessary. > >4. The code has only been tested under Debian Linux. > > >Performance >----------- > >Performance impact is negligible for low exponent operands >(15 digit numbers between 1e-8 and 1e23). > >php's round function is actually faster. > >The string conversion/output, printf and sprintf functions are already so >slow that it's hard to detect any impact. > >For arithmetic, where the extra load is detectable it is roughly >comparable to a single array or object member access i.e. >exact $a + $b is as fast as inexact $a->a + $b > >The implication is that unless use of arrays and objects should be >deprecated on performance grounds, there can be no objection on >performance grounds to enabling exact arithmetic for the vast majority of >user calculations. >On the other hand, for high-exponent operands i.e. < 1e-8 or >= 1e23, >performance impact is considerable i.e. comparable to an additional string >cast. > >Since high-exponent data is often scientific or engineering data >which is itself known to be inexact, users may well wish to leave >exact_decimal_arithmetic off for these kinds of applications. > > > >FINAL NOTE >========== > >Hopefully, after appropriate corrections have been posted, we now >have a better overall description of php's fp/rounding issues on >record. > >Furthermore it should be clear that something could be done about >all of them. Whether anything should be done is another question, >and perhaps ultimately a matter of opinion. > >As you either already know or could reasonably guess, my personal >view is that simple, consistent, exact decimal arithmetic is highly >desirable for any development tool intended for either >novices or commercial use. > >The fact that C does not offer decimal arithmetic is unsurprising given >its pedigree and key application areas. The fact that other Open-Source >tools such as tcl, Perl and MySQL have decimal problems >may owe to the difficulties of implementing solutions given their >code structure. The fact that php still has decimal problems given >its user base and the relative ease with which the problems can be solved, >is something I do not yet understand. >Awaiting enlightenment, > > >George Whiffen > >-- >PHP Internals - PHP Runtime Development Mailing List >To unsubscribe, visit: http://www.php.net/unsub.php