For the record, herewith:
- definitive (?) list of php's fp/rounding/arithmetic issues
- analysis of the issues
- discussion of options for handling the issues
- description of a comprehensive and "painless" solution
- links to and discussion of proof-of-concept C code
Apologies for the length of the post, but the subject matter is
no more suited to piece by piece discussion, than it is to piece
by piece resolution.
N.B. Throughout this post "decimal digits" refers to "significant
decimal figures", and should not be confused with "decimal places".
PHP FP/DECIMAL/ROUNDING ISSUES
A. General Arithmetic Errors
A1: Simple Decimal Arithmetic Failure
0.8 == 0.7 + 0.1 ==> false
A2: Composite Decimal Arithmetic Failure
print floor(8.2 - 0.2) ==> 7
A3: Difference Errors
print (0.5001 - 0.5) ==> 9.9999999999989E-05
A4: Cumulative Arithmetic Errors
for ($i=1;$i<=1000000;++$i){$sum += 0.1;}print $sum ==> 100000.00000133
B. Intel (non-Win32) Specific Errors
B1: Round (unfixed by fuzziness)
Intel GNU with FUZZ: round(12345678.0255,3) ==>12345678.025
B2: Exact Arithmetic Errors
Intel GNU: 0.002877 == 2877/1000000 ==> false
B3: Round (fpu rounding twice)
Intel GNU with FUZZ: 0.002877 == round(0.002877,6) ==> false
B4: Round - inexact double - created by fuzziness
Intel GNU with FUZZ: round(0.9499999999999,1) ==> 1.0
B5: Round - exact double - created by fuzziness
Intel GNU with FUZZ: round(144999999999999,-13) ==> 1.5e14
C. High Exponent Round Algorithm Failure
C1: Cross-Platform Rounding
Most (All?) Platforms: 1e-24 == round(1e-24,24) ==> false
C2: Platform Specific Variations
Intel GNU: 1e-23 == round(1e-23,23) ==> false
Win32: 1e-23 == round(1e-23,23) ==> true
Intel GNU: 5e-23 == round(5e-23,23) ==> true
Win32: 5e-23 = round(5e-23,23) ==> false
D. Inconsistent Rounding
round(0.255,2) ==> 0.26
sprintf("%.2f",0.255) ==> 0.25
ini_set('precision',2);print 0.255 ==> 0.26
round(0.235,2) ==> 0.24
sprintf("%.2f",0.235) ==> 0.23
ini_set('precision',2);print 0.235 ==> 0.23
C: printf("%.2f",0.245) ==> 0.24
php: printf("%.2f",0.245) ==> 0.25
E. Functional Limitations
E1: No half-even (bankers) rounding option
See Rasmus Lerdorf's comments on Bug #24142
E2: No round to precision
ANALYSIS
A. Arithmetic Errors
php implements its decimal arithmetic using C doubles, i.e. IEEE754
64bit binary representations. While the IEEE standard guarantees
exact binary arithmetic results within range and exactly rounded
outside of range, php's decimal arithmetic results are neither
exactly rounded nor exact within range.
C double representations of decimal numbers (e.g. 0.1) are typically
inexact. These "representation" errors, which only effect users of
C doubles for "other-base" arithmetic, can only be corrected by
rounding of the C double result in the appropriate base, base ten for
php. php does not do this, therefore arithmetic results may be in error.
Often the incorrect results are masked by subsequent implicit rounding
e.g. at output/string conversion. However, comparisons for equality,
(A1), integer conversions (A2) and difference operations (A3) can
reveal them immediately. If the results are used for further
arithmetic operations the errors can accumulate ad infinitum (A4).
The errors are strictly bounded. Even with maximal errors in
representation and an inexact final result, the returned result when
correctly rounded to 15 decimal digits will be the same as the true
decimal result, if it is of 15 digits or less. This is particularly
significant since 15 digits is the limit for consistent representation
of decimals by 64-bit binaries (C doubles). (Not all 16 decimal digit
numbers have distinct C double representations e.g. 0.9999999999999995
and 0.9999999999999994 have the same C double representation).
Therefore, with the addition of appropriate rounding, exact decimal
arithmetic to 15 significant figures is perfectly possible with php's
current fp implementation.
What is not possible, is exactly rounded decimal arithmetic, (the basic
IEEE854 requirement) which requires the result be calculated to "infinite"
precision and rounded to the best approximation. Decimally rounded php
arithmetic can only guarantee that when the true result is
of 16 or more digits, the returned result after rounding may be either
the result immediately above or below the true result, but not
necessarily the closer of the two. Practically, exactly rounded
decimal fp arithmetic requires decimal fp hardware or complex and
slow multi-precision arithmetic.
For most php users, most of the time, exact arithmetic is quite good
enough. In particular, for money calculations, where exactness is
typically an overwhelming, and often legal, requirement, exactly
rounded results are largely irrelevant. Either calculations are simple
and guaranteed to be exact within 15 digits e.g. ledger arithmetic, or
they are very tightly specified to guarantee consistency and exactness
at each step, e.g. interest calculations.
This is in marked contrast to typical C users, e.g. coding complex
scientific and mathematical algorithms. They need the full IEEE
functionality including access to the exact flag, rounding
mode, precision control as well as exactly rounded results, (and
appropriate numerical analysis) to validate their results.
Unfortunately even just correct decimal rounding of C doubles is a
non-trivial matter as php's other rounding issues clearly illustrate.
B. Intel FPU Precision Problems
Although only specific to Intel platforms, fpu extended precision
errors in C double arithmetic, can easily confuse attempts to
correctly round decimals across platforms.
By default Intel FPUs operate in extended precision mode. On Win32
platforms the compiler forces them to double precision.
However, some *nix compilers e.g. GNU, leave the FPU at the extended
precision default.
An FPU in extended precision mode creates two problems for decimal
arithetic using C doubles:
Firstly, different results are returned depending on whether the
optimiser holds intermediate arithmetic results in extended precision
on the FPU registers or converts them to doubles on the way through
memory, (B1).
Secondly, (and much more rarely), a single arithmetic operation with
an inexact result can be vulnerable to errors from rounding twice.
Rounding once to extended precision during the actual calculation
and then again to double precision on return of the result from
the FPU can introduce new errors, (B2 and B3).
Both problems can be solved by a fast and thread-safe force/restore of
the FPU precision mode, (e.g. _FPU_GETCW(), _FPU_SETCW()) with no
impact on other code e.g. maths libraries.
On the other hand, the current solution for problem B1 which has
been implemented in php_round, i.e. addition of a "fuzz factor", can
itself create new errors, (B4, B5), and does nothing to address B2 or B3.
C. Algorithmic Failures on Inexact Powers of Ten
Normal double FPU arithmetic will typically return the correct
"nearest" C double representation of a decimal fp on division of a
rounded integer mantissa by an appropriate power of ten (i.e. the
algorithm used by php's round). However, this is only guaranteed to
work if the power of ten is itself exact i.e. between +/- 22 (1e-22 to 1e22).
For decimals with exponents outside this range e.g. 1e-23, 5e24 etc.,
php's rounding algorithm and all rounding algorithms which only use fp
arithmetic will sometimes fail to return the closest C double
representation. This becomes a problem, if, as is typically the case,
the platform's strtod() always returns the absolutely closest
representation e.g. by using multi-precision integer arithmetic.
The consequence is that 1e-24 is no longer equal to 1/1e24 or
round(1e-24,24). The simplest, but relatively expensive solution, is
to use the platform's own strtod(),sprintf() as the last part of the
rounding operation for decimal exponents greater than +/- 22 i.e. when
rounding operands with absolute value outside the range of 1e-8 to 1e23
using the maximum sensible precision of 15 digits. This is much more
expensive in terms of performance than the pure fp algorithm which can
be safely used for numbers such as 0.0000000123456789012345 but not
for 0.000000001.
D. Inconsistent Implicit Rounding
Although the explicit rounding functions, i.e. php_round and
number_format, now share the same rounding macro, the implictly
rounding functions i.e. the output/string cast functions and
the sprintf/printf family do not. This leads to a host of
inconsistencies depending on which function is used and on which
platform. sprintf/printf are not even consistent with their C
counterparts.
Much of the problem is related to the issue of whether to round
half-even (the fpu default/bankers rounding), or round half-up
(the common high-school and scientific rounding mode). Both
php's round function and the printf family attempt to round
half-up, the output/string converters stay with the fpu default
of round half-even.
One way to comprehensively resolve this issue is to include
mode-sensitive rounding using a common rounding routine as a
first stage of the string cast/printf routines. In that case,
not only do all php's rounding functions, implicit and explicit,
return consistent results, but it becomes possible to allow full
user control over rounding mode.
E. Functional Limitations
The absence of explicit control over rounding mode (F1) appears to have
led to much confusion even among php gurus.
Both round-half-up and round-half-even (US banker's rounding), have
legitimate uses. The absence of more user demand for rounding mode
control probably owes more to the general fp problems forcing users
into hand-crafted solutions, e.g. application-level fixed point shift,
rather than a genuine lack of interest.
The other obvious functional limitation with regard to rounding in
php is the requirement to code rounding to a specific precision
using php's round to places (F2), i.e. round(expression,precision) must
be coded as:
round(expression,precision+1+floor(log10(expression)))
This is not only clumsy but unnecessarily expensive in terms of
performance.
SOLUTIONS
- User Workarounds
Taken altogether, the issues in php's fp implementation listed above
present the php user with a daunting task if they wish to be confident
of their arithmetic results. Practically they have at least three
alternatives:
a) Do nothing
This is probably the most popular approach. Most of the time the
problems are succesfully masked by implicit or explicit output
conversions and problematic numbers such as 0.002877 and 0.235
simply don't turn up. The most common problem they will come across
is in comparisons for equality about which the php documentation
has clear warnings. On average, they will "get away with it".
The big problem is that they may not realise that their code is
intrinsically unsound and just waiting to meet a problem such as
A2 or A4. Unless they have are already aware of all the problems
testing is unlikely to be comprehensive. The danger is that problems
show up years later, perhaps because of scaling or platform changes.
Then diagnosis and correction of the code may have turned into
nightmares. It's easy to imagine a traumatised end-user when
they finally discover "that's how php arithmetic works" deciding
to take the "safe option" and go for a rewrite using .NET and its
decimal classes.
b) Avoid the fp numeric type
The experienced C/Open Source developer may well simply avoid php's
decimal arithmetic completely, preferring application-level fixed-point
shift i.e. integer arithmetic only. They will have to watch out for
php's automatic type-casting converting their integers back to floating
points, but since fp integer arithmetic is in any case exact, this is
not such a big problem.
They are most likely to be vulnerable to the coding errors that the
extra complexity and clumsiness of fixed-point shifts necessarily
entails e.g. confusion over where the point is fixed in data and code.
Unfortunately, however experienced they may be with single fixed-point
coding, they are increasingly likely to be faced with variable
fixed-point processing to cope with increasingly complex applications
e.g. multi-currency, multi-sales-tax, multi-tier-commission e-commerce.
True floating-point, (if results are exact), is much more suited to
generic, reusable coding.
c) Use "string" arithmetic
The php specialist may well prefer to keep their numbers "floating" but
to use a "string" arithmetic to ensure correct results. Whether they
opt for a string cast of fp arithmetic results or use of the bc
libraries the arithmetic should work fine. While bc is typically
faster, string casts are generally more flexible since no scale
control is required.
Of course, the problems of carrying out any required rounding on
the results remain, since bc truncates out-of-scale results and string
casts will try to round half-even. They also require greater code
discipline and certainly reduce code "readability". Nevertheless either
of these is probably a better user workaround than the other
alternatives currently available to the user.
- Developer Fixes
a) Replace php's fp numeric type
This is the conventional position of most languages from Cobol in the
1960s to .NET's decimal classes. They have specific decimal types for
which decimal arithmetic works. They are typically semi-fixed, (much
like bc with its scale control) and may or may not have the advantage
of dedicated hardware.
Although php could take the same approach, it has special problems:
firstly, of course is the loose typing and automatic type conversion
e.g. context or value triggered integer/fp/string conversions;
secondly there is the issue of compatibility with extensions and
external libraries, such as the C math library; thirdly there is
the problem of ensuring backward compatibility, (a major php strength);
and finally performance, although php's arithmetic may be a couple of
orders of magnitude slower than C's, it would be sad not to take
advantage of the blindingly fast fp hardware on nearly every platform.
In short, any change to move away from C doubles as the core number
type is fraught with problems.
b) Problem-by-problem fixups
This has historically been the php approach. Unfortunately the big
problem is that partial solutions, (such as the fuzzy fix for php_round),
can actually make things worse! The situation is already so complex for
the php user that even "good fixes" may simply confuse them further by
simply replacing one "gotcha" that they have coded around with the next
one on the list.
c) A Comprehensive Solution retaining C doubles
The good news is that the alternative of a comprehensive resolution
of all the issues is not only not impossible but also not much more
work than any one partial solution. It can also be implemented in
such a way as to not only retain full backward compatibility but to
greatly simplify the user documentation.
The key requirement is to have a safe, dependable but still fast
central rounding function. The rest is straightforward. It just
needs care over the precision used in rounding e.g. counting
significant lead zeros on differences.
Such a solution is outlined below:
OUTLINE SOLUTION
The solution described here in outline and with links to
proof-of-concept code resolves all the issues listed with minimal
confusion for the php user. The only visible changes for the user
are:
a) New "exact_decimal_arithmetic" configuration variable
When set to true, this enables fixes for all the fp issues,
i.e. arithmetic, correct rounding, rounding mode control and
consistent rounding. All the user needs to know is that while 15
or less digit results will always be correct, 16 or more digit
results will be rounded to a 15 digit decimal immediately above or
below the true result.
No coding changes are required to take advantage of the new feature,
and historically "unsafe" code becomes immediately "safe" when the
variable is set. It can be switched on and off at run-time at any
point in a calculation with no adverse consequences.
b) New "rounding_mode" configuration variable
When exact_decimal_arithmetic is on, the value of rounding_mode, either
half_up, (the default), or half_even controls rounding mode across the
whole of php. It sets the behaviour of both explicit rounding such as
round and number_format and implicit rounding in string/output
conversion, print/sprintf. It too can be changed at any point during
execution.
CODE CHANGES
As of 5.0.0, implementation of a solution along these lines requires
changes to three sources: Zend/zend_operators.c, ext/standard/math.c
and formatted_print.c as follows:
ext/standard/math.c
- Replace PHP_ROUND_WITH_FUZZ macro by central_round() function
with the following features:
- non-Win32 Intel FPUs forced to double-precision
- alternate rounding depending on rounding_mode setting
- sprintf/strtod when abs(places) >= 23
-
pow()
replaced by new getintpow10() function for performance
- New getintpow10() function (performance tweak)
- returns powers of 10 from 1 to 23 from compiler generated array
- New round_to_precision() function
- wrapper for central_round()
- sets "places" to match 15 digits of absolutely largest of up
to 3 non-zero double arguments
- New getintlog10() function (performance tweak)
- returns floor(log10(val))
- uses fast, hard-coded "if" tree for val between 1e-8 and 1e23
- Call to central_round replaces PHP_ROUND_WITH_FUZZ in:
- PHP_FUNCTION(round)
- PHPAPI _php_math_number_format
ext/standard/formatted_print.c
- Conditional call to round_central in:
- php_sprintf_appenddouble()
Zend/zend_operators.c
- Conditional calls to round_central in:
- _convert_to_string()
- zend_locale_sprintf_double()
- Conditional calls to round_to_precision (one non-zero arg) in:
- mul_function()
- div_function()
- Conditional calls to round_to_precision (3 non-zero args) in:
- add_function()
- sub_function()
- adjusts for potential "catastrophic loss of precision"
Links to Sample Code and Sample Test
http://www.whiffen.net/phprounding/math.c_fixed
http://www.whiffen.net/phprounding/formatted_print.c_fixed
http://www.whiffen.net/phprounding/zend_operators.c_fixed
http://www.whiffen.net/phprounding/test.php
http://www.whiffen.net/phprounding/test_results_unfixed.txt
http://www.whiffen.net/phprounding/test_results_fixed.txt
Notes on Code
-
The code is strictly proof-of-concept. The best that
can be said about it is that it works. (I make no claims
to be a C programmer let alone a developer of php). It would
surely need extensive re-working, probably by a core developer,
before submission for consideration. -
It's incomplete. The major missing element is the setup
of the two new configuration variables. Code stumps are
included to check the configuration variables but in the
sample code exact_decimal_arithmetic is always true and
rounding_mode is always half-up. -
Also omitted is the config.m4 code for establishing that
fpu switching is necessary. -
The code has only been tested under Debian Linux.
Performance
Performance impact is negligible for low exponent operands
(15 digit numbers between 1e-8 and 1e23).
php's round function is actually faster.
The string conversion/output, printf and sprintf functions are
already so slow that it's hard to detect any impact.
For arithmetic, where the extra load is detectable it is roughly
comparable to a single array or object member access i.e.
exact $a + $b is as fast as inexact $a->a + $b
The implication is that unless use of arrays and objects should be
deprecated on performance grounds, there can be no objection on
performance grounds to enabling exact arithmetic for the vast
majority of user calculations.
On the other hand, for high-exponent operands i.e. < 1e-8 or >= 1e23,
performance impact is considerable i.e. comparable to an additional
string cast.
Since high-exponent data is often scientific or engineering data
which is itself known to be inexact, users may well wish to leave
exact_decimal_arithmetic off for these kinds of applications.
FINAL NOTE
Hopefully, after appropriate corrections have been posted, we now
have a better overall description of php's fp/rounding issues on
record.
Furthermore it should be clear that something could be done about
all of them. Whether anything should be done is another question,
and perhaps ultimately a matter of opinion.
As you either already know or could reasonably guess, my personal
view is that simple, consistent, exact decimal arithmetic is
highly desirable for any development tool intended for either
novices or commercial use.
The fact that C does not offer decimal arithmetic is unsurprising
given its pedigree and key application areas. The fact that other
Open-Source tools such as tcl, Perl and MySQL have decimal problems
may owe to the difficulties of implementing solutions given their
code structure. The fact that php still has decimal problems given
its user base and the relative ease with which the problems can be
solved, is something I do not yet understand.
Awaiting enlightenment,
George Whiffen
George Whiffen wrote:
As you either already know or could reasonably guess, my personal
view is that simple, consistent, exact decimal arithmetic is highly
desirable for any development tool intended for either
novices or commercial use.
I use BCMath for critical (e.g. money) calculations as I'd never trust
floating point, with or without your fixes. Especially since I'm dealing
with big numbers.
- Chris
Christian Schneider wrote:
George Whiffen wrote:
As you either already know or could reasonably guess, my personal
view is that simple, consistent, exact decimal arithmetic is highly
desirable for any development tool intended for either
novices or commercial use.I use BCMath for critical (e.g. money) calculations as I'd never trust
floating point, with or without your fixes. Especially since I'm dealing
with big numbers.
- Chris
Chris,
Exactly my point. When coding php you need floating point you can trust
or it is useless. As I said, bc is probably the best currently
available approach.
Presumably you trust bc because either you've checked the code yourself
or you trust the guys who wrote it and say it works.
So what we need with my fixes is for you to check the code/theory
yourself or to have someone you trust to do it for you. So, who would
you trust?
If I'm wrong, and it's really only bc that can be trusted to add 0.1 and
0.1, then shouldn't we be using that internally, rather than pretending
to have a useable fp decimal arithmetic?
George
P.S. I'm interested that you have money numbers so big that 15 digits
are not enough. It can't be gross world income which still isn't up to
$1e15 and certainly is never quoted to the dollar. I can only guess
that it's global currency market volumes you are handling. Do they
really quote them to the dollar?
Hey George,
My point is: I'm not against improving accuracy of floating point
arithmetic in PHP (although it could lead to some weird compatibility
problems with older version) but I wouldn't advise people who really,
really have to rely on their numbers to be correct to use floats anyway.
George Whiffen wrote:
Exactly my point. When coding php you need floating point you can
trust or it is useless. As I said, bc is probably the best currently
You can never trust floating point if you need both big numbers and
precision at the same time. Floating point is always a trade off between
the two.
If I'm wrong, and it's really only bc that can be trusted to add 0.1
and 0.1, then shouldn't we be using that internally, rather than
pretending to have a useable fp decimal arithmetic?
Floating point (even the current PHP implementation) is fine if you're
happy enough with approximations which is the case 99% of the time.
P.S. I'm interested that you have money numbers so big that 15 digits
are not enough. It can't be gross world income which still isn't up
I store cent values as integer strings and I possibly have to deal with
values bigger than 214 million bucks (where PHP starts to convert ints
to float) so I decided to go for BC Math. I didn't even care to check
what the range/precision limits of floats are as I wanted something
absolutely predictable, no matter what numbers are put into the system
by our customers.
Just because you're paranoid doesn't mean they're not after you :-)
- Chris
Christian Schneider wrote:
I store cent values as integer strings and I possibly have to deal with
values bigger than 214 million bucks (where PHP starts to convert ints
Errr, that's 21 million bucks where the transition occurs :-)
- Chris
Christian,
You are paranoid!
You don't need integers AND bc arithmetic! One or the other will do fine.
C doubles are exact for integers a million times bigger than your
requirements (i.e. over 1e15). That means that all integer calculations
are also exact, even if php does store them as floats. If you really
want non-integer results to be truncated (as with bc and zero scale),
rather than approximated which is what the fp will do, then
floor()
/ceil() etc. will also work fine.
Alternatively, you could have left your data with "cents" i.e. 2 decimal
places and just used bc arithmetic and a scale of two or whatever you
need for your sub-cent calculations.
But if, as you say, you are really worried about a solution that works
whatever your future customers do, then you definitely want php decimal
arithmetic sorted out. Then you can handle multi-currencies without
code changes. You don't have to care "where the decimal point goes" as
long as there's no more than 15 digits altogether.
George
Christian Schneider wrote:
Hey George,
My point is: I'm not against improving accuracy of floating point
arithmetic in PHP (although it could lead to some weird compatibility
problems with older version) but I wouldn't advise people who really,
really have to rely on their numbers to be correct to use floats anyway.George Whiffen wrote:
Exactly my point. When coding php you need floating point you can
trust or it is useless. As I said, bc is probably the best currentlyYou can never trust floating point if you need both big numbers and
precision at the same time. Floating point is always a trade off between
the two.If I'm wrong, and it's really only bc that can be trusted to add 0.1
and 0.1, then shouldn't we be using that internally, rather than
pretending to have a useable fp decimal arithmetic?Floating point (even the current PHP implementation) is fine if you're
happy enough with approximations which is the case 99% of the time.P.S. I'm interested that you have money numbers so big that 15 digits
are not enough. It can't be gross world income which still isn't upI store cent values as integer strings and I possibly have to deal with
values bigger than 214 million bucks (where PHP starts to convert ints
to float) so I decided to go for BC Math. I didn't even care to check
what the range/precision limits of floats are as I wanted something
absolutely predictable, no matter what numbers are put into the system
by our customers.Just because you're paranoid doesn't mean they're not after you :-)
- Chris
Hi George,
Although a bit long, your email was an interesting read.
I am not quite sure how we should approach this problem as PHP's FP has
gone through a zillion patches in the past few years. Adding another INI
option is always something I try and keep away from when possible.
As in general, I don't trust C floating point numbers, I do agree with
Christian's approach of using BC Math. It might be because we don't know
enough about it.
I'd like to hear what other people here think. If it's worth messing with
this code.
By the way, can you please post your code changes as unified diffs? We
can't go through the final code files and see what was changed.
Thanks,
Andi
At 02:19 PM 7/23/2004 +0300, George Whiffen wrote:
For the record, herewith:
- definitive (?) list of php's fp/rounding/arithmetic issues
- analysis of the issues
- discussion of options for handling the issues
- description of a comprehensive and "painless" solution
- links to and discussion of proof-of-concept C code
Apologies for the length of the post, but the subject matter is no more
suited to piece by piece discussion, than it is to piece
by piece resolution.N.B. Throughout this post "decimal digits" refers to "significant
decimal figures", and should not be confused with "decimal places".PHP FP/DECIMAL/ROUNDING ISSUES
A. General Arithmetic Errors
A1: Simple Decimal Arithmetic Failure
0.8 == 0.7 + 0.1 ==> falseA2: Composite Decimal Arithmetic Failure
print floor(8.2 - 0.2) ==> 7A3: Difference Errors
print (0.5001 - 0.5) ==> 9.9999999999989E-05A4: Cumulative Arithmetic Errors
for ($i=1;$i<=1000000;++$i){$sum += 0.1;}print $sum ==> 100000.00000133B. Intel (non-Win32) Specific Errors
B1: Round (unfixed by fuzziness)
Intel GNU with FUZZ: round(12345678.0255,3) ==>12345678.025B2: Exact Arithmetic Errors
Intel GNU: 0.002877 == 2877/1000000 ==> falseB3: Round (fpu rounding twice)
Intel GNU with FUZZ: 0.002877 == round(0.002877,6) ==> falseB4: Round - inexact double - created by fuzziness
Intel GNU with FUZZ: round(0.9499999999999,1) ==> 1.0B5: Round - exact double - created by fuzziness
Intel GNU with FUZZ: round(144999999999999,-13) ==> 1.5e14C. High Exponent Round Algorithm Failure
C1: Cross-Platform Rounding
Most (All?) Platforms: 1e-24 == round(1e-24,24) ==> falseC2: Platform Specific Variations
Intel GNU: 1e-23 == round(1e-23,23) ==> false
Win32: 1e-23 == round(1e-23,23) ==> true
Intel GNU: 5e-23 == round(5e-23,23) ==> true
Win32: 5e-23 = round(5e-23,23) ==> falseD. Inconsistent Rounding
round(0.255,2) ==> 0.26
sprintf("%.2f",0.255) ==> 0.25
ini_set('precision',2);print 0.255 ==> 0.26round(0.235,2) ==> 0.24
sprintf("%.2f",0.235) ==> 0.23
ini_set('precision',2);print 0.235 ==> 0.23C: printf("%.2f",0.245) ==> 0.24
php: printf("%.2f",0.245) ==> 0.25E. Functional Limitations
E1: No half-even (bankers) rounding option
See Rasmus Lerdorf's comments on Bug #24142E2: No round to precision
ANALYSIS
A. Arithmetic Errors
php implements its decimal arithmetic using C doubles, i.e. IEEE754
64bit binary representations. While the IEEE standard guarantees exact
binary arithmetic results within range and exactly rounded outside of
range, php's decimal arithmetic results are neither exactly rounded nor
exact within range.C double representations of decimal numbers (e.g. 0.1) are typically
inexact. These "representation" errors, which only effect users of
C doubles for "other-base" arithmetic, can only be corrected by
rounding of the C double result in the appropriate base, base ten for
php. php does not do this, therefore arithmetic results may be in error.Often the incorrect results are masked by subsequent implicit rounding
e.g. at output/string conversion. However, comparisons for equality,
(A1), integer conversions (A2) and difference operations (A3) can
reveal them immediately. If the results are used for further
arithmetic operations the errors can accumulate ad infinitum (A4).The errors are strictly bounded. Even with maximal errors in
representation and an inexact final result, the returned result when
correctly rounded to 15 decimal digits will be the same as the true
decimal result, if it is of 15 digits or less. This is particularly
significant since 15 digits is the limit for consistent representation
of decimals by 64-bit binaries (C doubles). (Not all 16 decimal digit
numbers have distinct C double representations e.g. 0.9999999999999995
and 0.9999999999999994 have the same C double representation).Therefore, with the addition of appropriate rounding, exact decimal
arithmetic to 15 significant figures is perfectly possible with php's
current fp implementation.What is not possible, is exactly rounded decimal arithmetic, (the basic
IEEE854 requirement) which requires the result be calculated to "infinite"
precision and rounded to the best approximation. Decimally rounded php
arithmetic can only guarantee that when the true result is
of 16 or more digits, the returned result after rounding may be either the
result immediately above or below the true result, but not necessarily the
closer of the two. Practically, exactly rounded decimal fp arithmetic
requires decimal fp hardware or complex and slow multi-precision arithmetic.For most php users, most of the time, exact arithmetic is quite good
enough. In particular, for money calculations, where exactness is
typically an overwhelming, and often legal, requirement, exactly rounded
results are largely irrelevant. Either calculations are simple
and guaranteed to be exact within 15 digits e.g. ledger arithmetic, or
they are very tightly specified to guarantee consistency and exactness
at each step, e.g. interest calculations.This is in marked contrast to typical C users, e.g. coding complex
scientific and mathematical algorithms. They need the full IEEE
functionality including access to the exact flag, rounding
mode, precision control as well as exactly rounded results, (and
appropriate numerical analysis) to validate their results.Unfortunately even just correct decimal rounding of C doubles is a
non-trivial matter as php's other rounding issues clearly illustrate.B. Intel FPU Precision Problems
Although only specific to Intel platforms, fpu extended precision
errors in C double arithmetic, can easily confuse attempts to
correctly round decimals across platforms.By default Intel FPUs operate in extended precision mode. On Win32
platforms the compiler forces them to double precision.
However, some *nix compilers e.g. GNU, leave the FPU at the extended
precision default.An FPU in extended precision mode creates two problems for decimal
arithetic using C doubles:Firstly, different results are returned depending on whether the
optimiser holds intermediate arithmetic results in extended precision
on the FPU registers or converts them to doubles on the way through
memory, (B1).Secondly, (and much more rarely), a single arithmetic operation with
an inexact result can be vulnerable to errors from rounding twice.
Rounding once to extended precision during the actual calculation
and then again to double precision on return of the result from
the FPU can introduce new errors, (B2 and B3).Both problems can be solved by a fast and thread-safe force/restore of
the FPU precision mode, (e.g. _FPU_GETCW(), _FPU_SETCW()) with no
impact on other code e.g. maths libraries.On the other hand, the current solution for problem B1 which has been
implemented in php_round, i.e. addition of a "fuzz factor", can itself
create new errors, (B4, B5), and does nothing to address B2 or B3.C. Algorithmic Failures on Inexact Powers of Ten
Normal double FPU arithmetic will typically return the correct
"nearest" C double representation of a decimal fp on division of a
rounded integer mantissa by an appropriate power of ten (i.e. the
algorithm used by php's round). However, this is only guaranteed to work
if the power of ten is itself exact i.e. between +/- 22 (1e-22 to 1e22).For decimals with exponents outside this range e.g. 1e-23, 5e24 etc.,
php's rounding algorithm and all rounding algorithms which only use fp
arithmetic will sometimes fail to return the closest C double
representation. This becomes a problem, if, as is typically the case,
the platform's strtod() always returns the absolutely closest
representation e.g. by using multi-precision integer arithmetic.
The consequence is that 1e-24 is no longer equal to 1/1e24 or
round(1e-24,24). The simplest, but relatively expensive solution, is
to use the platform's own strtod(),sprintf() as the last part of the
rounding operation for decimal exponents greater than +/- 22 i.e. when
rounding operands with absolute value outside the range of 1e-8 to 1e23
using the maximum sensible precision of 15 digits. This is much more
expensive in terms of performance than the pure fp algorithm which can
be safely used for numbers such as 0.0000000123456789012345 but not for
0.000000001.D. Inconsistent Implicit Rounding
Although the explicit rounding functions, i.e. php_round and
number_format, now share the same rounding macro, the implictly
rounding functions i.e. the output/string cast functions and
the sprintf/printf family do not. This leads to a host of
inconsistencies depending on which function is used and on which
platform. sprintf/printf are not even consistent with their C
counterparts.Much of the problem is related to the issue of whether to round
half-even (the fpu default/bankers rounding), or round half-up
(the common high-school and scientific rounding mode). Both
php's round function and the printf family attempt to round
half-up, the output/string converters stay with the fpu default
of round half-even.One way to comprehensively resolve this issue is to include
mode-sensitive rounding using a common rounding routine as a
first stage of the string cast/printf routines. In that case,
not only do all php's rounding functions, implicit and explicit,
return consistent results, but it becomes possible to allow full
user control over rounding mode.E. Functional Limitations
The absence of explicit control over rounding mode (F1) appears to have
led to much confusion even among php gurus.Both round-half-up and round-half-even (US banker's rounding), have
legitimate uses. The absence of more user demand for rounding mode
control probably owes more to the general fp problems forcing users
into hand-crafted solutions, e.g. application-level fixed point shift,
rather than a genuine lack of interest.The other obvious functional limitation with regard to rounding in
php is the requirement to code rounding to a specific precision
using php's round to places (F2), i.e. round(expression,precision) must
be coded as:round(expression,precision+1+floor(log10(expression)))
This is not only clumsy but unnecessarily expensive in terms of
performance.SOLUTIONS
- User Workarounds
Taken altogether, the issues in php's fp implementation listed above
present the php user with a daunting task if they wish to be confident
of their arithmetic results. Practically they have at least three
alternatives:a) Do nothing
This is probably the most popular approach. Most of the time the
problems are succesfully masked by implicit or explicit output
conversions and problematic numbers such as 0.002877 and 0.235
simply don't turn up. The most common problem they will come across
is in comparisons for equality about which the php documentation
has clear warnings. On average, they will "get away with it".The big problem is that they may not realise that their code is
intrinsically unsound and just waiting to meet a problem such as
A2 or A4. Unless they have are already aware of all the problems
testing is unlikely to be comprehensive. The danger is that problems
show up years later, perhaps because of scaling or platform changes.
Then diagnosis and correction of the code may have turned into
nightmares. It's easy to imagine a traumatised end-user when
they finally discover "that's how php arithmetic works" deciding to take
the "safe option" and go for a rewrite using .NET and its decimal classes.b) Avoid the fp numeric type
The experienced C/Open Source developer may well simply avoid php's
decimal arithmetic completely, preferring application-level fixed-point
shift i.e. integer arithmetic only. They will have to watch out for
php's automatic type-casting converting their integers back to floating
points, but since fp integer arithmetic is in any case exact, this is
not such a big problem.They are most likely to be vulnerable to the coding errors that the
extra complexity and clumsiness of fixed-point shifts necessarily
entails e.g. confusion over where the point is fixed in data and code.
Unfortunately, however experienced they may be with single fixed-point
coding, they are increasingly likely to be faced with variable fixed-point
processing to cope with increasingly complex applications e.g.
multi-currency, multi-sales-tax, multi-tier-commission e-commerce.True floating-point, (if results are exact), is much more suited to
generic, reusable coding.c) Use "string" arithmetic
The php specialist may well prefer to keep their numbers "floating" but
to use a "string" arithmetic to ensure correct results. Whether they
opt for a string cast of fp arithmetic results or use of the bc
libraries the arithmetic should work fine. While bc is typically faster,
string casts are generally more flexible since no scale control is required.Of course, the problems of carrying out any required rounding on the
results remain, since bc truncates out-of-scale results and string casts
will try to round half-even. They also require greater code discipline
and certainly reduce code "readability". Nevertheless either
of these is probably a better user workaround than the other alternatives
currently available to the user.
- Developer Fixes
a) Replace php's fp numeric type
This is the conventional position of most languages from Cobol in the
1960s to .NET's decimal classes. They have specific decimal types for
which decimal arithmetic works. They are typically semi-fixed, (much
like bc with its scale control) and may or may not have the advantage
of dedicated hardware.
Although php could take the same approach, it has special problems:
firstly, of course is the loose typing and automatic type conversion
e.g. context or value triggered integer/fp/string conversions; secondly
there is the issue of compatibility with extensions and external
libraries, such as the C math library; thirdly there is
the problem of ensuring backward compatibility, (a major php strength);
and finally performance, although php's arithmetic may be a couple of
orders of magnitude slower than C's, it would be sad not to take
advantage of the blindingly fast fp hardware on nearly every platform.In short, any change to move away from C doubles as the core number
type is fraught with problems.b) Problem-by-problem fixups
This has historically been the php approach. Unfortunately the big
problem is that partial solutions, (such as the fuzzy fix for php_round),
can actually make things worse! The situation is already so complex for
the php user that even "good fixes" may simply confuse them further by
simply replacing one "gotcha" that they have coded around with the next
one on the list.c) A Comprehensive Solution retaining C doubles
The good news is that the alternative of a comprehensive resolution
of all the issues is not only not impossible but also not much more work
than any one partial solution. It can also be implemented in
such a way as to not only retain full backward compatibility but to
greatly simplify the user documentation.
The key requirement is to have a safe, dependable but still fast
central rounding function. The rest is straightforward. It just needs
care over the precision used in rounding e.g. counting significant lead
zeros on differences.Such a solution is outlined below:
OUTLINE SOLUTION
The solution described here in outline and with links to proof-of-concept
code resolves all the issues listed with minimal confusion for the php
user. The only visible changes for the user are:a) New "exact_decimal_arithmetic" configuration variable
When set to true, this enables fixes for all the fp issues,
i.e. arithmetic, correct rounding, rounding mode control and
consistent rounding. All the user needs to know is that while 15
or less digit results will always be correct, 16 or more digit
results will be rounded to a 15 digit decimal immediately above or
below the true result.No coding changes are required to take advantage of the new feature, and
historically "unsafe" code becomes immediately "safe" when the variable is
set. It can be switched on and off at run-time at any point in a
calculation with no adverse consequences.b) New "rounding_mode" configuration variable
When exact_decimal_arithmetic is on, the value of rounding_mode, either
half_up, (the default), or half_even controls rounding mode across the
whole of php. It sets the behaviour of both explicit rounding such as
round and number_format and implicit rounding in string/output conversion,
print/sprintf. It too can be changed at any point during execution.CODE CHANGES
As of 5.0.0, implementation of a solution along these lines requires
changes to three sources: Zend/zend_operators.c, ext/standard/math.c
and formatted_print.c as follows:ext/standard/math.c
- Replace PHP_ROUND_WITH_FUZZ macro by central_round() function
with the following features:
- non-Win32 Intel FPUs forced to double-precision
- alternate rounding depending on rounding_mode setting
- sprintf/strtod when abs(places) >= 23
pow()
replaced by new getintpow10() function for performance
- New getintpow10() function (performance tweak)
- returns powers of 10 from 1 to 23 from compiler generated array
- New round_to_precision() function
- wrapper for central_round()
- sets "places" to match 15 digits of absolutely largest of up
to 3 non-zero double arguments
- New getintlog10() function (performance tweak)
- returns floor(log10(val))
- uses fast, hard-coded "if" tree for val between 1e-8 and 1e23
- Call to central_round replaces PHP_ROUND_WITH_FUZZ in:
- PHP_FUNCTION(round)
- PHPAPI _php_math_number_format
ext/standard/formatted_print.c
- Conditional call to round_central in:
- php_sprintf_appenddouble()
Zend/zend_operators.c
- Conditional calls to round_central in:
- _convert_to_string()
- zend_locale_sprintf_double()
- Conditional calls to round_to_precision (one non-zero arg) in:
- mul_function()
- div_function()
- Conditional calls to round_to_precision (3 non-zero args) in:
- add_function()
- sub_function()
- adjusts for potential "catastrophic loss of precision"
Links to Sample Code and Sample Test
http://www.whiffen.net/phprounding/math.c_fixed
http://www.whiffen.net/phprounding/formatted_print.c_fixed
http://www.whiffen.net/phprounding/zend_operators.c_fixed
http://www.whiffen.net/phprounding/test.php
http://www.whiffen.net/phprounding/test_results_unfixed.txt
http://www.whiffen.net/phprounding/test_results_fixed.txtNotes on Code
The code is strictly proof-of-concept. The best that
can be said about it is that it works. (I make no claims
to be a C programmer let alone a developer of php). It would
surely need extensive re-working, probably by a core developer,
before submission for consideration.It's incomplete. The major missing element is the setup
of the two new configuration variables. Code stumps are included to check
the configuration variables but in the sample code
exact_decimal_arithmetic is always true and rounding_mode is always half-up.Also omitted is the config.m4 code for establishing that
fpu switching is necessary.The code has only been tested under Debian Linux.
Performance
Performance impact is negligible for low exponent operands
(15 digit numbers between 1e-8 and 1e23).php's round function is actually faster.
The string conversion/output, printf and sprintf functions are already so
slow that it's hard to detect any impact.For arithmetic, where the extra load is detectable it is roughly
comparable to a single array or object member access i.e.
exact $a + $b is as fast as inexact $a->a + $bThe implication is that unless use of arrays and objects should be
deprecated on performance grounds, there can be no objection on
performance grounds to enabling exact arithmetic for the vast majority of
user calculations.
On the other hand, for high-exponent operands i.e. < 1e-8 or >= 1e23,
performance impact is considerable i.e. comparable to an additional string
cast.Since high-exponent data is often scientific or engineering data
which is itself known to be inexact, users may well wish to leave
exact_decimal_arithmetic off for these kinds of applications.FINAL NOTE
Hopefully, after appropriate corrections have been posted, we now
have a better overall description of php's fp/rounding issues on
record.Furthermore it should be clear that something could be done about
all of them. Whether anything should be done is another question,
and perhaps ultimately a matter of opinion.As you either already know or could reasonably guess, my personal
view is that simple, consistent, exact decimal arithmetic is highly
desirable for any development tool intended for either
novices or commercial use.The fact that C does not offer decimal arithmetic is unsurprising given
its pedigree and key application areas. The fact that other Open-Source
tools such as tcl, Perl and MySQL have decimal problems
may owe to the difficulties of implementing solutions given their
code structure. The fact that php still has decimal problems given
its user base and the relative ease with which the problems can be solved,
is something I do not yet understand.
Awaiting enlightenment,George Whiffen
Andi Gutmans wrote:
Hi George,
Although a bit long, your email was an interesting read.
Thanks.
I am not quite sure how we should approach this problem as PHP's FP has
gone through a zillion patches in the past few years.
Yeahh. Yet another partial solution can do as much harm as good. It
needs to really work...
Adding another INI
option is always something I try and keep away from when possible.
What we could do is skip the exact_decimal_arithmetic option and just
have "rounding_mode" with an extra (default) option "legacy" which meant
nothing changed. We would only attempt exact arithmetic if rounding
mode was explicitly half-up or half-even. Then the whole shooting match
could be presented as genuinely new functionality. Later on, when we
are more confident it all works, we could just change the default to
half-up. I'd rather we were more confident and up-front than this but
it would be better than nothing and fairly "safe".
As in general, I don't trust C floating point numbers, I do agree with
Christian's approach of using BC Math. It might be because we don't know
enough about it.
Who does trust C floating point? I certainly never expected the
2877/100000 problem for example.
For safety, it probably needs proper numeric analysts/fp gurus to go
over it, which I've kind of tried to. But we are asking for some real
work on their part so it would be much easier if the requests for help
came from a "lead" php developer. The other potential problem I've found
is that anyone really expert enough to be confident the solution works,
also thinks that we should be implementing an exactly-rounded fp decimal
arithmetic. For the purist, what I'm proposing is still a kludge, even
if it does work.
I'd like to hear what other people here think. If it's worth messing
with this code.
I would love to hear what other people think. At the end of the day,
there's not much point in me doing any more. This really needs a "new"
brain and probably a lead developer to be involved. I appreciate
that's asking a lot, but I think the payoff could be more than enough to
justify the effort.
By the way, can you please post your code changes as unified diffs? We
can't go through the final code files and see what was changed.
Thanks,
Andi
Sure.
http://www.whiffen.net/phprounding/math.udiff
http://www.whiffen.net/phprounding/formatted_print.udiff
http://www.whiffen.net/phprounding/zend_operators.udiff
At 02:19 PM 7/23/2004 +0300, George Whiffen wrote:
For the record, herewith:
- definitive (?) list of php's fp/rounding/arithmetic issues
- analysis of the issues
- discussion of options for handling the issues
- description of a comprehensive and "painless" solution
- links to and discussion of proof-of-concept C code
Apologies for the length of the post, but the subject matter is no
more suited to piece by piece discussion, than it is to piece
by piece resolution.N.B. Throughout this post "decimal digits" refers to "significant
decimal figures", and should not be confused with "decimal places".PHP FP/DECIMAL/ROUNDING ISSUES
A. General Arithmetic Errors
A1: Simple Decimal Arithmetic Failure
0.8 == 0.7 + 0.1 ==> falseA2: Composite Decimal Arithmetic Failure
print floor(8.2 - 0.2) ==> 7A3: Difference Errors
print (0.5001 - 0.5) ==> 9.9999999999989E-05A4: Cumulative Arithmetic Errors
for ($i=1;$i<=1000000;++$i){$sum += 0.1;}print $sum ==> 100000.00000133B. Intel (non-Win32) Specific Errors
B1: Round (unfixed by fuzziness)
Intel GNU with FUZZ: round(12345678.0255,3) ==>12345678.025B2: Exact Arithmetic Errors
Intel GNU: 0.002877 == 2877/1000000 ==> falseB3: Round (fpu rounding twice)
Intel GNU with FUZZ: 0.002877 == round(0.002877,6) ==> falseB4: Round - inexact double - created by fuzziness
Intel GNU with FUZZ: round(0.9499999999999,1) ==> 1.0B5: Round - exact double - created by fuzziness
Intel GNU with FUZZ: round(144999999999999,-13) ==> 1.5e14C. High Exponent Round Algorithm Failure
C1: Cross-Platform Rounding
Most (All?) Platforms: 1e-24 == round(1e-24,24) ==> falseC2: Platform Specific Variations
Intel GNU: 1e-23 == round(1e-23,23) ==> false
Win32: 1e-23 == round(1e-23,23) ==> true
Intel GNU: 5e-23 == round(5e-23,23) ==> true
Win32: 5e-23 = round(5e-23,23) ==> falseD. Inconsistent Rounding
round(0.255,2) ==> 0.26
sprintf("%.2f",0.255) ==> 0.25
ini_set('precision',2);print 0.255 ==> 0.26round(0.235,2) ==> 0.24
sprintf("%.2f",0.235) ==> 0.23
ini_set('precision',2);print 0.235 ==> 0.23C: printf("%.2f",0.245) ==> 0.24
php: printf("%.2f",0.245) ==> 0.25E. Functional Limitations
E1: No half-even (bankers) rounding option
See Rasmus Lerdorf's comments on Bug #24142E2: No round to precision
ANALYSIS
A. Arithmetic Errors
php implements its decimal arithmetic using C doubles, i.e. IEEE754
64bit binary representations. While the IEEE standard guarantees
exact binary arithmetic results within range and exactly rounded
outside of range, php's decimal arithmetic results are neither exactly
rounded nor exact within range.C double representations of decimal numbers (e.g. 0.1) are typically
inexact. These "representation" errors, which only effect users of
C doubles for "other-base" arithmetic, can only be corrected by
rounding of the C double result in the appropriate base, base ten for
php. php does not do this, therefore arithmetic results may be in error.Often the incorrect results are masked by subsequent implicit rounding
e.g. at output/string conversion. However, comparisons for equality,
(A1), integer conversions (A2) and difference operations (A3) can
reveal them immediately. If the results are used for further
arithmetic operations the errors can accumulate ad infinitum (A4).The errors are strictly bounded. Even with maximal errors in
representation and an inexact final result, the returned result when
correctly rounded to 15 decimal digits will be the same as the true
decimal result, if it is of 15 digits or less. This is particularly
significant since 15 digits is the limit for consistent representation
of decimals by 64-bit binaries (C doubles). (Not all 16 decimal digit
numbers have distinct C double representations e.g. 0.9999999999999995
and 0.9999999999999994 have the same C double representation).Therefore, with the addition of appropriate rounding, exact decimal
arithmetic to 15 significant figures is perfectly possible with php's
current fp implementation.What is not possible, is exactly rounded decimal arithmetic, (the basic
IEEE854 requirement) which requires the result be calculated to
"infinite"
precision and rounded to the best approximation. Decimally rounded php
arithmetic can only guarantee that when the true result is
of 16 or more digits, the returned result after rounding may be either
the result immediately above or below the true result, but not
necessarily the closer of the two. Practically, exactly rounded
decimal fp arithmetic requires decimal fp hardware or complex and slow
multi-precision arithmetic.For most php users, most of the time, exact arithmetic is quite good
enough. In particular, for money calculations, where exactness is
typically an overwhelming, and often legal, requirement, exactly
rounded results are largely irrelevant. Either calculations are simple
and guaranteed to be exact within 15 digits e.g. ledger arithmetic, or
they are very tightly specified to guarantee consistency and exactness
at each step, e.g. interest calculations.This is in marked contrast to typical C users, e.g. coding complex
scientific and mathematical algorithms. They need the full IEEE
functionality including access to the exact flag, rounding
mode, precision control as well as exactly rounded results, (and
appropriate numerical analysis) to validate their results.Unfortunately even just correct decimal rounding of C doubles is a
non-trivial matter as php's other rounding issues clearly illustrate.B. Intel FPU Precision Problems
Although only specific to Intel platforms, fpu extended precision
errors in C double arithmetic, can easily confuse attempts to
correctly round decimals across platforms.By default Intel FPUs operate in extended precision mode. On Win32
platforms the compiler forces them to double precision.
However, some *nix compilers e.g. GNU, leave the FPU at the extended
precision default.An FPU in extended precision mode creates two problems for decimal
arithetic using C doubles:Firstly, different results are returned depending on whether the
optimiser holds intermediate arithmetic results in extended precision
on the FPU registers or converts them to doubles on the way through
memory, (B1).Secondly, (and much more rarely), a single arithmetic operation with
an inexact result can be vulnerable to errors from rounding twice.
Rounding once to extended precision during the actual calculation
and then again to double precision on return of the result from
the FPU can introduce new errors, (B2 and B3).Both problems can be solved by a fast and thread-safe force/restore of
the FPU precision mode, (e.g. _FPU_GETCW(), _FPU_SETCW()) with no
impact on other code e.g. maths libraries.On the other hand, the current solution for problem B1 which has been
implemented in php_round, i.e. addition of a "fuzz factor", can itself
create new errors, (B4, B5), and does nothing to address B2 or B3.C. Algorithmic Failures on Inexact Powers of Ten
Normal double FPU arithmetic will typically return the correct
"nearest" C double representation of a decimal fp on division of a
rounded integer mantissa by an appropriate power of ten (i.e. the
algorithm used by php's round). However, this is only guaranteed to
work if the power of ten is itself exact i.e. between +/- 22 (1e-22 to
1e22).For decimals with exponents outside this range e.g. 1e-23, 5e24 etc.,
php's rounding algorithm and all rounding algorithms which only use fp
arithmetic will sometimes fail to return the closest C double
representation. This becomes a problem, if, as is typically the case,
the platform's strtod() always returns the absolutely closest
representation e.g. by using multi-precision integer arithmetic.
The consequence is that 1e-24 is no longer equal to 1/1e24 or
round(1e-24,24). The simplest, but relatively expensive solution, is
to use the platform's own strtod(),sprintf() as the last part of the
rounding operation for decimal exponents greater than +/- 22 i.e. when
rounding operands with absolute value outside the range of 1e-8 to 1e23
using the maximum sensible precision of 15 digits. This is much more
expensive in terms of performance than the pure fp algorithm which can
be safely used for numbers such as 0.0000000123456789012345 but not
for 0.000000001.D. Inconsistent Implicit Rounding
Although the explicit rounding functions, i.e. php_round and
number_format, now share the same rounding macro, the implictly
rounding functions i.e. the output/string cast functions and
the sprintf/printf family do not. This leads to a host of
inconsistencies depending on which function is used and on which
platform. sprintf/printf are not even consistent with their C
counterparts.Much of the problem is related to the issue of whether to round
half-even (the fpu default/bankers rounding), or round half-up
(the common high-school and scientific rounding mode). Both
php's round function and the printf family attempt to round
half-up, the output/string converters stay with the fpu default
of round half-even.One way to comprehensively resolve this issue is to include
mode-sensitive rounding using a common rounding routine as a
first stage of the string cast/printf routines. In that case,
not only do all php's rounding functions, implicit and explicit,
return consistent results, but it becomes possible to allow full
user control over rounding mode.E. Functional Limitations
The absence of explicit control over rounding mode (F1) appears to have
led to much confusion even among php gurus.Both round-half-up and round-half-even (US banker's rounding), have
legitimate uses. The absence of more user demand for rounding mode
control probably owes more to the general fp problems forcing users
into hand-crafted solutions, e.g. application-level fixed point shift,
rather than a genuine lack of interest.The other obvious functional limitation with regard to rounding in
php is the requirement to code rounding to a specific precision
using php's round to places (F2), i.e. round(expression,precision) must
be coded as:round(expression,precision+1+floor(log10(expression)))
This is not only clumsy but unnecessarily expensive in terms of
performance.SOLUTIONS
- User Workarounds
Taken altogether, the issues in php's fp implementation listed above
present the php user with a daunting task if they wish to be confident
of their arithmetic results. Practically they have at least three
alternatives:a) Do nothing
This is probably the most popular approach. Most of the time the
problems are succesfully masked by implicit or explicit output
conversions and problematic numbers such as 0.002877 and 0.235
simply don't turn up. The most common problem they will come across
is in comparisons for equality about which the php documentation
has clear warnings. On average, they will "get away with it".The big problem is that they may not realise that their code is
intrinsically unsound and just waiting to meet a problem such as
A2 or A4. Unless they have are already aware of all the problems
testing is unlikely to be comprehensive. The danger is that problems
show up years later, perhaps because of scaling or platform changes.
Then diagnosis and correction of the code may have turned into
nightmares. It's easy to imagine a traumatised end-user when
they finally discover "that's how php arithmetic works" deciding to
take the "safe option" and go for a rewrite using .NET and its decimal
classes.b) Avoid the fp numeric type
The experienced C/Open Source developer may well simply avoid php's
decimal arithmetic completely, preferring application-level fixed-point
shift i.e. integer arithmetic only. They will have to watch out for
php's automatic type-casting converting their integers back to floating
points, but since fp integer arithmetic is in any case exact, this is
not such a big problem.They are most likely to be vulnerable to the coding errors that the
extra complexity and clumsiness of fixed-point shifts necessarily
entails e.g. confusion over where the point is fixed in data and code.
Unfortunately, however experienced they may be with single fixed-point
coding, they are increasingly likely to be faced with variable
fixed-point processing to cope with increasingly complex applications
e.g. multi-currency, multi-sales-tax, multi-tier-commission e-commerce.True floating-point, (if results are exact), is much more suited to
generic, reusable coding.c) Use "string" arithmetic
The php specialist may well prefer to keep their numbers "floating" but
to use a "string" arithmetic to ensure correct results. Whether they
opt for a string cast of fp arithmetic results or use of the bc
libraries the arithmetic should work fine. While bc is typically
faster, string casts are generally more flexible since no scale
control is required.Of course, the problems of carrying out any required rounding on the
results remain, since bc truncates out-of-scale results and string
casts will try to round half-even. They also require greater code
discipline and certainly reduce code "readability". Nevertheless either
of these is probably a better user workaround than the other
alternatives currently available to the user.
- Developer Fixes
a) Replace php's fp numeric type
This is the conventional position of most languages from Cobol in the
1960s to .NET's decimal classes. They have specific decimal types for
which decimal arithmetic works. They are typically semi-fixed, (much
like bc with its scale control) and may or may not have the advantage
of dedicated hardware.
Although php could take the same approach, it has special problems:
firstly, of course is the loose typing and automatic type conversion
e.g. context or value triggered integer/fp/string conversions;
secondly there is the issue of compatibility with extensions and
external libraries, such as the C math library; thirdly there is
the problem of ensuring backward compatibility, (a major php strength);
and finally performance, although php's arithmetic may be a couple of
orders of magnitude slower than C's, it would be sad not to take
advantage of the blindingly fast fp hardware on nearly every platform.In short, any change to move away from C doubles as the core number
type is fraught with problems.b) Problem-by-problem fixups
This has historically been the php approach. Unfortunately the big
problem is that partial solutions, (such as the fuzzy fix for php_round),
can actually make things worse! The situation is already so complex for
the php user that even "good fixes" may simply confuse them further by
simply replacing one "gotcha" that they have coded around with the next
one on the list.c) A Comprehensive Solution retaining C doubles
The good news is that the alternative of a comprehensive resolution
of all the issues is not only not impossible but also not much more
work than any one partial solution. It can also be implemented in
such a way as to not only retain full backward compatibility but to
greatly simplify the user documentation.
The key requirement is to have a safe, dependable but still fast
central rounding function. The rest is straightforward. It just
needs care over the precision used in rounding e.g. counting
significant lead zeros on differences.Such a solution is outlined below:
OUTLINE SOLUTION
The solution described here in outline and with links to
proof-of-concept code resolves all the issues listed with minimal
confusion for the php user. The only visible changes for the user are:a) New "exact_decimal_arithmetic" configuration variable
When set to true, this enables fixes for all the fp issues,
i.e. arithmetic, correct rounding, rounding mode control and
consistent rounding. All the user needs to know is that while 15
or less digit results will always be correct, 16 or more digit
results will be rounded to a 15 digit decimal immediately above or
below the true result.No coding changes are required to take advantage of the new feature,
and historically "unsafe" code becomes immediately "safe" when the
variable is set. It can be switched on and off at run-time at any
point in a calculation with no adverse consequences.b) New "rounding_mode" configuration variable
When exact_decimal_arithmetic is on, the value of rounding_mode, either
half_up, (the default), or half_even controls rounding mode across the
whole of php. It sets the behaviour of both explicit rounding such as
round and number_format and implicit rounding in string/output
conversion, print/sprintf. It too can be changed at any point during
execution.CODE CHANGES
As of 5.0.0, implementation of a solution along these lines requires
changes to three sources: Zend/zend_operators.c, ext/standard/math.c
and formatted_print.c as follows:ext/standard/math.c
- Replace PHP_ROUND_WITH_FUZZ macro by central_round() function
with the following features:
- non-Win32 Intel FPUs forced to double-precision
- alternate rounding depending on rounding_mode setting
- sprintf/strtod when abs(places) >= 23
pow()
replaced by new getintpow10() function for performance
- New getintpow10() function (performance tweak)
- returns powers of 10 from 1 to 23 from compiler generated array
- New round_to_precision() function
- wrapper for central_round()
- sets "places" to match 15 digits of absolutely largest of up
to 3 non-zero double arguments
- New getintlog10() function (performance tweak)
- returns floor(log10(val))
- uses fast, hard-coded "if" tree for val between 1e-8 and 1e23
- Call to central_round replaces PHP_ROUND_WITH_FUZZ in:
- PHP_FUNCTION(round)
- PHPAPI _php_math_number_format
ext/standard/formatted_print.c
- Conditional call to round_central in:
- php_sprintf_appenddouble()
Zend/zend_operators.c
- Conditional calls to round_central in:
- _convert_to_string()
- zend_locale_sprintf_double()
- Conditional calls to round_to_precision (one non-zero arg) in:
- mul_function()
- div_function()
- Conditional calls to round_to_precision (3 non-zero args) in:
- add_function()
- sub_function()
- adjusts for potential "catastrophic loss of precision"
Links to Sample Code and Sample Test
http://www.whiffen.net/phprounding/math.c_fixed
http://www.whiffen.net/phprounding/formatted_print.c_fixed
http://www.whiffen.net/phprounding/zend_operators.c_fixed
http://www.whiffen.net/phprounding/test.php
http://www.whiffen.net/phprounding/test_results_unfixed.txt
http://www.whiffen.net/phprounding/test_results_fixed.txtNotes on Code
The code is strictly proof-of-concept. The best that
can be said about it is that it works. (I make no claims
to be a C programmer let alone a developer of php). It would
surely need extensive re-working, probably by a core developer,
before submission for consideration.It's incomplete. The major missing element is the setup
of the two new configuration variables. Code stumps are included to
check the configuration variables but in the sample code
exact_decimal_arithmetic is always true and rounding_mode is always
half-up.Also omitted is the config.m4 code for establishing that
fpu switching is necessary.The code has only been tested under Debian Linux.
Performance
Performance impact is negligible for low exponent operands
(15 digit numbers between 1e-8 and 1e23).php's round function is actually faster.
The string conversion/output, printf and sprintf functions are already
so slow that it's hard to detect any impact.For arithmetic, where the extra load is detectable it is roughly
comparable to a single array or object member access i.e.
exact $a + $b is as fast as inexact $a->a + $bThe implication is that unless use of arrays and objects should be
deprecated on performance grounds, there can be no objection on
performance grounds to enabling exact arithmetic for the vast majority
of user calculations.
On the other hand, for high-exponent operands i.e. < 1e-8 or >= 1e23,
performance impact is considerable i.e. comparable to an additional
string cast.Since high-exponent data is often scientific or engineering data
which is itself known to be inexact, users may well wish to leave
exact_decimal_arithmetic off for these kinds of applications.FINAL NOTE
Hopefully, after appropriate corrections have been posted, we now
have a better overall description of php's fp/rounding issues on
record.Furthermore it should be clear that something could be done about
all of them. Whether anything should be done is another question,
and perhaps ultimately a matter of opinion.As you either already know or could reasonably guess, my personal
view is that simple, consistent, exact decimal arithmetic is highly
desirable for any development tool intended for either
novices or commercial use.The fact that C does not offer decimal arithmetic is unsurprising
given its pedigree and key application areas. The fact that other
Open-Source tools such as tcl, Perl and MySQL have decimal problems
may owe to the difficulties of implementing solutions given their
code structure. The fact that php still has decimal problems given
its user base and the relative ease with which the problems can be
solved, is something I do not yet understand.
Awaiting enlightenment,George Whiffen