Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:123026
Precedence: bulk
MIME-Version: 1.0
References: <40553F28-2EC2-475A-BD8E-1D6517AA2A51@rwec.co.uk>
 <2B518F62-B774-45C9-82A2-EF6653AAE34E@sakiot.com> <A703D145-1A10-4C7E-9E0C-1F0BEF94CF99@rwec.co.uk>
 <CAMrTa2G+_v1aO_g7NLiLsTwquJaf9=Zj5bp2ODGzVSB3iGr6pw@mail.gmail.com> <0f3d0f89-3064-4d56-9fb2-801bb0cda8a5@rwec.co.uk>
In-Reply-To: <0f3d0f89-3064-4d56-9fb2-801bb0cda8a5@rwec.co.uk>
Date: Sun, 7 Apr 2024 15:50:39 -0700
Message-ID: <CAMrTa2HgSSTT1wJPz0x4X=nW2ijQDmaGhdNGKx1rLePdBWGfwQ@mail.gmail.com>
Subject: Re: [PHP-DEV] Native decimal scalar support and object types in
 BcMath - do we want both?
To: "Rowan Tommins [IMSoP]" <imsop.php@rwec.co.uk>
Cc: internals@lists.php.net
Content-Type: multipart/alternative; boundary="0000000000006d4ef006158985e8"
From: jordan.ledoux@gmail.com (Jordan LeDoux)

--0000000000006d4ef006158985e8
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sun, Apr 7, 2024 at 2:45 PM Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk>
wrote:

> On 07/04/2024 20:55, Jordan LeDoux wrote:
>
> > I have been doing small bits of work, research, and investigation into
> > an MPDec or MPFR implementation for years, and I'm likely to continue
> > doing my research on that regardless of whatever is discussed in this
> > thread.
>
>
> I absolutely encourage you to do that. What I'm hoping is that you can
> share some of what you already know now, so that while we're discussing
> BCMath\Number, we can think ahead a bit to what other similar APIs we
> might build in the future. The below seems to be exactly that.
>
>
>
> > Yes. BCMath uses fixed-scale, all the other libraries use
> > fixed-precision. That is, the other libraries use a fixed number of
> > significant digits, while BCMath uses a fixed number of digits after
> > the decimal point.
>
>
> That seems like a significant difference indeed, and one that is
> potentially far more important than whether we build an OO wrapper or a
> "scalar" one.
>
>
By a "scalar" value I mean a value that has the same semantics as the
integer, float, or boolean types for reading, writing, copying,
passing-by-value, passing-by-reference, and passing-by-pointer (how objects
behave). As I mentioned in the discussion about a "scalar arbitrary
precision type", a scalar in this sense is a non-trivial challenge, because
the zval can only store a value of 64 bits or smaller in this way. However,
the actual numerical value used by every single one of these libraries is
not guaranteed to be 64 bits or smaller, and for some of them it is in fact
guaranteed to be larger.

A pointer to this value would fit in the 64 bits, which is how objects
work, but that is also why objects have different scoping semantics than
integers. Objects are potentially very large in memory, so we refcount them
and pass the pointer into child scopes instead of copying the value, as is
done with integers.

Both this and the precision/scale question are pretty significant design
questions and choices. While the arbitrary precision values of these
libraries will not fit inside a zval, they are on average smaller than PHP
objects in memory, so it may not be a significant problem to eagerly copy
them like we do with integers. However, if that is not the route that is
taken, they could end up having scoping semantics that are similar to
objects, even if we don't give them a full class entry with a constructor,
properties, etc. This is part of the reason that, for example, the
ext-decimal implementation, which uses the MPDec library, represents these
numbers as objects with a fluent interface.
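
To make the copy-versus-pointer distinction concrete, here is a rough
sketch in Python rather than PHP (chosen only because its decimal module is
built on MPDec). The Handle class and both functions are hypothetical
stand-ins, not anything from ext-decimal or the BCMath proposal:

```python
import copy


class Handle:
    """Hypothetical mutable number holder, standing in for a PHP object."""

    def __init__(self, digits):
        self.digits = digits


def add_one_shared(n):
    # The callee receives the same instance, so the caller sees the change
    # (object-like scoping semantics).
    n.digits += 1


def add_one_copied(n):
    # The value is copied eagerly first, so the caller's instance is
    # untouched (integer-like scoping semantics).
    local = copy.copy(n)
    local.digits += 1
    return local


a = Handle(10)
add_one_shared(a)      # a.digits is now 11
b = add_one_copied(a)  # a.digits stays 11, b.digits is 12
```

The second function is what "eagerly copy them like we do with integers"
would look like from userland; its cost is one copy of the stored value per
call.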


>
> > So, for instance, it would not actually be possible without manual
> > rounding in the PHP implementation to force exactly 2 decimal digits
> > of accuracy in the result and no more with MPDec.
>
>
> The current BCMath proposal is to mostly choose the scale calculations
> automatically, and to give precise control of rounding. Neither of those
> are implemented in libbcmath, which requires an explicit scale, and
> simply truncates the result at that point.
>
> That's why I said that the proposal isn't really about "an OO wrapper
> for BCMath" any more, it's a fairly generic Number API, with libbcmath
> as the back-end which we currently have available. So thinking about
> what other back-ends we might build with the same or similar wrappers is
> useful and relevant.
>
>
In general I would say that libbcmath is different enough from other
backends that we should not expect any work on a BCMath implementation to
be utilized in other implementations. It *could* be that we are able to do
that, but it should not be something people *expect* to happen because of
the technical differences.

Some of the broader language design choices would be transferable though.
For instance, the standard names of various calculation functions/methods
are something that would remain independent, even with the differences in
the implementation.


>
> > The idea of money, for instance, wanting exactly two digits would
> > require the implementation to round, because something like 0.00000013
> > has two digits of *precision*, which is what MPDec uses, but it has 8
> > digits of scale which is what BCMath uses.
>
>
> This brings us back to what the use cases are we're trying to cover with
> these wrappers.
>
> The example of fixed-scale money is not just a small niche that I happen
> to know about: brick/money has 16k stars on GitHub, and 18 million
> installs on Packagist; moneyphp/money has 4.5k stars and 45 million
> installs; one has implementations based on plain PHP, GMP, and BCMath;
> the other has a hard dependency on BCMath.
>
> Presumably, there are other use cases where working with precision
> rather than scale is essential, maybe just as popular (or that could be
> just as popular, if they could be implemented better).
>
> In which case, should we be designing a NumberInterface that provides
> both, with BCMath having a custom (and maybe slow) implementation for
> round-to-precision, and MPDec/MPFR having a custom (and maybe slow)
> implementation for round-to-scale?
>
> Or, should we abandon the idea of having one preferred number-handling
> API (whether that's NumberInterface or a core decimal type), because no
> implementation could handle both use cases?
>
>
The implementation for round-to-precision for BCMath would be much slower
than the implementation for round-to-scale for MPDec/MPFR, even if the
underlying calculations ran at the same speed. The main challenge for the
precision vs. scale issue is that precision *also* includes the integer
part for some implementations, while scale does not. But in general, it is
easier to over-calculate using precision and then round/truncate to scale
than it is to calculate with scale without knowing, until the calculation
has been completed, what your precision will be (for some kinds of
calculations).
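
As an illustration of "over-calculate using precision, then round to
scale", here is a minimal sketch using Python's decimal module, which wraps
MPDec. The working precision of 20 digits and the variable names are my own
assumptions for the example, not part of any proposal:

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

# Work with 20 significant digits, more than the final result needs.
ctx = Context(prec=20)
ratio = ctx.divide(Decimal(1), Decimal(3))   # 0.33333333333333333333

# Only at the end, round to a fixed scale of 2 digits after the point.
price = ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)  # 0.33

# The quoted example value: 2 significant digits, 8 digits of scale.
tiny = Decimal("0.00000013")
sig_digits = len(tiny.as_tuple().digits)     # 2
scale = -tiny.as_tuple().exponent            # 8

# Precision also counts the integer part: with only 5 significant digits
# there is nothing left over for the fractional part of 100000 / 3.
whole = Context(prec=5).divide(Decimal(100000), Decimal(3))  # 33333
```

The last line is the awkward case: the working precision has to be chosen
with the size of the integer part in mind, or the round-to-scale step has
no fractional digits to work with.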

The actual underlying math of the library is easier with scale than it is
with precision. So, for instance, with a scale of 3, the minimum meaningful
difference between two values is 0.001, so you can simply continue your
calculation until the calculated error is less than this value.
Fortunately, using libraries means that we do not need to struggle with
these underlying mathematical implementations in whatever PHP
implementation we build for either.
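
A rough sketch of that stopping rule, again in Python's decimal module
rather than in libbcmath itself. The function name, the Newton iteration,
and the working precision of 50 digits are assumptions for illustration,
and the size of the last step is used as a stand-in for the calculated
error:

```python
from decimal import Decimal, localcontext


def sqrt_to_scale(value, scale):
    """Approximate sqrt(value) for a positive Decimal, stopping once the
    Newton step is smaller than the minimum difference at this scale."""
    tolerance = Decimal(1).scaleb(-scale)  # scale 3 gives 0.001
    with localcontext() as ctx:
        ctx.prec = 50  # generous working precision for the intermediates
        guess = value / 2 if value > 1 else Decimal(1)
        while True:
            next_guess = (guess + value / guess) / 2
            if abs(next_guess - guess) < tolerance:
                break
            guess = next_guess
        # Round to the requested scale. The error is below the tolerance,
        # which does not by itself guarantee a correctly rounded last digit.
        return next_guess.quantize(tolerance)


root_two = sqrt_to_scale(Decimal(2), 3)   # 1.414
root_nine = sqrt_to_scale(Decimal(9), 3)  # 3.000
```

With a precision target instead of a scale target, the tolerance would
depend on the magnitude of the result, which is not known until the
iteration is well under way.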

My intuition at the moment is that a single number-handling API would be
challenging to do without an actual proposed implementation on the table
for MPDec/MPFR. The best we can do at the moment is probably reference
Rudi's implementation in ext-decimal.

For money calculations, scale is always likely to be a more useful
configuration. For mathematical calculations (such as machine learning
applications, which I would say is the other very large use case for this
kind of capability), precision is likely to be the more useful
configuration. Other applications that I have personally encountered
include: simulation and modeling, statistical distributions, and data
analysis. Most of these can be done with fair accuracy without arbitrary
precision, but there are certainly types of applications that would benefit
from or even require arbitrary precision in these spaces.

PHP at the moment is not a language that has many applications in these
spaces, with most of the actual use cases being money. My view is that this
is driven by the language features, not because PHP is ill-suited to these
applications or that developers using PHP have no applications that would
benefit from these other capabilities. There is an array of features that
are readily available in Python and heavily used for these sorts of
applications that PHP lacks; however, PHP offers several things that might
make it the more attractive option if it had similar mathematical features.
PHP is, in general, faster than Python and more performant in the areas
where the language is being used directly. However, Python allows direct
access to some very low level capabilities that are implemented directly in
C, and in those areas it certainly has an edge.

For instance, Python has modules that allow for direct off-loading to GPUs.
It has an extensive and performant set of mathematical libraries in NumPy
and SciPy that make interacting with complex mathematics seamless and fast.
PHP actually also has an extension to allow off-loading to a GPU, though
even with extensions it has very little mathematical library support. The
ext-decimal extension is probably the best example, and it has almost
nothing beyond basic arithmetic. Python also has userspace operator
overloading for its objects, which is extremely useful for all of these
spaces, but that's an entirely different discussion than what we are
talking about here.

But even with these extensions available in PHP, they are barely used by
developers, at least in part because of the enormous difference between
PECL and PIP. For PHP, I do not think that extensions are an adequate
substitute in the way that PIP modules are for Python.

This is, essentially, the thesis of the research and work that I have done
in the space since joining the internals mailing list.

Jordan


--0000000000006d4ef006158985e8--