Newsgroups: php.internals
Xref: news.php.net php.internals:123026
References: <40553F28-2EC2-475A-BD8E-1D6517AA2A51@rwec.co.uk> <2B518F62-B774-45C9-82A2-EF6653AAE34E@sakiot.com> <0f3d0f89-3064-4d56-9fb2-801bb0cda8a5@rwec.co.uk>
In-Reply-To: <0f3d0f89-3064-4d56-9fb2-801bb0cda8a5@rwec.co.uk>
Date: Sun, 7 Apr 2024 15:50:39 -0700
Subject: Re: [PHP-DEV] Native decimal scalar support and object types in BcMath - do we want both?
To: "Rowan Tommins [IMSoP]"
Cc: internals@lists.php.net
From: jordan.ledoux@gmail.com (Jordan LeDoux)

On Sun, Apr 7, 2024 at 2:45 PM Rowan Tommins [IMSoP] wrote:

> On 07/04/2024 20:55, Jordan LeDoux wrote:
>
> > I have been doing small bits of work, research, and investigation into
> > an MPDec or MPFR implementation for years, and I'm likely to continue
> > doing my research on that regardless of whatever is discussed in this
> > thread.
>
> I absolutely encourage you to do that. What I'm hoping is that you can
> share some of what you already know now, so that while we're discussing
> BCMath\Number, we can think ahead a bit to what other similar APIs we
> might build in the future. The below seems to be exactly that.
>
> > Yes. BCMath uses fixed-scale, all the other libraries use
> > fixed-precision. That is, the other libraries use a fixed number of
> > significant digits, while BCMath uses a fixed number of digits after
> > the decimal point.
>
> That seems like a significant difference indeed, and one that is
> potentially far more important than whether we build an OO wrapper or a
> "scalar" one.

By a "scalar" value I mean a value that has the same semantics for reading, writing, copying, passing-by-value, passing-by-reference, and passing-by-pointer (how objects behave) as the integer, float, or boolean types. As I mentioned in the discussion about a "scalar arbitrary precision type", the idea of a scalar in this sense is a non-trivial challenge, as a zval can only store a value that is treated this way if it is 64 bits or smaller.
However, the actual numerical value that is used by every single one of these libraries is not guaranteed to be 64 bits or smaller, and for some of them is in fact guaranteed to be larger.

The pointer for this value would fit in the 64 bits, which is how objects work, but that's also why objects have different semantics for scope than integers. Objects are potentially very large in memory, so we refcount them and pass the pointer into child scopes, instead of copying the value as is done with integers.

Both this and the precision/scale question are pretty significant design questions and choices. While the arbitrary precision values of these libraries will not fit inside a zval, they are on average smaller than PHP objects in memory, so it may not be a significant problem to eagerly copy them like we do with integers. However, if that is not the route that is taken, they could end up having scoping semantics that are similar to objects, even if we don't give them a full class entry with a constructor, properties, etc. This is part of the reason that, for example, the ext-decimal implementation, which uses the MPDec library, represents these numbers as an object with a fluent interface.

> > So, for instance, it would not actually be possible without manual
> > rounding in the PHP implementation to force exactly 2 decimal digits
> > of accuracy in the result and no more with MPDec.
>
> The current BCMath proposal is to mostly choose the scale calculations
> automatically, and to give precise control of rounding. Neither of those
> are implemented in libbcmath, which requires an explicit scale, and
> simply truncates the result at that point.
>
> That's why I said that the proposal isn't really about "an OO wrapper
> for BCMath" any more, it's a fairly generic Number API, with libbcmath
> as the back-end which we currently have available. So thinking about
> what other back-ends we might build with the same or similar wrappers is
> useful and relevant.
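
For reference, the difference between truncating and rounding at a fixed scale, as mentioned in the quoted text, can be sketched as follows. This is only an illustration in Python's `decimal` module, not libbcmath or the proposed BCMath\Number API, and `to_scale` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def to_scale(value: Decimal, scale: int, rounding: str) -> Decimal:
    """Force `value` to exactly `scale` digits after the decimal point."""
    # scale=2 gives a quantum of Decimal('0.01')
    quantum = Decimal(1).scaleb(-scale)
    return value.quantize(quantum, rounding=rounding)


# Computed at the module's default precision of 28 significant digits.
two_thirds = Decimal(2) / Decimal(3)

truncated = to_scale(two_thirds, 2, ROUND_DOWN)     # cut off, as truncation would
rounded = to_scale(two_thirds, 2, ROUND_HALF_UP)    # explicit rounding mode

print(truncated, rounded)  # 0.66 0.67
```

The point of the sketch is only that "which scale" and "which rounding mode" are two separate decisions, and an API has to expose both if it wants precise control.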

In general I would say that libbcmath is different enough from other backends that we should not expect any work on a BCMath implementation to be utilized in other implementations. It *could* be that we are able to do that, but it should not be something people *expect* to happen because of the technical differences.

Some of the broader language design choices would be transferable though. For instance, the standard names of various calculation functions/methods are something that would remain independent, even with the differences in the implementation.

> > The idea of money, for instance, wanting exactly two digits would
> > require the implementation to round, because something like 0.00000013
> > has two digits of *precision*, which is what MPDec uses, but it has 8
> > digits of scale which is what BCMath uses.
>
> This brings us back to what the use cases are we're trying to cover with
> these wrappers.
>
> The example of fixed-scale money is not just a small niche that I happen
> to know about: brick/money has 16k stars on GitHub, and 18 million
> installs on Packagist; moneyphp/money has 4.5k stars and 45 million
> installs; one has implementations based on plain PHP, GMP, and BCMath;
> the other has a hard dependency on BCMath.
>
> Presumably, there are other use cases where working with precision
> rather than scale is essential, maybe just as popular (or that could be
> just as popular, if they could be implemented better).
>
> In which case, should we be designing a NumberInterface that provides
> both, with BCMath having a custom (and maybe slow) implementation for
> round-to-precision, and MPDec/MPFR having a custom (and maybe slow)
> implementation for round-to-scale?
>
> Or, should we abandon the idea of having one preferred number-handling
> API (whether that's NumberInterface or a core decimal type), because no
> implementation could handle both use cases?
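
To make the two terms concrete, here is a minimal sketch that reads the precision and scale off a decimal value. It uses Python's `decimal` module only because CPython builds it on libmpdec; `precision_and_scale` is a hypothetical helper, and it assumes finite values only.

```python
from decimal import Decimal
from typing import Tuple


def precision_and_scale(value: Decimal) -> Tuple[int, int]:
    """Return (stored significant digits, digits after the decimal point).

    Assumes a finite value; NaN and Infinity have non-numeric exponents.
    """
    _sign, digits, exponent = value.as_tuple()
    precision = len(digits)      # counts the integer part as well
    scale = max(0, -exponent)    # only the fractional part
    return precision, scale


print(precision_and_scale(Decimal("0.00000013")))  # (2, 8)
print(precision_and_scale(Decimal("1234.50")))     # (6, 2)
```

The second value shows why a money amount with a fixed scale of 2 has no fixed precision: the integer part counts toward precision but not toward scale.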

The implementation for round-to-precision for BCMath would be much slower than the implementation for round-to-scale for MPDec/MPFR, even if the underlying calculations were done at the same performance. The main challenge for the precision vs. scale issue is that precision *also* includes the integer part for some implementations, while scale does not. But in general, it is easier to over-calculate using precision and then round/truncate to scale, than it is to calculate with scale not knowing until the calculation has been completed what your precision will be (for some kinds of calculations).

The actual underlying math of the library is easier with scale than it is with precision. So, for instance, with a scale of 3, the minimum meaningful difference between two values is 0.001, so you can simply continue your calculation until the calculated error is less than this value. Fortunately, using libraries means that these underlying mathematical implementations do not need to be struggled with in whatever PHP implementation we do for either.

My intuition at the moment is that a single number-handling API would be challenging to do without an actual proposed implementation on the table for MPDec/MPFR. The best we can do at the moment is probably reference Rudi's implementation in ext-decimal.

For money calculations, scale is always likely to be a more useful configuration. For mathematical calculations (such as machine learning applications, which I would say is the other very large use case for this kind of capability), precision is likely to be the more useful configuration. Other applications that I have personally encountered include: simulation and modeling, statistical distributions, and data analysis. Most of these can be done with fair accuracy without arbitrary precision, but there are certainly types of applications that would benefit from or even require arbitrary precision in these spaces.
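
The "over-calculate using precision, then round to scale" approach described above could look roughly like this on a fixed-precision backend. This is a sketch under stated assumptions rather than how any PHP implementation does or would do it: it uses Python's `decimal` module (built on libmpdec in CPython), and `divide_to_scale` and `guard_digits` are hypothetical names.

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN


def divide_to_scale(a: Decimal, b: Decimal, scale: int,
                    guard_digits: int = 10) -> Decimal:
    """Divide at a generous precision, then round the result to `scale`."""
    # Upper bound on the number of integer digits in the quotient.
    # Precision has to cover these digits too, unlike scale.
    integer_digits = max(a.adjusted() - b.adjusted() + 1, 0)
    with localcontext() as ctx:
        ctx.prec = integer_digits + scale + guard_digits
        quotient = a / b                      # fixed-precision calculation
        quantum = Decimal(1).scaleb(-scale)   # scale=2 gives Decimal('0.01')
        return quotient.quantize(quantum, rounding=ROUND_HALF_EVEN)


print(divide_to_scale(Decimal("1000"), Decimal("3"), 2))  # 333.33
print(divide_to_scale(Decimal("2"), Decimal("3"), 3))     # 0.667
```

Note that rounding twice (once to the working precision, once to the scale) can in rare cases differ from rounding the exact result once; the guard digits make that unlikely, but they do not rule it out.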

PHP at the moment is not a language that has many applications in these spaces, with most of the actual use cases being money. My view is that this is driven by the language features, not because PHP is ill-suited to these applications or that developers using PHP have no applications that would benefit from these other capabilities. There is an array of features readily available in Python that are heavily used for these sorts of applications and that PHP lacks; however, PHP offers several things that might make it the more attractive option if it had similar mathematical features. PHP is, in general, faster than Python and more performant in the areas where the language is being used directly. However, Python allows direct access to some very low level capabilities that are implemented directly in C, and in those areas it certainly has an edge.

For instance, Python has modules that allow for direct off-loading to GPUs. It has an extensive and performant set of mathematical libraries in NumPy and SciPy that make interacting with complex mathematics seamless and fast. PHP actually also has an extension to allow off-loading to a GPU, though even with extensions it has very little mathematical library support. The ext-decimal extension is probably the best example, and it has almost nothing beyond basic arithmetic. Python also has userspace operator overloading for its objects, which is extremely useful for all of these spaces, but that's an entirely different discussion than what we are talking about here.

But even with these extensions available in PHP, they are barely used by developers at all, (at least in part) because of the enormous difference between PECL and PIP. For PHP, I do not think that extensions are an adequate substitute like PIP modules are for Python.

This is, essentially, the thesis of the research and work that I have done in the space since joining the internals mailing list.

Jordan