Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:123046
Precedence: bulk
MIME-Version: 1.0
References: <40553F28-2EC2-475A-BD8E-1D6517AA2A51@rwec.co.uk>
 <2B518F62-B774-45C9-82A2-EF6653AAE34E@sakiot.com> <A703D145-1A10-4C7E-9E0C-1F0BEF94CF99@rwec.co.uk>
 <CAMrTa2G+_v1aO_g7NLiLsTwquJaf9=Zj5bp2ODGzVSB3iGr6pw@mail.gmail.com>
 <0f3d0f89-3064-4d56-9fb2-801bb0cda8a5@rwec.co.uk> <CAMrTa2HgSSTT1wJPz0x4X=nW2ijQDmaGhdNGKx1rLePdBWGfwQ@mail.gmail.com>
 <c3663fb3-1a78-4077-80b9-8431119fbb96@rwec.co.uk>
In-Reply-To: <c3663fb3-1a78-4077-80b9-8431119fbb96@rwec.co.uk>
Date: Mon, 8 Apr 2024 13:51:46 -0700
Message-ID: <CAMrTa2FeSTGE9rRdeMVzULkmn4C-LQ0GC8upDtEx1JEo6A0LFA@mail.gmail.com>
Subject: Re: [PHP-DEV] Native decimal scalar support and object types in
 BcMath - do we want both?
To: "Rowan Tommins [IMSoP]" <imsop.php@rwec.co.uk>
Cc: internals@lists.php.net
Content-Type: multipart/alternative; boundary="000000000000379eb306159bfaf9"
From: jordan.ledoux@gmail.com (Jordan LeDoux)

--000000000000379eb306159bfaf9
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, Apr 8, 2024 at 12:23 PM Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk>
wrote:

>
> > As I mentioned in the discussion about a "scalar arbitrary precision
> > type", the idea of a scalar in this meaning is a non-trivial challenge,
> > as the zval can only store a value that is treated in this way of 64
> > bits or smaller.
>
>
> Fortunately, that's not true. If you think about it, that would rule out
> not only arrays, but any string longer than 8 bytes long!
>
> The way PHP handles this is called "copy-on-write" (COW), where multiple
> variables can point to the same zval until one of them needs to write to
> it, at which point a copy is transparently created.
>
>
> > The pointer for this value would fit in the 64 bits, which is how
> > objects work, but that's also why objects have different semantics for
> > scope than integers. Objects are potentially very large in memory, so we
> > refcount them and pass the pointer into child scopes, instead of copying
> > the value like is done with integers.
>
>
> Objects are not the only thing that is refcounted. In fact, in PHP 4.x and
> 5.x, *every* zval used a refcount and COW approach; changing some types to
> be eagerly copied instead was one of the major performance improvements in
> the "PHP NG" project which formed the basis of PHP 7.0. You can actually
> see this in action here: https://3v4l.org/oPgr4
>
> This is all completely transparent to the user, as are a bunch of other
> memory/speed optimisations, like interned string literals, packed arrays,
> etc.
>
> So, there may be performance gains if we can squeeze values into the zval
> memory, but it doesn't need to affect the semantics of the new type.
>
I have mentioned before that my understanding of the deeper aspects of how
zvals work is lacking compared to some others, so this is very helpful. I
was of course aware that strings and arrays can be larger than 64 bits, but
I was under the impression that the hashtable structure was partly
responsible for those being somewhat different. I confess that I do not
understand the technical intricacies of interned strings and packed arrays;
I just understood that the zval structure for these arbitrary precision
values would probably be non-trivial and, from what I was able to research,
that this was in part related to the 64-bit zval limit. But thank you for
the clarity and the added detail. It's always good to learn where you are
mistaken, and this is all extremely helpful to know.

> This probably relates quite closely to Arvid's point that for a lot of
> uses, we don't actually need arbitrary precision, just something that can
> represent small-to-medium decimal numbers without the inaccuracies of
> binary floating point. That some libraries can be used for both purposes is
> not necessarily evidence that we could ever "bless" one for both use cases
> and make it a single native type.


Honestly, if you need a scale of less than about 15 and simply want decimals
free of floating-point error, BCMath is perfectly adequate for that in most
of the use cases I described. The larger issue for a lot of these
applications is not that they need to calculate 50 digits of accuracy and
BCMath is too slow; it's that they need operations beyond basic arithmetic,
such as sin(), cos(), exp(), vector multiplication, dot products, etc.,
while maintaining that low to medium decimal accuracy. libbcmath just
doesn't support those things, and creating your own implementation of, say,
the sin() function that maintains arbitrary precision is... challenging. It
compounds the performance deficiencies of BCMath exponentially, as you have
to break each call into many different arithmetic operations.
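
As a rough illustration of that last point, here is a minimal sketch of a
Taylor-series sin() built from nothing but add, multiply and divide on
decimal values. It is written in Python with the standard decimal module
purely so it is easy to run; it is not how libbcmath or MPFR work. The name
decimal_sin, the number of guard digits and the stopping rule are all
assumptions made for this example.

```python
from decimal import Decimal, localcontext
import math


def decimal_sin(x, digits=30):
    """Approximate sin(x) with its Taylor series using only +, * and /.

    Returns (value, terms), where terms is how many series terms were summed.
    No argument reduction is performed, so this is only sensible for small |x|.
    """
    with localcontext() as ctx:
        ctx.prec = digits + 10                      # guard digits (arbitrary choice)
        threshold = Decimal(10) ** -(digits + 5)    # stop once terms are this small
        term = x                                    # first term: x^1 / 1!
        total = x
        x_squared = x * x
        n = 1
        terms = 1
        while abs(term) > threshold:
            # Each step costs a multiply, a divide and an add on decimals:
            # next term = -term * x^2 / ((n + 1) * (n + 2))
            term = -term * x_squared / ((n + 1) * (n + 2))
            total += term
            n += 2
            terms += 1
    with localcontext() as ctx:
        ctx.prec = digits
        return +total, terms                        # unary plus rounds to ctx.prec


value, terms = decimal_sin(Decimal("0.5"), digits=30)
print(value, "after", terms, "series terms")
print("float check:", math.sin(0.5))
```

Even for a small argument and 30 digits, one sin() call turns into more than
ten rounds of multiply, divide and add, and a real implementation would also
need argument reduction and error analysis on top of that.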

To me, while being 100x to 1000x more performant at arithmetic is certainly
reason enough on its own, the fact that MPFR (for example) has C
implementations for more complex operations that can be utilized is the
real selling point. The ext-stats extension hasn't been maintained since
PHP 7.4, and trig is critical for a lot of stats functions. A fairly common
use of stats, even in applications where you might not expect it, is to
generate a Gaussian random number. That is, a random number drawn such that,
if you kept generating numbers from the same generator, they would form a
normal distribution (a bell curve), so each number is weighted according to
that distribution.

The simplest way to do that is with the sin() and cos() functions (picking
a point on a circle). But a lot of really useful mathematics of this kind
is mainly provided by libraries that ALSO provide arbitrary precision. For
instance, the Gamma function is another very common function in statistics.
To me, implementing a bundled or core type that utilizes MPFR (or something
similar) is as much about getting access to THESE mathematical functions as
it is about the arbitrary precision aspect.
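
The sin()/cos() approach mentioned above reads like what is usually called
the Box-Muller transform. Assuming that is what is meant, a minimal sketch
looks like the following. It uses Python and ordinary binary floats rather
than an arbitrary precision type, and the function name gaussian_pair, the
seed and the sample count are illustrative choices only.

```python
import math
import random


def gaussian_pair(rng, mean=0.0, stddev=1.0):
    """Box-Muller transform: two uniform samples in, two normal samples out."""
    u1 = 1.0 - rng.random()                   # shift [0, 1) to (0, 1] so log() is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))   # distance from the origin
    angle = 2.0 * math.pi * u2                # random point on a circle
    z0 = radius * math.cos(angle)
    z1 = radius * math.sin(angle)
    return mean + stddev * z0, mean + stddev * z1


rng = random.Random(12345)                    # fixed seed so runs are repeatable
samples = [z for _ in range(50_000) for z in gaussian_pair(rng)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"mean={sample_mean:.3f} variance={sample_var:.3f}")
```

With enough samples the printed mean should sit close to 0 and the variance
close to 1. Doing the same thing at a controlled decimal precision needs
log(), sqrt(), sin() and cos() for that number type, which is exactly what a
library such as MPFR already provides.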

Jordan

--000000000000379eb306159bfaf9--