Newsgroups: php.internals
From: jordan.ledoux@gmail.com (Jordan LeDoux)
Date: Mon, 8 Apr 2024 13:51:46 -0700
Subject: Re: [PHP-DEV] Native decimal scalar support and object types in BcMath - do we want both?
To: "Rowan Tommins [IMSoP]"
Cc: internals@lists.php.net
References: <40553F28-2EC2-475A-BD8E-1D6517AA2A51@rwec.co.uk> <2B518F62-B774-45C9-82A2-EF6653AAE34E@sakiot.com> <0f3d0f89-3064-4d56-9fb2-801bb0cda8a5@rwec.co.uk>


On Mon, Apr 8, 2024 at 12:23 PM Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk> wrote:

> As I mentioned in the discussion about a "scalar arbitrary precision type", the idea of a scalar in this meaning is a non-trivial challenge, as the zval can only store a value that is treated in this way of 64 bits or smaller.
>
> Fortunately, that's not true. If you think about it, that would rule out not only arrays, but any string longer than 8 bytes long!
>
> The way PHP handles this is called "copy-on-write" (COW), where multiple variables can point to the same zval until one of them needs to write to it, at which point a copy is transparently created.
>
> The pointer for this value would fit in the 64 bits, which is how objects work, but that's also why objects have different semantics for scope than integers. Objects are potentially very large in memory, so we refcount them and pass the pointer into child scopes, instead of copying the value like is done with integers.
>
> Objects are not the only thing that is refcounted. In fact, in PHP 4.x and 5.x, *every* zval used a refcount and COW approach; changing some types to be eagerly copied instead was one of the major performance improvements in the "PHP NG" project which formed the basis of PHP 7.0. You can actually see this in action here: https://3v4l.org/oPgr4
>
> This is all completely transparent to the user, as are a bunch of other memory/speed optimisations, like interned string literals, packed arrays, etc.
>
> So, there may be performance gains if we can squeeze values into the zval memory, but it doesn't need to affect the semantics of the new type.

I have mentioned before that my understanding of the deeper aspects of how zvals work is very lacking compared to some others, so this is very helpful. I was of course aware that strings and arrays can be larger than 64 bits, but was under the impression that the hashtable structure in part was responsible for those being somewhat different. I confess that I do not understand the technical intricacies of the interned strings and packed arrays; I just understand that the zval structure for these arbitrary precision values would probably be non-trivial, and from what I was able to research and determine, that was in part related to the 64-bit zval limit. But thank you for the clarity and the added detail; it's always good to learn where you are mistaken, and this is all extremely helpful to know.

> This probably relates quite closely to Arvid's point that for a lot of uses, we don't actually need arbitrary precision, just something that can represent small-to-medium decimal numbers without the inaccuracies of binary floating point. That some libraries can be used for both purposes is not necessarily evidence that we could ever "bless" one for both use cases and make it a single native type.

Honestly, if you need a scale of less than about 15 and simply want FP-error-free decimals, BCMath is perfectly adequate for that in most of the use cases I described. The larger issue for a lot of these applications is not that they need to calculate 50 digits of accuracy and BCMath is too slow; it's that they need non-arithmetic operations, such as sin(), cos(), exp(), vector multiplication, dot products, etc., while maintaining that low to medium decimal accuracy. libbcmath just doesn't support those things, and creating your own implementation of, say, the sin() function that maintains arbitrary precision is... challenging. It compounds the performance deficiencies of BCMath exponentially, as you have to break it into many different arithmetic operations.
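
To make the "FP-error-free decimals" point concrete, here is a minimal sketch. It is written in Python purely for illustration and is not BCMath: the `add_decimal` helper is hypothetical, and truncating (rather than rounding) to the requested scale is an assumption made for this sketch. It mimics the BCMath style of passing decimal strings plus a scale, using Python's standard `decimal` module, and contrasts that with binary floats.

```python
from decimal import Decimal, ROUND_DOWN


def add_decimal(a: str, b: str, scale: int) -> str:
    """Hypothetical helper: add two decimal strings, keeping `scale` fractional digits."""
    # Decimal(1).scaleb(-2) is Decimal('0.01'): the smallest step at that scale.
    quantum = Decimal(1).scaleb(-scale)
    total = Decimal(a) + Decimal(b)
    # Assumption for this sketch: extra digits are truncated, not rounded.
    return str(total.quantize(quantum, rounding=ROUND_DOWN))


# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
print(0.1 + 0.2 == 0.3)               # False
# Decimal arithmetic on the same inputs is exact.
print(add_decimal("0.1", "0.2", 2))   # 0.30
```

Plain addition like this is the easy part; the difficulty described above starts when the same exactness is wanted from functions such as sin() or exp(), which have to be built out of many such operations.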

To me, while being 100x to 1000x more performant at arithmetic is certainly reason enough on its own, the fact that MPFR (for example) has C implementations for more complex operations that can be utilized is the real selling point. The ext-stats extension hasn't been maintained since 7.4. And trig is critical for a lot of stats functions. A fairly common use of stats, even in applications you might not expect it, is to generate a Gaussian Random Number. That is, generate a random number where if you continued generating random numbers from the same generator, they would form a normal distribution (a bell curve), so the random number is weighted according to the distribution.

The simplest way to do that is with the sin() and cos() functions (picking a point on a circle). But a lot of really useful mathematics of this kind is mainly provided by libraries that ALSO provide arbitrary precision. So for instance, the Gamma Function is another very common function in statistics. To me, implementing a bundled or core type that utilizes MPFR (or something similar) is as much about getting access to THESE mathematical functions as it is the arbitrary precision aspect.
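
One well-known way to generate Gaussian random numbers with sin() and cos() is the Box-Muller transform; that this is the exact method meant above is an assumption. The sketch below is in Python for illustration only, uses ordinary double-precision floats rather than arbitrary precision, and the `gaussian_pair` name and its parameters are hypothetical.

```python
import math
import random


def gaussian_pair(rng: random.Random, mean: float = 0.0, stddev: float = 1.0):
    """Box-Muller transform: two uniform samples in, two independent normal samples out."""
    u1 = 1.0 - rng.random()   # in (0, 1], so log(u1) is always defined
    u2 = rng.random()         # in [0, 1)
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2          # a random point on a circle
    z0 = radius * math.cos(angle)
    z1 = radius * math.sin(angle)
    return mean + stddev * z0, mean + stddev * z1


# Draw 100,000 samples and check that they look like a standard normal.
rng = random.Random(42)
samples = [x for _ in range(50_000) for x in gaussian_pair(rng)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / len(samples)
print(sample_mean, sample_var)   # should be close to 0 and 1 respectively
```

Every step here leans on log(), sqrt(), sin() and cos(), which is why a decimal type without those functions leaves this kind of routine to be rebuilt from basic arithmetic.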

Jordan