From: imsop.php@rwec.co.uk ("Rowan Tommins [IMSoP]")
To: internals@lists.php.net
Date: Mon, 8 Apr 2024 20:21:38 +0100
Subject: Re: [PHP-DEV] Native decimal scalar support and object types in BcMath - do we want both?

On 07/04/2024 23:50, Jordan LeDoux wrote:
> By a "scalar" value I mean a value that has the same semantics for reading, writing, copying, passing-by-value, passing-by-reference, and passing-by-pointer (how objects behave) as the integer, float, or boolean types.


Right, in that case, it might be more accurate to talk about "value types", since arrays are not generally considered "scalar", but have those same behaviours. And Ilija recently posted a draft proposal for "data classes", which would be objects, but also value types: https://externals.io/message/122845


> As I mentioned in the discussion about a "scalar arbitrary precision type", the idea of a scalar in this meaning is a non-trivial challenge, as the zval can only store a value that is treated in this way of 64 bits or smaller.


Fortunately, that's not true. If you think about it, that would rule out not only arrays, but any string longer than 8 bytes!

The way PHP handles this is called "copy-on-write" (COW), where multiple variables can point to the same zval until one of them needs to write to it, at which point a copy is transparently created.
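
As a rough illustration of that behaviour (a sketch of my own rather than anything from the linked thread; the variable names and array size are arbitrary, and the exact byte counts depend on the PHP version and build), assigning a large array to a second variable should allocate essentially nothing, while the first write through the copy triggers the real duplication:

```php
<?php
// Build a large array; range() creates it at runtime, so it is refcounted.
$a = range(1, 100000);

$before = memory_get_usage();

// Plain assignment: $b shares the same underlying array as $a.
$b = $a;
$afterAssign = memory_get_usage();

// First write through $b: PHP separates it from $a by copying the array.
$b[0] = -1;
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $before;
$writeCost  = $afterWrite - $afterAssign;

echo "assignment allocated: {$assignCost} bytes\n";
echo "first write allocated: {$writeCost} bytes\n";

// Value semantics are preserved: $a never sees the change made through $b.
var_dump($a[0]); // int(1)
var_dump($b[0]); // int(-1)
```

The user-visible result is the same as if the array had been copied eagerly on assignment; only the timing of the allocation differs.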


> The pointer for this value would fit in the 64 bits, which is how objects work, but that's also why objects have different semantics for scope than integers. Objects are potentially very large in memory, so we refcount them and pass the pointer into child scopes, instead of copying the value like is done with integers.


Objects are not the only thing that is refcounted. In fact, in PHP 4.x and 5.x, *every* zval used a refcount and COW approach; changing some types to be eagerly copied instead was one of the major performance improvements in the "PHP NG" project which formed the basis of PHP 7.0. You can actually see this in action here: https://3v4l.org/oPgr4

This is all completely transparent to the user, as are a bunch of other memory/speed optimisations, like interned string literals, packed arrays, etc.

So, there may be performance gains if we can squeeze values into the zval memory, but it doesn't need to affect the semantics of the new type.



> In general I would say that libbcmath is different enough from other backends that we should not expect any work on a BCMath implementation to be utilized in other implementations. It *could* be that we are able to do that, but it should not be something people *expect* to happen because of the technical differences.
>
> Some of the broader language design choices would be transferable though. For instance, the standard names of various calculation functions/methods are something that would remain independent, even with the differences in the implementation.


Yes, that makes sense. Even if we don't have an interface, it would be annoying if one class provided $foo->div($bar), and another provided $foo->dividedBy($bar).
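
To make the naming point concrete, here is a purely hypothetical sketch: the `DecimalLike` interface and both classes are invented for illustration and do not exist in PHP or in any RFC, and the float and integer-cents arithmetic merely stands in for real backends. It assumes PHP 8.0 or later. The point is only that code written against one shared method name works with either implementation:

```php
<?php
declare(strict_types=1);

// Hypothetical interface: the shared method name is the point, not the arithmetic.
interface DecimalLike
{
    public function div(int $divisor): static;
    public function __toString(): string;
}

// Stand-in backend 1: wraps a binary float.
final class FloatBacked implements DecimalLike
{
    public function __construct(private float $value) {}

    public function div(int $divisor): static
    {
        return new static($this->value / $divisor);
    }

    public function __toString(): string
    {
        return number_format($this->value, 2, '.', '');
    }
}

// Stand-in backend 2: wraps a non-negative integer number of cents.
// intdiv() truncates, so odd amounts lose the final half cent.
final class CentsBacked implements DecimalLike
{
    public function __construct(private int $cents) {}

    public function div(int $divisor): static
    {
        return new static(intdiv($this->cents, $divisor));
    }

    public function __toString(): string
    {
        return sprintf('%d.%02d', intdiv($this->cents, 100), $this->cents % 100);
    }
}

// Calling code only needs to know the shared name div().
function half(DecimalLike $n): DecimalLike
{
    return $n->div(2);
}

echo half(new FloatBacked(10.50)), "\n"; // 5.25
echo half(new CentsBacked(1050)), "\n";  // 5.25
```

If one class had used dividedBy() instead, half() would need a branch per backend even though the operation is conceptually the same.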


> For money calculations, scale is always likely to be a more useful configuration. For mathematical calculations (such as machine learning applications, which I would say is the other very large use case for this kind of capability), precision is likely to be the more useful configuration. Other applications that I have personally encountered include: simulation and modeling, statistical distributions, and data analysis. Most of these can be done with fair accuracy without arbitrary precision, but there are certainly types of applications that would benefit from or even require arbitrary precision in these spaces.


This probably relates quite closely to Arvid's point that for a lot of uses, we don't actually need arbitrary precision, just something that can represent small-to-medium decimal numbers without the inaccuracies of binary floating point. That some libraries can be used for both purposes is not necessarily evidence that we could ever "bless" one for both use cases and make it a single native type.
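
The "inaccuracies of binary floating point" are easy to reproduce. The sketch below compares plain floats with integers counting tenths, used here only as a stand-in for a decimal type (an assumption made for illustration, not how any proposed native type would work):

```php
<?php
// Neither 0.1 nor 0.2 has an exact binary representation, so the sum drifts.
var_dump(0.1 + 0.2 === 0.3);   // bool(false)

// The same sum expressed in integer tenths is exact.
$tenths = 1 + 2;
var_dump($tenths === 3);       // bool(true)

// Format back to a decimal string with a fixed scale of one digit.
echo intdiv($tenths, 10), '.', $tenths % 10, "\n"; // 0.3
```

Nothing here needs arbitrary precision; a fixed, modest number of decimal digits is enough to avoid the surprise.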


> My intuition at the moment is that a single number-handling API would be challenging to do without an actual proposed implementation on the table for MPDec/MPFR.


I think it would certainly be wise to experiment with how each library can interface to the language as an extension, before spending the extra time needed to integrate it as a new zval type.


> But even with these extensions available in PHP, they are barely used by developers at all because (at least in part) of the enormous difference between PECL and PIP. For PHP, I do not think that extensions are an adequate substitute like PIP modules are for Python.


Yes, this is something of a problem. On the plus side, a library doesn't need to be incorporated into the language to be widely installed, because we have the concept of "bundled" extensions; and in practice, Linux distributions add a few "popular" PECL extensions to their list of installable binary packages. On the minus side, even making it into the "bundled" list doesn't mean it's installed by default everywhere, and userland libraries spend a lot of effort polyfilling things which would ideally be available by default.


> This is, essentially, the thesis of the research and work that I have done in the space since joining the internals mailing list.


Thanks, there's some really useful perspective there.

Regards,

-- 
Rowan Tommins
[IMSoP]