Hello,
There are currently two proposals being discussed - "native decimal
scalar type support" and "Support object type in BCMath".
I've been getting involved in the discussion for the BCMath proposal,
but not paying as much attention to the native decimal thread.
But these seem like very similar things, so I'm wondering whether or not
it makes sense to do both at once. They both seem like ways to represent
and calculate with arbitrary precision decimal numbers.
I'm not sure if they have distinct use cases. Are there some tasks where
people would likely prefer one, and different tasks where they would
prefer the other? Or should PHP internals choose just one of these
options instead of potentially releasing both? It doesn't seem like a
good idea to have two directly competing features for the same use case
in one PHP release, unless there's a reason to favor each one in a
different situation.
Best wishes,
Barney
Hi Barney,
(I'm the proposer on the BCMath thread, so my opinion may be a bit biased.)
The "areas" being discussed are certainly close. However, I believe the goals of the two proposals, and the time and effort required to realize them, differ greatly.
Regards.
Saki
On Sat, Apr 6, 2024 at 4:07 AM Barney Laurance barney@redmagic.org.uk
wrote:
The scalar arbitrary precision discussion is for an implementation that
would be in the range of 100x to 1000x faster than BCMath. No matter what
improvements are made to BCMath, there will still be strong arguments for
it, and until someone actually puts together an RFC, the BCMath library is
the only thing around.
Internals is just volunteers. The people working on BCMath are doing that
because they want to, the people working on scalar decimal stuff are doing
that because they want to, and there's no project planning to tell one
group to stop. That's not how internals works (to the extent it works).
Jordan
I kind of disagree. You're absolutely right the detailed effort is almost always put in by people working on things that interest them, and I want to make clear up front that I'm extremely grateful to the amount of effort people do volunteer, given how few are paid to work on any of this.
However, the goal of the Internals community as a whole is to choose what changes to make to a language which is used by millions of people. That absolutely involves project planning, because there isn't a marketplace of PHP forks with different competing features, and once a feature is added it's very hard to remove it or change its design.
If - and I stress I'm not saying this is true - IF these two features have such an overlap that we would only want to release one, then we shouldn't just accept whichever is ready first, we should choose which is the better solution overall. And if that was the case, why would we wait for a polished implementation of both, then tell one group of volunteers that all their hard work had been a waste of time?
So I think the question is very valid: do these two features have distinct use cases, such that even if we had one, we would still want to spend time on the other? Or, should we decide a strategy for both groups to work together towards a single goal?
That's not about "telling one group to stop", it's about working together for the benefit of both users and the people volunteering their effort, to whom I am extremely grateful.
Regards,
Rowan Tommins
[IMSoP]
Yes, I was going to say the same thing as Rowan. But also, Jordan has
shown that there's at least one advantage to each proposal - one would
be much more performant, and one might be releasable a lot sooner.
That's a possible reason to keep both.
Hi Rowan,
I don't think the two threads can be combined, because they have different goals. If one side of the argument were "how about adding BCMath?", then perhaps we should merge the discussion. But BCMath already exists, and the agenda is to add an OOP API.
In other words, one is about adding new features, and the other is about improving existing features.
I agree that it would be wise to merge issues that can be merged.
Regards.
Saki
While I appreciate that that was the original aim, a lot of the discussion at the moment isn't really about BCMath at all, it's about how to define a fixed-precision number type. For instance, how to specify precision and rounding for operations like division. I haven't seen anywhere in the discussion where the answer was "that's how it already works, and we're not adding new features".
Is there anything in the proposal which would actually be different if it was based on a different library, and if not, should we be designing a NumberInterface which multiple extensions could implement? Then Jordan's search for a library with better performance could lead to new extensions implementing that interface, even if they have portability or licensing problems that make them awkward to bundle in core.
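To make the NumberInterface idea concrete, here is a rough sketch. It is purely illustrative Python standing in for a PHP interface plus an extension-backed class; every name in it (NumberInterface, DecimalNumber, div, the scale parameter) is hypothetical, not something actually proposed on the list:

```python
from typing import Protocol, runtime_checkable
from decimal import Decimal, ROUND_HALF_UP

@runtime_checkable
class NumberInterface(Protocol):
    """Hypothetical shared interface; method names are illustrative only."""
    def add(self, other: "NumberInterface") -> "NumberInterface": ...
    def div(self, other: "NumberInterface", scale: int) -> "NumberInterface": ...
    def __str__(self) -> str: ...

class DecimalNumber:
    """One possible backend, wrapping Python's decimal as a stand-in
    for an extension such as BCMath or MPDec."""
    def __init__(self, value: str):
        self._d = Decimal(value)

    def add(self, other: "DecimalNumber") -> "DecimalNumber":
        return DecimalNumber(str(self._d + other._d))

    def div(self, other: "DecimalNumber", scale: int) -> "DecimalNumber":
        # Quantum such as 0.01 for scale=2, with explicit rounding control.
        quantum = Decimal(1).scaleb(-scale)
        result = (self._d / other._d).quantize(quantum, rounding=ROUND_HALF_UP)
        return DecimalNumber(str(result))

    def __str__(self) -> str:
        return str(self._d)

price = DecimalNumber("10.00").div(DecimalNumber("3"), scale=2)
print(str(price))                            # 3.33
print(isinstance(price, NumberInterface))    # True
```

The point of the interface is that another extension with a different backing library could provide its own class satisfying the same contract, and calling code would not need to change.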
Finally, there's the separate discussion about making a new "scalar type". As I said in a previous email, I'm not really sure what "scalar" means in this context, so maybe "integrating the type more directly into the language" is a better description? That includes memory/copying optimisation (potentially linked to Ilija's work on data classes), initialisation syntax (which could be a general feature), and accepting the type in existing functions (something frequently requested for custom array-like types).
In other words, looking at how the efforts overlap doesn't have to mean abandoning one of them, it can mean finding how one can benefit the other.
Regards,
Rowan Tommins
[IMSoP]
Hi Rowan,
I agree that the essence of the debate is as you say.
However, an argument must always reach a conclusion based on its purpose, and combining two arguments with different purposes can make it unclear how to reach a conclusion.
If we were to merge these two debates, what should be on our agenda? It would probably be reasonable to have a limited joint discussion on the common point between the two arguments, namely, "how to express numbers," and then return to each of their own arguments.
However, it is not desirable for the venue for discussion to change depending on the content of the discussion, so I think it will be difficult to integrate them.
Your hope is probably that by combining the discussions, better ideas will emerge. IMHO, that should really be a "new discussion", perhaps in this thread where we're talking now.
Regards.
Saki
Well, that's the original question: are they actually different purposes, from the point of view of a user?
I just gave a concrete suggestion, which didn't involve "combining two arguments", it involved splitting them up into three projects which all complement each other.
It feels like both you and Jordan feel the need to defend the work you've put in so far, which is a shame; as a neutral party, I want to benefit from both of your efforts. It really doesn't matter to me how many mailing list threads that requires, as long as there aren't two teams making conflicting designs for the same feature.
Regards,
Rowan Tommins
[IMSoP]
On Sun, Apr 7, 2024 at 8:27 AM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
Eh, my first reply wasn't really about defending anything. It was to
inform. I have been doing small bits of work, research, and investigation
into an MPDec or MPFR implementation for years, and I'm likely to continue
doing my research on that regardless of whatever is discussed in this
thread.
Rowan, my point wasn't so much that a discussion like this one is
pointless, it was that MOST of the people who actually vote on RFCs don't
reply at all to internals, so a discussion like this actually does not help
anyone understand the opinion of MOST of the people that actually need to
be convinced. We hope the discussion is representative of the people who do
not engage with it, but we don't know for sure.
In any case, an alternative implementation using MPDec/MPFR probably can't
be done until 9.0 at the earliest, but Saki's improvements to BCMath are
ready to merge now, essentially.
Is there anything in the proposal which would actually be different if
it was based on a different library
Yes. BCMath uses fixed-scale, all the other libraries use fixed-precision.
That is, the other libraries use a fixed number of significant digits,
while BCMath uses a fixed number of digits after the decimal point. So, for
instance, it would not actually be possible without manual rounding in the
PHP implementation to force exactly 2 decimal digits of accuracy in the
result and no more with MPDec. Money, for instance, wants exactly two
digits after the decimal point; guaranteeing that would require the PHP
implementation to round, because a value like 0.00000013 has two digits
of precision (what MPDec uses) but eight digits of scale (what BCMath uses).
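The precision/scale distinction can be made concrete with Python's decimal module, used here only as an analogue since it exposes both notions directly:

```python
from decimal import Decimal

d = Decimal("0.00000013")

sign, digits, exponent = d.as_tuple()
precision = len(digits)   # significant digits: just the 1 and the 3
scale = -exponent         # digits after the decimal point

print(precision)  # 2  (what MPDec-style libraries fix)
print(scale)      # 8  (what BCMath fixes)
```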
Jordan
I have been doing small bits of work, research, and investigation into
an MPDec or MPFR implementation for years, and I'm likely to continue
doing my research on that regardless of whatever is discussed in this
thread.
I absolutely encourage you to do that. What I'm hoping is that you can
share some of what you already know now, so that while we're discussing
BCMath\Number, we can think ahead a bit to what other similar APIs we
might build in the future. The below seems to be exactly that.
Yes. BCMath uses fixed-scale, all the other libraries use
fixed-precision. That is, the other libraries use a fixed number of
significant digits, while BCMath uses a fixed number of digits after
the decimal point.
That seems like a significant difference indeed, and one that is
potentially far more important than whether we build an OO wrapper or a
"scalar" one.
So, for instance, it would not actually be possible without manual
rounding in the PHP implementation to force exactly 2 decimal digits
of accuracy in the result and no more with MPDec.
The current BCMath proposal is to mostly choose the scale calculations
automatically, and to give precise control of rounding. Neither of those
are implemented in libbcmath, which requires an explicit scale, and
simply truncates the result at that point.
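The difference between libbcmath's truncate-at-scale behaviour and the caller-controlled rounding being proposed can be sketched with Python's decimal module as a stand-in:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# 2 / 3 at scale 5
numerator, denominator = Decimal(2), Decimal(3)
quotient = numerator / denominator  # computed at context precision

# libbcmath-style behaviour: truncate at the requested scale
truncated = quotient.quantize(Decimal("0.00001"), rounding=ROUND_DOWN)

# What the proposal adds: caller-chosen rounding at that scale
rounded = quotient.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP)

print(truncated)  # 0.66666
print(rounded)    # 0.66667
```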
That's why I said that the proposal isn't really about "an OO wrapper
for BCMath" any more, it's a fairly generic Number API, with libbcmath
as the back-end which we currently have available. So thinking about
what other back-ends we might build with the same or similar wrappers is
useful and relevant.
The idea of money, for instance, wanting exactly two digits would
require the implementation to round, because something like 0.00000013
has two digits of precision, which is what MPDec uses, but it has 8
digits of scale which is what BCMath uses.
This brings us back to what the use cases are we're trying to cover with
these wrappers.
The example of fixed-scale money is not just a small niche that I happen
to know about: brick/money has 16k stars on GitHub, and 18 million
installs on Packagist; moneyphp/money has 4.5k stars and 45 million
installs; one has implementations based on plain PHP, GMP, and BCMath;
the other has a hard dependency on BCMath.
Presumably, there are other use cases where working with precision
rather than scale is essential, maybe just as popular (or that could be
just as popular, if they could be implemented better).
In which case, should we be designing a NumberInterface that provides
both, with BCMath having a custom (and maybe slow) implementation for
round-to-precision, and MPDec/MPFR having a custom (and maybe slow)
implementation for round-to-scale?
Or, should we abandon the idea of having one preferred number-handling
API (whether that's NumberInterface or a core decimal type), because no
implementation could handle both use cases?
Regards,
--
Rowan Tommins
[IMSoP]
On Sun, Apr 7, 2024 at 2:45 PM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk wrote:
By a "scalar" value I mean a value that has the same semantics for reading,
writing, copying, passing-by-value, passing-by-reference, and
passing-by-pointer (how objects behave) as the integer, float, or boolean
types. As I mentioned in the discussion about a "scalar arbitrary precision
type", the idea of a scalar in this meaning is a non-trivial challenge, as
the zval can only store a value that is treated in this way of 64 bits or
smaller. However, the actual numerical value that is used by every single
one of these libraries is not guaranteed to be 64 bits or smaller, and for
some of them is in fact guaranteed to be larger.
The pointer for this value would fit in the 64 bits, which is how objects
work, but that's also why objects have different semantics for scope than
integers. Objects are potentially very large in memory, so we refcount them
and pass the pointer into child scopes, instead of copying the value like
is done with integers.
Both this and the precision/scale question are pretty significant design
questions and choices. While the arbitrary precision values of these
libraries will not fit inside a zval, they are on average smaller than PHP
objects in memory, so it may not be a significant problem to eagerly copy
them like we do with integers. However, if that is not the route that is
taken, they could end up having scoping semantics that are similar to
objects, even if we don't give them a full class entry with a constructor,
properties, etc. This is part of the reason that, for example, the
ext-decimal implementation which uses the MPDec library represents these
numbers as an object with a fluent interface.
In general I would say that libbcmath is different enough from other
backends that we should not expect any work on a BCMath implementation to
be utilized in other implementations. It could be that we are able to do
that, but it should not be something people expect to happen because of
the technical differences.
Some of the broader language design choices would be transferable though.
For instance, the standard names of various calculation functions/methods
are something that would remain independent, even with the differences in
the implementation.
The implementation for round-to-precision for BCMath would be much slower
than the implementation for round-to-scale for MPDec/MPFR, even if the
underlying calculations were done at the same performance. The main
challenge for the precision vs. scale issue is that precision also
includes the integer part for some implementations, while scale does not.
But in general, it is easier to over-calculate using precision and then
round/truncate to scale than it is to calculate with scale, not knowing
until the calculation has been completed what your precision will be (for
some kinds of calculations).
The actual underlying math of the library is easier with scale than it is
with precision. So, for instance, with a scale of 3, the minimum meaningful
difference between two values is 0.001, so you can simply continue your
calculation until the calculated error is less than this value.
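As a sketch of that stopping rule, here is a Newton-iteration square root using Python's decimal module, which runs until the step size drops below the minimum meaningful difference for the requested scale. The function name and the guard-digit choice are mine, not from any proposal:

```python
from decimal import Decimal, getcontext, ROUND_DOWN

def sqrt_to_scale(value: str, scale: int) -> Decimal:
    """Newton's method: stop once the step between successive guesses
    is smaller than the minimum meaningful difference at this scale."""
    getcontext().prec = scale + 10           # guard digits for intermediates
    x = Decimal(value)
    tolerance = Decimal(1).scaleb(-scale)    # e.g. scale=3 -> 0.001
    guess = x / 2 if x > 1 else Decimal(1)
    while True:
        better = (guess + x / guess) / 2
        if abs(better - guess) < tolerance:
            break
        guess = better
    # Truncate at the requested scale, as libbcmath would.
    return better.quantize(tolerance, rounding=ROUND_DOWN)

print(sqrt_to_scale("2", 3))  # 1.414
```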
Fortunately, using libraries means that these underlying mathematical
implementations do not need to be struggled with in whatever PHP
implementation we do for either.
My intuition at the moment is that a single number-handling API would be
challenging to do without an actual proposed implementation on the table
for MPDec/MPFR. The best we can do at the moment is probably reference
Rudi's implementation in ext-decimal.
For money calculations, scale is always likely to be a more useful
configuration. For mathematical calculations (such as machine learning
applications, which I would say is the other very large use case for this
kind of capability), precision is likely to be the more useful
configuration. Other applications that I have personally encountered
include: simulation and modeling, statistical distributions, and data
analysis. Most of these can be done with fair accuracy without arbitrary
precision, but there are certainly types of applications that would benefit
from or even require arbitrary precision in these spaces.
PHP at the moment is not a language that has many applications in these
spaces, with most of the actual use cases being money. My view is that this
is driven by the language features, not because PHP is ill-suited to these
applications or that developers using PHP have no applications that would
benefit from these other capabilities. There is an array of features
readily available in Python, heavily used for these sorts of
applications, that PHP lacks; however, PHP offers several things that
might make it the more attractive option if it had similar mathematical features.
PHP is, in general, faster than Python and more performant in the areas
where the language is being used directly. However, Python allows direct
access to some very low level capabilities that are implemented directly in
C, and in those areas it certainly has an edge.
For instance, Python has modules that allow for direct off-loading to GPUs.
It has an extensive and performant set of mathematical libraries in NumPy
and SciPy that make interacting with complex mathematics seamless and fast.
PHP actually also has an extension to allow off-loading to a GPU, though
even with extensions it has very little mathematical library support. The
ext-decimal extension is probably the best example, and it has almost
nothing beyond basic arithmetic. Python also has userspace operator
overloading for its objects, which is extremely useful for all of these
spaces, but that's an entirely different discussion than what we are
talking about here.
But even with these extensions available in PHP, they are barely used by
developers at all, because (at least in part) of the enormous difference
between PECL and pip. For PHP, I do not think that extensions are an
adequate substitute in the way that pip-installable modules are for Python.
This is, essentially, the thesis of the research and work that I have done
in the space since joining the internals mailing list.
Jordan
By a "scalar" value I mean a value that has the same semantics for
reading, writing, copying, passing-by-value, passing-by-reference, and
passing-by-pointer (how objects behave) as the integer, float, or
boolean types.
Right, in that case, it might be more accurate to talk about "value
types", since arrays are not generally considered "scalar", but have
those same behaviours. And Ilija recently posted a draft proposal for
"data classes", which would be objects, but also value types:
https://externals.io/message/122845
As I mentioned in the discussion about a "scalar arbitrary precision
type", the idea of a scalar in this meaning is a non-trivial
challenge, as the zval can only store a value that is treated in this
way of 64 bits or smaller.
Fortunately, that's not true. If you think about it, that would rule out
not only arrays, but any string longer than 8 bytes!
The way PHP handles this is called "copy-on-write" (COW), where multiple
variables can point to the same zval until one of them needs to write to
it, at which point a copy is transparently created.
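A toy model of copy-on-write makes the idea visible. This is Python with an explicit refcount standing in for the engine's; the class and its methods are invented purely for illustration:

```python
class CowBox:
    """Toy copy-on-write container: readers share one payload;
    the first writer gets a private copy."""

    def __init__(self, payload):
        self._shared = {"payload": payload, "refs": 1}

    def share(self):
        # Analogous to assigning a PHP string/array to another variable:
        # both names now point at the same storage.
        clone = CowBox.__new__(CowBox)
        clone._shared = self._shared
        self._shared["refs"] += 1
        return clone

    def read(self):
        return self._shared["payload"]

    def write(self, payload):
        if self._shared["refs"] > 1:
            # Separate before writing, so other holders are untouched.
            self._shared["refs"] -= 1
            self._shared = {"payload": None, "refs": 1}
        self._shared["payload"] = payload

a = CowBox("original")
b = a.share()
print(a.read() is b.read())  # True: one copy in memory until a write
b.write("changed")
print(a.read(), b.read())    # original changed
```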
The pointer for this value would fit in the 64 bits, which is how
objects work, but that's also why objects have different semantics for
scope than integers. Objects are potentially very large in memory, so
we refcount them and pass the pointer into child scopes, instead of
copying the value like is done with integers.
Objects are not the only thing that is refcounted. In fact, in PHP 4.x
and 5.x, every zval used a refcount and COW approach; changing some
types to be eagerly copied instead was one of the major performance
improvements in the "PHP NG" project which formed the basis of PHP 7.0.
You can actually see this in action here: https://3v4l.org/oPgr4
This is all completely transparent to the user, as are a bunch of other
memory/speed optimisations, like interned string literals, packed
arrays, etc.
So, there may be performance gains if we can squeeze values into the
zval memory, but it doesn't need to affect the semantics of the new type.
In general I would say that libbcmath is different enough from other
backends that we should not expect any work on a BCMath implementation
to be utilized in other implementations. It could be that we are
able to do that, but it should not be something people expect to
happen because of the technical differences.
Some of the broader language design choices would be transferable
though. For instance, the standard names of various calculation
functions/methods are something that would remain independent, even
with the differences in the implementation.
Yes, that makes sense. Even if we don't have an interface, it would be
annoying if one class provided $foo->div($bar) and another provided
$foo->dividedBy($bar).
For money calculations, scale is always likely to be a more useful
configuration. For mathematical calculations (such as machine learning
applications, which I would say is the other very large use case for
this kind of capability), precision is likely to be the more useful
configuration. Other applications that I have personally encountered
include: simulation and modeling, statistical distributions, and data
analysis. Most of these can be done with fair accuracy without
arbitrary precision, but there are certainly types of applications
that would benefit from or even require arbitrary precision in these
spaces.
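The scale-versus-precision distinction here is easy to demonstrate
with Python's decimal module (used only as an illustration; the PHP
proposals would define their own APIs): scale fixes the digits after
the decimal point, while precision fixes the total significant digits.

```python
from decimal import Decimal, getcontext, ROUND_HALF_UP

# "Scale" fixes the digits after the point - what money code wants:
price = Decimal("19.999")
rounded = price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(rounded)  # 20.00

# "Precision" fixes total significant digits - what numeric code wants:
getcontext().prec = 5
print(Decimal(1) / Decimal(3))      # 0.33333
print(Decimal(10000) / Decimal(3))  # 3333.3
```

Note that under a precision model the position of the decimal point
floats with the magnitude of the result, which is why the two
configurations suit different applications.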
This probably relates quite closely to Arvid's point that for a lot of
uses, we don't actually need arbitrary precision, just something that
can represent small-to-medium decimal numbers without the inaccuracies
of binary floating point. That some libraries can be used for both
purposes is not necessarily evidence that we could ever "bless" one for
both use cases and make it a single native type.
My intuition at the moment is that a single number-handling API would
be challenging to do without an actual proposed implementation on the
table for MPDec/MPFR.
I think it would certainly be wise to experiment with how each library
can interface to the language as an extension, before spending the extra
time needed to integrate it as a new zval type.
But even with these extensions available in PHP, they are barely used
by developers at all, because (at least in part) of the enormous
difference between PECL and pip. For PHP, I do not think that
extensions are an adequate substitute in the way pip modules are for
Python.
Yes, this is something of a problem. On the plus side, a library doesn't
need to be incorporated into the language to be widely installed,
because we have the concept of "bundled" extensions; and in practice,
Linux distributions add a few "popular" PECL extensions to their list of
installable binary packages. On the minus side, even making it into the
"bundled" list doesn't mean it's installed by default everywhere, and
userland libraries spend a lot of effort polyfilling things which would
ideally be available by default.
This is, essentially, the thesis of the research and work that I have
done in the space since joining the internals mailing list.
Thanks, there's some really useful perspective there.
Regards,
--
Rowan Tommins
[IMSoP]
On Mon, Apr 8, 2024 at 12:23 PM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
I have mentioned before that my understanding of the deeper aspects of
how zvals work is very lacking compared to some others, so this is
very helpful. I was of course aware that strings and arrays can be
larger than 64 bits, but was under the impression that the hashtable
structure was partly responsible for those being treated somewhat
differently. I confess that I do not understand the technical
intricacies of interned strings and packed arrays; I just understood
that the zval structure for these arbitrary precision values would
probably be non-trivial, and from what I was able to research, that
was in part related to the 64-bit zval limit. But thank you for the
clarity and the added detail; it's always good to learn where you are
mistaken, and this is all extremely helpful to know.
This probably relates quite closely to Arvid's point that for a lot of
uses, we don't actually need arbitrary precision, just something that can
represent small-to-medium decimal numbers without the inaccuracies of
binary floating point. That some libraries can be used for both purposes is
not necessarily evidence that we could ever "bless" one for both use cases
and make it a single native type.
Honestly, if you need a scale of less than about 15 and simply want
decimals free of floating-point error, BCMath is perfectly adequate
for most of the use cases I described. The larger issue for a lot of
these applications is not that they need to calculate 50 digits of
accuracy and BCMath is too slow; it's that they need non-arithmetic
operations, such as sin(), cos(), exp(), vector multiplication, dot
products, etc., while maintaining that low-to-medium decimal accuracy.
libbcmath just doesn't support those things, and creating your own
implementation of, say, the sin() function that maintains arbitrary
precision is... challenging. It multiplies the performance
deficiencies of BCMath, as you have to break it into many separate
arithmetic operations.
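To make that difficulty concrete, here is a sketch of sin() via its
Taylor series, written in Python with the decimal module standing in
for BCMath-style arbitrary-precision arithmetic. Every extra digit of
accuracy demands more iterations, and each iteration is several
arbitrary-precision multiplications and divisions, which is exactly
the compounding cost described above.

```python
from decimal import Decimal, getcontext

def dec_sin(x: Decimal, prec: int = 50) -> Decimal:
    """sin(x) by Taylor series: x - x^3/3! + x^5/5! - ..."""
    getcontext().prec = prec + 10          # guard digits while summing
    term = x
    total = x
    n = 1
    while abs(term) > Decimal(10) ** -(prec + 5):
        # Each step costs several arbitrary-precision mul/div operations.
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
        n += 1
    getcontext().prec = prec
    return +total                          # unary + rounds to target precision
```

A dedicated library like MPFR ships correctly rounded C
implementations of these functions, which is why it avoids this
blow-up entirely.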
To me, while being 100x to 1000x more performant at arithmetic is certainly
reason enough on its own, the fact that MPFR (for example) has C
implementations for more complex operations that can be utilized is the
real selling point. The ext-stats extension hasn't been maintained since
7.4. And trig is critical for a lot of stats functions. A fairly common use
of stats, even in applications you might not expect it, is to generate a
Gaussian Random Number. That is, generate a random number where if you
continued generating random numbers from the same generator, they would
form a normal distribution (a bell curve), so the random number is weighted
according to the distribution.
The simplest way to do that is with the sin() and cos() functions
(picking a point on a circle). But a lot of really useful mathematics
like this is
mainly provided by libraries that ALSO provide arbitrary precision. So for
instance, the Gamma Function is another very common function in statistics.
To me, implementing a bundled or core type that utilizes MPFR (or something
similar) is as much about getting access to THESE mathematical functions as
it is the arbitrary precision aspect.
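The sin()/cos() trick referred to here is the Box-Muller transform. A
minimal double-precision sketch in Python (an arbitrary-precision
version would swap the math.* calls for the corresponding MPFR-style
operations):

```python
import math
import random

def gaussian(rng: random.Random) -> float:
    """One standard-normal deviate via the Box-Muller transform."""
    u1 = rng.random() or 1e-300   # avoid log(0)
    u2 = rng.random()
    # Radius drawn from the exponential tail, angle uniform on the circle.
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
```

Repeated draws form a bell curve with mean 0 and standard deviation 1,
i.e. each random number is weighted according to the normal
distribution.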
Jordan
Hi Jordan,
As you say, BCMath is really barebones and slower than other
libraries. It would be nice if there were a universal math extension
that could handle all use cases, but unfortunately there isn't one
today.
The biggest problem is right there: there are several math libraries,
and they have slightly different characteristics.
If we could combine all of these to create a new math extension
without sacrificing the benefits of each, do you think that would be
possible? (It doesn't matter which libraries or technologies are used
internally.)
To be honest, whenever I bring up the topic of BCMath on the mailing
list, there are always references to speed and other libraries, so
many people probably want that, but unfortunately we probably don't
have a shared idea of the specifics.
If what I write is off-topic and not appropriate for this thread, I can start a new thread.
Regards.
Saki
I have mentioned before that my understanding of the deeper aspects of how
zvals work is very lacking compared to some others, so this is very
helpful.
My own knowledge definitely has gaps and errors, and comes mostly from introductions like https://www.phpinternalsbook.com/ and in this case Nikita's blog articles about the changes in 7.0: https://www.npopov.com/2015/05/05/Internal-value-representation-in-PHP-7-part-1.html
I confess that I do not
understand the technical intricacies of the interned strings and packed
arrays, I just understand that the zval structure for these arbitrary
precision values would probably be non-trivial, and from what I was able to
research and determine that was in part related to the 64bit zval limit.
From previous discussions, I gather that the hardest part of implementing a new zval type is probably not the memory structure itself - that will mostly be handled in a few key functions and macros - but the sheer number of places that do something different with each zval type and will need updating. Searching for Z_TYPE_P, which is just one of the macros used for that purpose, shows over 200 lines to check: https://heap.space/search?project=php-src&full=Z_TYPE_P&defs=&refs=&path=&hist=&type=c
That's why it's so much easier to wrap a new type in an object: all of those code paths are already handled for you, and you just have a fixed set of handlers to implement. If Ilija's "data classes" proposal progresses, you'll be able to have copy-on-write for free as well.
Regards,
Rowan Tommins
[IMSoP]
On Tue, 9 Apr 2024 at 09:57, Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
So I'd like to conclude this thread since we have dedicated threads for
each of the topics here.
In my opinion, we should go with both. The two topics cover quite
different things. Personally, I'm not that interested in the BCMath
part, because I don't see myself needing it (I never have before, and
I don't plan on ending up in an industry where numbers that large are
required). But I am interested in a native decimal that would cover
the vast majority of uses and be on par with the integer and float
number types.
If my stance makes sense, I shall join the native decimal thread and
continue there.
--
Arvīds Godjuks
+371 26 851 664
arvids.godjuks@gmail.com
Telegram: @psihius https://t.me/psihius
Hi Rowan,
Well, that's the original question: are they actually different purposes, from the point of view of a user?
The purposes of the two proposals differ in the details: one adds a native type, the other adds an OOP API. But if we think about the purpose from a broader perspective, it's probably the same. However, discussions with such different perspectives should be kept separate.
I just gave a concrete suggestion, which didn't involve "combining two arguments", it involved splitting them up into three projects which all complement each other.
Oh, excuse me. I thought we were talking about the case of "integrating" the discussion, so I didn't go into that. If the discussion diverges and a new one starts, I'll be happy to join it.
I don't think a NumberInterface should be provided. This is because the signature required by the internal implementation of a class may vary from library to library. If we prepare such an interface at this time, it may cause problems when adding classes of the same family in the future.
It feels like both you and Jordan feel the need to defend the work you've put in so far, which is a shame; as a neutral party, I want to benefit from both of your efforts. It really doesn't matter to me how many mailing list threads that requires, as long as there aren't two teams making conflicting designs for the same feature.
The point is that for me the purposes of the two discussions are different, and for you they are the same. Although I am opposed to mixing up discussions with different objectives, I am in favor of starting new, separate discussions on common topics. Perhaps that is why you feel I'm "defending" the work.
If decimal is somehow introduced into PHP, whether BCMath survives or is deprecated is another discussion. If it's deprecated, it will probably just be unbundled, not suddenly gone. In other words, no matter what the conclusion of the decimal debate is, BCMath will continue to exist in some form for compatibility reasons, so there is no reason not to improve it, at least for now.
Regards.
Saki
Hello everyone, I've been following the discussion threads and forming my
own opinion to share since I have done a bunch of financial stuff
throughout my career: I used integers at the application level and
DECIMAL(20,8) in the database, due to handling Bitcoin, Litecoin, etc.
My feeling on the discussion is that it got seriously sidetracked from the
core tenet of what is actually needed from the PHP engine/core to
discussing developing/upgrading a library that can handle money, scientific
calculations, yada yada yada. Basically, in my view, the discussion
has suffered catastrophic scope creep, to the point where nobody could
agree on anything and people were discussing things irrelevant to the
initial scope.
To me, the BCMath library stuff is just that - a BCMath library. It's a
tool that can handle any size number. It's a specialized tool. And,
frankly, for the vast majority of use cases, it's complete overkill.
If we are talking about implementing a Decimal type into the language as a
first-class citizen alongside int/float/etc, do we really need it to handle
numbers outside 64-bit space? Ints have a size limit (64 bits), floats also
have a defined range. Why not have decimal be represented as 2 64-bit ints
at the engine level, and similarly to floating-point numbers, you can have
a php.ini setting where you can define the precision you want? Floats
have a default of 14 digits. Why not have a setting that defines
precision for the decimal type, set it to the same 14 digits, and you
get a decimal type that has 80 bits for the integer part and 48 bits
for the fractional part? In the vast majority of cases, for all practical
intents and purposes, that would be enough. For the rest - you have
ext/decimal, BCMath, GMP extensions (and by all means, improve those as
much as needed, make them as powerful as Python's math libs). This approach
has some major benefits: if done right, it's just another type that is
compatible with float, but does integer precision math, and having the
precision of 14 in the vast majority of needs is basically overkill
already. Ideally, you should be able to just replace your float and ints
with decimal type hints and just do the roundings/formatting via the usual
means of round/ceil/floor/number_format. Normal math just works. If any
part of the expression has a decimal type, the result is a decimal number.
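As a sanity check of the idea, here is a sketch in Python, with a
plain int standing in for the combined 128-bit engine-level value: a
decimal with a fixed scale of 14 is just an integer scaled by 10^14,
so addition is exact integer addition and multiplication only needs a
rescale at the end.

```python
SCALE_DIGITS = 14
SCALE = 10 ** SCALE_DIGITS  # fits in 48 bits (2^48 > 10^14)

def dec_from_str(s: str) -> int:
    """Parse '123.45' into an integer scaled by 10^14."""
    sign = -1 if s.startswith("-") else 1
    s = s.lstrip("+-")
    whole, _, frac = s.partition(".")
    frac = (frac + "0" * SCALE_DIGITS)[:SCALE_DIGITS]  # pad/truncate to scale
    return sign * (int(whole or "0") * SCALE + int(frac or "0"))

def dec_add(a: int, b: int) -> int:
    return a + b  # exact: both operands share the same scale

def dec_mul(a: int, b: int) -> int:
    # Full-width product, then drop 14 digits with round-half-up.
    sign = -1 if (a < 0) != (b < 0) else 1
    q, r = divmod(abs(a) * abs(b), SCALE)
    if 2 * r >= SCALE:
        q += 1
    return sign * q

def dec_to_str(v: int) -> str:
    sign = "-" if v < 0 else ""
    whole, frac = divmod(abs(v), SCALE)
    return f"{sign}{whole}.{str(frac).zfill(SCALE_DIGITS)}"

# The classic binary-float failure case is exact here:
print(dec_to_str(dec_add(dec_from_str("0.1"), dec_from_str("0.2"))))
# 0.30000000000000
```

The engine would of course implement this on fixed-width words with
overflow checks rather than bignums; the sketch only shows that the
arithmetic itself is plain integer math.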
The only sticking point I see is how to define a decimal-typed
variable, since we do not have var/let/const; we can only define types
on class properties and class constants. Do we add a function
decimal(int|float $value): decimal? Or do we need to do prep work to
be able to define variables with a type? Another idea is to just do
$decimal = (decimal) 10.054 when instantiating a variable. Actually,
casting like that is already fairly common when you want to ensure the
result is of a certain type; the PSL library does a lot of it, and I
quite like the style.
Long story short, give people a tool that's simple and works, things like
scale and all that stuff we can just handle in userland code ourselves
because everyone has different needs, different scales, and so on. It's the
same as right now with integers - if you require an integer bigger than 64
bits, you use GMP/BCMath/etc. You are also not going to have fun with
databases and PDO because there are going to be some shenanigans there too.
Basically, at that point, you are running up against various other PHP
engine limitations, and software has to be written with those
considerations in mind anyway, in literally any language. Some make it
easier than others.
Sorry for it being a bit long, I'm happy to clarify/expand on any parts you
have questions about.
Arvīds Godjuks
Why not have decimal be represented as 2 64-bit ints at the engine
level
Just to clarify this point, what's the formula to convert back and
forth between a decimal and two integers? Are you thinking of
something like scientific notation: decimal = coefficient *
10^exponent?
64 bits for the exponent seems excessive, and it might be nice to have
more for the coefficient but maybe that doesn't matter too much.
I was thinking of no exponents, just a straightforward integer
representation for the fractional part, 14 digits long (48 bits). Taking
two 64-bit numbers and combining them into a single 128-bit value would
give us a range of "-604,462,909,807,314,587,353,088" to
"604,462,909,807,314,587,353,087" for the integer part (80 bits) and
"281,474,976,710,655" for the unsigned integer for fractions (48 bits).
With this, we can achieve 14 digits of precision without any problem. I
would say these numbers are sufficiently large to realistically cover most
scenarios that the vast majority of us, PHP developers, will ever
encounter. For everything else, extensions that handle arbitrary numbers
exist. :)
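Those ranges check out: splitting 128 bits into a signed 80-bit
integer part and an unsigned 48-bit fraction gives exactly the figures
quoted, and 48 bits comfortably hold 14 decimal digits. A quick
verification in Python:

```python
# Signed 80-bit range for the integer part:
print(-2**79)      # -604462909807314587353088
print(2**79 - 1)   #  604462909807314587353087

# Unsigned 48-bit maximum for the fractional part:
print(2**48 - 1)   # 281474976710655

# 48 bits are enough for 14 decimal digits:
print(10**14 <= 2**48 - 1)  # True
```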
The ini setting I was considering would function similarly to what it does
for floats right now - I assume it changes the exponent, thereby increasing
their precision but reducing the integer range they can cover. The same
adjustment could be applied to decimals if people really need to tweak
those ini settings (I've never seen anyone change that from the default in
20 years, but hey, I'm sure someone out there does and needs it).
Arvīds Godjuks
The ini setting I was considering would function similarly to what it does for floats right now - I assume it changes the exponent, thereby increasing their precision but reducing the integer range they can cover.
If you're thinking of the "precision" setting, it doesn't do anything nearly that clever; it's purely about how many decimal digits should be displayed when converting a binary float value to a decimal string. In recent versions of PHP, it has a "-1" setting that automatically does the right thing in most cases. https://www.php.net/manual/en/ini.core.php#ini.precision
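Python's float formatting makes a handy analogy here (illustration
only; PHP's setting is independent of Python): repr() picks the
shortest string that round-trips back to the same float, much like
precision=-1, while a fixed significant-digit format behaves like the
old default of 14.

```python
x = 0.1 + 0.2

# Shortest round-tripping representation, like PHP's precision=-1:
print(repr(x))      # 0.30000000000000004

# Fixed 14 significant digits, like the historical precision=14 default:
print(f"{x:.14g}")  # 0.3

# All 17 digits needed to expose the underlying binary value:
print(f"{x:.17g}")  # 0.30000000000000004
```

In both languages the stored binary value never changes; only the
decimal string produced from it does.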
The other way around - parsing a string to a float, including when compiling source code - has a lot of different compile-time options, presumably to optimise on different platforms; but no user options at all: https://github.com/php/php-src/blob/master/Zend/zend_strtod.c
Regards,
Rowan Tommins
[IMSoP]
On Mon, Apr 8, 2024, 16:40 Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
Thanks for the info. Then we just specify the value range for the
decimal type the same way it's done for integer and float, and let
developers decide whether it fits their needs or whether they need the
BCMath/Decimal/GMP extensions. Develop the core for the common use
case, and let extensions carry the burden of the rest.