Hi internals,
I've created a new RFC https://wiki.php.net/rfc/vector proposing to add final class Vector
to PHP.
PHP's native array type is rare among programming languages in that it is used as an associative map of values, but also needs to support lists of values.
In order to support both use cases while also providing a consistent internal array HashTable API to PHP's internals and PECLs, additional memory is needed to track keys (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - around twice as much as is needed to just store the values, because a Bucket needs space for both the string pointer and the int key, for non-reference counted values).
Additionally, creating non-constant arrays will allocate space for at least 8 elements to make the initial resizing more efficient, potentially wasting memory.
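A rough way to observe this overhead on a given PHP build is to compare the memory reported for a plain array against an SplFixedArray holding the same integers. This is only a sketch: the helper name measureBytes is made up for illustration, and the numbers depend heavily on the PHP version, platform, and allocator, so no particular ratio should be assumed.

```php
<?php
// Hypothetical helper (not from the RFC): bytes retained by whatever $build returns.
function measureBytes(callable $build): int
{
    $before = memory_get_usage();
    $data = $build();                 // keep the result alive while measuring
    $used = memory_get_usage() - $before;
    unset($data);
    return $used;
}

$n = 100000;

$arrayBytes = measureBytes(static function () use ($n): array {
    $values = [];
    for ($i = 0; $i < $n; $i++) {
        $values[] = $i;               // the array grows its capacity as needed
    }
    return $values;
});

$fixedBytes = measureBytes(static function () use ($n): SplFixedArray {
    $values = new SplFixedArray($n);  // sized for $n elements up front
    for ($i = 0; $i < $n; $i++) {
        $values[$i] = $i;
    }
    return $values;
});

printf("array: %d bytes, SplFixedArray: %d bytes\n", $arrayBytes, $fixedBytes);
```

Running the same script with small values of $n (for example 1 or 2) is one way to look at the minimum-allocation effect mentioned above.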
It would be useful to have an efficient variable-length container in the standard library for the following reasons:
- To save memory in applications or libraries that may need to store many lists of values and/or run as a CLI or embedded process for long periods of time
  (in modules identified as using the most memory or potentially exceeding memory limits in the worst case)
  (both in userland and in native code written in php-src/PECLs)
- To provide a better alternative to ArrayObject and SplFixedArray for use cases where objects are easier to use than arrays - e.g. variable sized collections (for lists of values) that can be passed by value to be read and modified.
- To give users the option of stronger runtime guarantees that property, parameter, or return values really contain a list of values without gaps, that array modifications don't introduce gaps or unexpected indexes, etc.
Thoughts on Vector?
P.S. The functionality in this proposal can be tested/tried out at https://pecl.php.net/teds (under the class name \Teds\Vector instead of \Vector).
(That is a PECL I created earlier this year for future versions of iterable proposals, common data structures such as Vector/Deque, and less commonly used data structures that may be of use in future work on implementing other data structures)
Thanks,
Tyson
Hey Tyson,
Would it perhaps make sense to drag in php-ds, which has matured quite a
bit over the years? I'm referring to:
https://www.php.net/manual/en/class.ds-sequence.php
Is what you are suggesting with Vector different from it?
Note: For some reason, I can't quote your post and then reply, so it will
be a top-post 🤷‍♂️
Hey Marco Pivetta,
> Would it perhaps make sense to drag in php-ds, which has matured quite a bit over the years? I'm referring to: https://www.php.net/manual/en/class.ds-sequence.php
> Is what you are suggesting with Vector different from it?
> Note: For some reason, I can't quote your post and then reply, so it will be a top-post 🤷‍♂️
This was outlined in the section https://wiki.php.net/rfc/vector#why_not_use_php-ds_instead before I sent out the announcement. To expand on that:
This has been asked about multiple times over the years in threads on unrelated proposals (https://externals.io/message/112639#112641 and, years ago, https://externals.io/message/93301#93301),
but the maintainer of php-ds had a long-term goal of developing it separately from php's release cycle (and was still focusing on the PECL when I'd asked on the GitHub issue in the link almost a year ago).
- There have been no proposals from the maintainer to do that so far; that was only what the maintainer mentioned as a long-term plan.
- I personally doubt that having it developed separately from php's release cycle would be accepted by voters (e.g. if unpopular decisions couldn't be voted against), was unsure how backwards compatibility would be handled in that model, and had other concerns (e.g. API debates such as https://externals.io/message/93301#93301).
- With php-ds itself seeming unlikely to me to get merged anytime soon, I decided to start independently working on efficient data structure implementations.
I don't see dragging it in (against the maintainer's wishes) as a viable option for many, many, many reasons.
But having efficient datastructures in PHP's core is still useful.
- While PECL development outside of php has its benefits for development and the ability to make new features available in older php releases,
  it's less likely that application and library authors will start making use of those data structures because many users won't have any given PECL already installed.
  (Though php-ds also publishes a polyfill, it would not have the cpu and memory savings, and would add its own overhead.)
- Additionally, users (and organizations using PHP) can often make stronger assumptions on
  backwards compatibility and long-term availability of functionality that is merged into PHP's core.
So the choice of feature set, some names, signatures, and internal implementation details are different, because this is reimplementing a common datastructure found in different forms in many languages.
It's definitely a mature project, but I personally feel like reimplementing this (without referring to the php-ds source code and without copying the entire api as-is) is the best choice to add efficient data structures to core while respecting the maintainer's work on the php-ds project and their wish to maintain control over the php-ds project.
As a result, I've been working on implementing data structures such as Vector based on php-src's data structure implementations (mostly SplFixedArray and ArrayObject) instead (and based on my past PECL/RFC experience, e.g. with runkit7/igbinary).
Regards,
Tyson
I'm okay with a final Vector class in general. I don't love the
proposed API but don't hate it either. Feedback on that at the end.
What I would love is a vec type from hacklang, which is similar to
this but pass-by-value, copy-on-write like an array. Of course, this
would require engine work and I understand it isn't as simple to add.
Feedback on API:
- indexOf returning false instead of null when it cannot be
  found. If we are keeping this method (which I don't like, because
  there's no comparator), please return null instead of false. The
  language has facilities for working with null like ??, so please
  prefer that when it isn't needed for BC (like this, this is a new
  API).
- contains also doesn't have a comparator.
- Similarly but less strongly, I don't like the filter callable
  returning mixed -- please just make it bool.
- I don't know what setSize(int $size) does. What does it do if the
  current size is less than $size? What about if its current size is
  greater? I suspect this is about capacity, not size, but without docs
  I am just guessing.
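To make the null-versus-false point concrete, here is a minimal sketch assuming PHP 8.0+. The function index_of is a hypothetical stand-in written over plain arrays; it is not the RFC's Vector::indexOf, it only illustrates how the two return conventions interact with the ?? operator.

```php
<?php
// Hypothetical helper (not the RFC's API): index of $needle, or null if absent.
function index_of(array $values, mixed $needle): ?int
{
    // array_values() guarantees integer keys, so the ?int return type holds.
    $key = array_search($needle, array_values($values), true);
    return $key === false ? null : $key;
}

$values = ['a', 'b', 'c'];

// With null, the ?? operator supplies a default directly.
$found   = index_of($values, 'c') ?? -1;
$missing = index_of($values, 'z') ?? -1;

// With false, ?? does not help: false is not null, so it passes through.
$withFalse = array_search('z', $values, true) ?? -1;

var_dump($found, $missing, $withFalse);
```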
Hi Levi Morrison,
> I'm okay with a final Vector class in general. I don't love the
> proposed API but don't hate it either. Feedback on that at the end.
> What I would love is a vec type from hacklang, which is similar to
> this but pass-by-value, copy-on-write like an array. Of course, this
> would require engine work and I understand it isn't as simple to add.
Yeah, as mentioned in https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec , it would require a massive amount of work.
- A standard library for dealing with vec, filtering it, etc
- Userland libraries and PECLs would need to deal with a third complex type different from array/object that probably couldn't be implicitly
- Extensive familiarity with opcache and the JIT for x86 and other platforms beyond what I have
- Willingness to do that with the uncertainty the final implementation would get 2/3 votes with backwards compatibility objections, etc.
> Feedback on API:
> - indexOf returning false instead of null when it cannot be
>   found. If we are keeping this method (which I don't like, because
>   there's no comparator), please return null instead of false. The
>   language has facilities for working with null like ??, so please
>   prefer that when it isn't needed for BC (like this, this is a new
>   API).
I hadn't thought about that - that seems reasonable since I don't remember anything else adding indexOf as a method name.
> - contains also doesn't have a comparator.
I was considering proposing ->any(callable) and ->all(callable) extensions if this passed.
I'm not quite sure what you mean by a comparator for contains. There'd have to be a way to check if a raw closure is contained.
> - Similarly but less strongly, I don't like the filter callable
>   returning mixed -- please just make it bool.
The filter callable is something that would be passed into the filter function. The return value would be checked for truthiness.
The phpdoc in the documentation could be changed, but that wouldn't change the implementation.
> - I don't know what setSize(int $size) does. What does it do if the
>   current size is less than $size? What about if its current size is
>   greater? I suspect this is about capacity, not size, but without docs
>   I am just guessing.
It's the same behavior as https://www.php.net/manual/en/splfixedarray.setsize.php . It's about size, not capacity.
Change the size of an array to the new size of size.
If size is less than the current array size, any values after the new size will be discarded.
If size is greater than the current array size, the array will be padded with null values.
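The documented SplFixedArray::setSize() behavior quoted above can be checked directly with the existing class. This sketch uses only SplFixedArray, since the proposed Vector is not available in core; that Vector::setSize would behave identically is the claim made in this message, not something the snippet itself verifies.

```php
<?php
$list = SplFixedArray::fromArray(['a', 'b', 'c']);

// Growing: the new trailing slots are padded with null.
$list->setSize(5);
$grown = $list->toArray();

// Shrinking: values after the new size are discarded.
$list->setSize(2);
$shrunk = $list->toArray();

var_dump($grown, $shrunk);
```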
I'd planned to add phpdoc documentation and examples before starting a vote to document the behavior and thrown exceptions of the proposed methods.
Thanks,
Tyson
>> contains also doesn't have a comparator.
> I was considering proposing ->any(callable) and ->all(callable) extensions if this passed.
> I'm not quite sure what you mean by a comparator for contains. There'd have to be a way to check if a raw closure is contained.
I mean that there isn't a way to provide a custom way to compare for
equality. One way to accomplish this is to have a signature like:
function contains(T $value, ?callable(T, T):bool $comparator = null): bool
The same goes for indexOf.
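PHP has no generic or callable-signature types, so the signature above is notation rather than valid code. A runnable approximation might look like the following sketch (PHP 8.0+ assumed); list_contains is a hypothetical free function over iterables, not part of the RFC, and it falls back to strict identity when no comparator is given.

```php
<?php
// Hypothetical illustration of "contains with an optional comparator".
function list_contains(iterable $values, mixed $needle, ?callable $comparator = null): bool
{
    // Default to strict identity (===) when the caller supplies no comparator.
    $comparator ??= static fn(mixed $a, mixed $b): bool => $a === $b;

    foreach ($values as $value) {
        if ($comparator($needle, $value)) {
            return true;
        }
    }
    return false;
}

// Strict by default: the string '2' is not identical to the int 2.
$strict = list_contains([1, 2, 3], '2');

// A caller-supplied comparator can opt into loose equality instead.
$loose = list_contains([1, 2, 3], '2', static fn($a, $b): bool => $a == $b);

var_dump($strict, $loose);
```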
>> - I don't know what setSize(int $size) does. What does it do if the
>>   current size is less than $size? What about if its current size is
>>   greater? I suspect this is about capacity, not size, but without docs
>>   I am just guessing.
> It's the same behavior as https://www.php.net/manual/en/splfixedarray.setsize.php . It's about size, not capacity.
> Change the size of an array to the new size of size.
> If size is less than the current array size, any values after the new size will be discarded.
> If size is greater than the current array size, the array will be padded with null values.
> I'd planned to add phpdoc documentation and examples before starting a vote to document the behavior and thrown exceptions of the proposed methods.
I would rather see multiple methods like:
function truncateTo(int $size)
function padEnd(int $length, $value) // allows more than just null
function padBeginning(int $length, $value)
And one or more for increasing/ensuring capacity without changing size.
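For comparison, the three operations suggested here already have close analogues for plain arrays. The sketch below (PHP 8.0+ assumed) wraps array_slice() and array_pad() in hypothetical functions named after the suggestion; they return new arrays rather than mutating a Vector, so this only illustrates the intended semantics.

```php
<?php
// Hypothetical array-based stand-ins for the suggested methods.
function truncate_to(array $values, int $size): array
{
    return array_slice($values, 0, $size);
}

function pad_end(array $values, int $length, mixed $value): array
{
    // array_pad() never shortens: it returns a copy if $length <= count($values).
    return array_pad($values, $length, $value);
}

function pad_beginning(array $values, int $length, mixed $value): array
{
    // A negative length makes array_pad() pad on the left.
    return array_pad($values, -$length, $value);
}

$truncated = truncate_to([1, 2, 3], 2);
$padded    = pad_end([1, 2], 4, 0);
$prefixed  = pad_beginning([1, 2], 4, 0);

var_dump($truncated, $padded, $prefixed);
```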
Hi Levi Morrison,
> I mean that there isn't a way to provide a custom way to compare for
> equality. One way to accomplish this is to have a signature like:
>     function contains(T $value, ?callable(T, T):bool $comparator = null): bool
> The same goes for indexOf.
It'd make much more sense to have ->any(static fn($other): bool => $comparator($value, $other)): ?:int
Overloading contains to do two different things (identity check or test the result of a callable)
seems unintuitive to users.
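The any()-based alternative can be sketched as follows. iterable_any is a hypothetical helper, since no ->any() method exists on a core class at this point, and the case-insensitive comparator is just an example of a custom equality check.

```php
<?php
// Hypothetical helper: true if $predicate returns a truthy value for any element.
function iterable_any(iterable $values, callable $predicate): bool
{
    foreach ($values as $value) {
        if ($predicate($value)) {
            return true;
        }
    }
    return false;
}

// A caller-supplied equality check (here: case-insensitive string comparison).
$comparator = static fn(string $a, string $b): bool => strcasecmp($a, $b) === 0;

$value = 'PHP';
$names = ['hack', 'php', 'rust'];

// The custom comparison lives in the closure; contains() stays a plain identity check.
$hasMatch = iterable_any($names, static fn($other): bool => $comparator($value, $other));
$hasNone  = iterable_any($names, static fn($other): bool => $comparator('go', $other));

var_dump($hasMatch, $hasNone);
```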
Since there is plenty of time to add more functionality,
and I still haven't created the extended iterable library proposal,
this currently only adds operations that are significantly more efficient inside the Vector
(or have a return type of Vector) rather than going through the generic Iterator methods.
>>> - I don't know what setSize(int $size) does. What does it do if the
>>>   current size is less than $size? What about if its current size is
>>>   greater? I suspect this is about capacity, not size, but without docs
>>>   I am just guessing.
>> It's the same behavior as https://www.php.net/manual/en/splfixedarray.setsize.php . It's about size, not capacity.
>> Change the size of an array to the new size of size.
>> If size is less than the current array size, any values after the new size will be discarded.
>> If size is greater than the current array size, the array will be padded with null values.
>> I'd planned to add phpdoc documentation and examples before starting a vote to document the behavior and thrown exceptions of the proposed methods.
> I would rather see multiple methods like:
>     function truncateTo(int $size)
>     function padEnd(int $length, $value) // allows more than just null
>     function padBeginning(int $length, $value)
I'd consider this unfriendly to users (and personally consider it a poor design) if we start with 3 or 4 different ways to change the size of the Vector.
(Especially if English is a second language)
A wide variety of programming languages such as Java, Rust, C++, etc. all use resize rather than truncateTo/padEnd,
after what I assume is considerable discussion among language design experts in those languages.
In the vast majority of cases, users know the exact size they want and don't care about the mechanism to set that.
(And if the size is set larger or smaller in an if{...}else{...}, the existence of setSize is still needed.
Or if the user intends to reuse the allocated memory while overwriting all values.)
- Diverging from what end users are familiar with (without a strong reason to) would also make it harder to start using Vector.
I'd considered using a signature of setSize(int $size, mixed $value = null) to allow using something other than null
but decided to leave that to a followup proposal if it passed.
For now, I'd omitted ways to add to the start of the array because the linear time taken would be potentially objectionable,
if people didn't imagine using it themselves or thought it'd be more appropriate for end users to use a Deque.
> And one or more for increasing/ensuring capacity without changing size.
setCapacity seems useful to me for reserving exactly the amount of memory needed when the final size was known (e.g. setCapacity(2) to avoid over-allocating) but I was waiting to see if anyone else wanted that.
Thanks,
Tyson
> I'd considered using a signature of setSize(int $size, mixed $value = null) to allow using something other than null
> but decided to leave that to a followup proposal if it passed.
Rust and C++ both accept a value to pad, there's no reason to restrict
this to only null. Go ahead and make the change now.
On Thu, 16 Sept 2021 at 23:33, tyson andre tysonandre775@hotmail.com
wrote:
> Yeah, as mentioned in
> https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec , it
> would require a massive amount of work.
> - A standard library for dealing with vec, filtering it, etc
> - Userland libraries and PECLs would need to deal with a third complex
>   type different from array/object that probably couldn't be implicitly
> - Extensive familiarity with opcache and the JIT for x86 and other
>   platforms beyond what I have
> - Willingness to do that with the uncertainty the final implementation
>   would get 2/3 votes with backwards compatibility objections, etc.
I feel like the standard library could be added in userland first, and then
corresponding faster implementations could arrive in the std lib.
But the last point is a really important one, and feels like a weakness in
the RFC process.
I know RFCs without implementations are generally frowned upon, but if
2/3rds of the community agreed that they wanted some sort of vec[] support
in theory, it might then free up the implementer(s) to take a more granular
approach to supporting vec. It could, for example, be an experimental
feature for a minor version.
Best wishes,
Matt
I can also give some in-the-trenches perspective of vec's utility, having
spent the last month and a half writing Hack. vec is a really useful data
structure to be able to use explicitly in code. It makes code that uses it
easier to understand.
The main benefit over Vector is that it could be used as a straightforward
replacement for array in many cases, with the same copy-on-write semantics
helping developers avoid spooky action-at-a-distance. I'm confident that by
the time that such a feature would be ready for primetime, there would be
tools in place to assist migrations. We could also presumably allow for
casts between vec and array.
And once opcache is sorted out, I'm also confident that the advice "using
vec will make your code a bit faster" would be a successful recruitment
tool.
Best wishes,
Matt
> What I would love is a vec type from hacklang, which is similar to
> this but pass-by-value, copy-on-write like an array. Of course, this
> would require engine work and I understand it isn't as simple to add.
use SplFixedArray as vec;
Done. ;)
Olle
Hello Tyson,
Vector support would be very good. JIT can do a lot with them if we
have a clean Vector implementation, or even without JIT.
What is your base inspiration for Vector? I do like the pretty
standard C++ Vector implementation:
https://www.cplusplus.com/reference/vector/vector/
Where a Vector is initialized with:
$myIntVector = vector<Integer>;
What is key for performance is also the alloc/realloc/free strategy.
C++ (like most custom implementations in C or other languages) gives
its creators control over max size, capacity, etc.
If a non typed Vector is the goal, then I am less in need of it, still
good to have but not as good as a clear pure Vector support :).
Also I think it will be very good to have more details about what this
RFC proposes in the RFC. It is kind of hard to follow right now, with
all external links. RFCs are better if they act as a real
specification :)
Best,
Pierre
@pierrejoye | http://www.libgd.org
Hi Tyson,
Back on my laptop so I will answer my question myself as I read the
source code. Please, really, that should be part of the RFC content.
Half of the questions here are about APIs, goals, etc. RFC should be
specifications as much as possible.
> Hello Tyson,
> Vector support would be very good. JIT can do a lot with them if we
> have a clean Vector implementation, or even without JIT.
Teds\Vector is named Vector; however, I am afraid it is not one: it is a
fixed array implementation. A vector, as in all other languages, is by
definition a collection, of fixed or variable size, of elements of the same
type. The same type is absolutely key here. The reason to require the
same type is the core principle of vectors (and vectorization):
a structure of arrays rather than an array of structs. The latter is hard
(or pointless) to parallelize and hard to optimize. An easy way to
play with vectors would be to try out numpy's Vector, which is by far
one of the best (and fastest) scripting language implementations of
Vector.
I did not spend enough time on the code, but I would by deconstructing:
typedef struct _teds_vector_entries {
size_t size;
size_t capacity;
zval *entries;
} teds_vector_entries;
to different types (or using multiple entries with a zval_type entry.
ie. for a zval float Vector:
typedef struct _teds_vector_entries {
size_t size;
size_t capacity;
double float;
} teds_vector_entries;
Alternatively, common C ports of C++ Vector do something along these lines:
typedef struct _cVector {
    unsigned int size;
    unsigned int cnt_elements;
    unsigned int element_size;
    void *elements;
} cVector;
where the initialization is:
void cVectorInit(cVector *array, unsigned int element_size);
where element_size is, for example, sizeof(double),
so any append, truncate, etc. are aware of the size to be
(re)allocated, if needed.
The only addition to handle zval would be:
typedef struct _cVector {
    unsigned int size;
    unsigned int cnt_elements;
    zend_uchar zval_type; /* type tag for the elements, e.g. IS_DOUBLE */
    unsigned int element_size;
    void *elements;
} cVector;
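To make the element_size idea concrete, here is a minimal sketch of such a container in plain C. Only cVectorInit and the struct fields come from the message above; cVectorAppend, cVectorAt and cVectorFree are hypothetical helpers, and treating size as the allocated capacity and cnt_elements as the number of stored elements is an assumption. It also ignores overflow of the capacity counter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed meaning of the fields: size = allocated capacity in elements,
 * cnt_elements = elements currently stored. */
typedef struct {
    unsigned int size;
    unsigned int cnt_elements;
    unsigned int element_size;  /* bytes per element, e.g. sizeof(double) */
    void *elements;             /* contiguous storage */
} cVector;

void cVectorInit(cVector *v, unsigned int element_size) {
    assert(element_size > 0);
    v->size = 0;
    v->cnt_elements = 0;
    v->element_size = element_size;
    v->elements = NULL;
}

/* Hypothetical helper: copies element_size bytes from value to the end.
 * Returns 0 on success, -1 if the allocation fails. */
int cVectorAppend(cVector *v, const void *value) {
    if (v->cnt_elements == v->size) {
        /* Grow by doubling, starting at 8 slots (an arbitrary choice). */
        unsigned int new_size = v->size ? v->size * 2 : 8;
        void *p = realloc(v->elements, (size_t)new_size * v->element_size);
        if (p == NULL) {
            return -1;
        }
        v->elements = p;
        v->size = new_size;
    }
    memcpy((char *)v->elements + (size_t)v->cnt_elements * v->element_size,
           value, v->element_size);
    v->cnt_elements++;
    return 0;
}

/* Hypothetical helper: pointer to element i, or NULL when out of range. */
void *cVectorAt(const cVector *v, unsigned int i) {
    if (i >= v->cnt_elements) {
        return NULL;
    }
    return (char *)v->elements + (size_t)i * v->element_size;
}

/* Hypothetical helper: releases the storage and resets the vector. */
void cVectorFree(cVector *v) {
    free(v->elements);
    cVectorInit(v, v->element_size);
}
```

Because every element has the same width, the storage is one flat buffer of doubles (or ints, etc.) rather than an array of tagged zvals, which is the layout the vectorization argument relies on.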
Doing so will drastically help to finally have a way to implement
vectorization using PHP that is simple in both usage and
implementation/API.
I would also like to suggest having it in the engine somehow, if the
JIT cannot jump in here unless it is an actual type in
the engine. While it is possible to have intrinsics implementations in
any extension, it won't be as good or efficient as an engine type with
JIT support.
Also, as it stands, I do not think it can be called a Vector. So I am
not too keen for it as I don't think we need another SplFixed*Array
implementation, as simple as it could be. The RFC needs some work
anyway before any vote can be taken.
In any case, if the above would be something you may consider (to
implement an actual vector), I can help and would be happy to if you
need/like.
Best,
Pierre
@pierrejoye | http://www.libgd.org
On 17.09.2021 at 04:09, tyson andre tysonandre775@hotmail.com wrote:
I've created a new RFC https://wiki.php.net/rfc/vector proposing to add
final class Vector
to PHP.
First of all: I don't have a strong opinion on a Vector class being useful or necessary.
But I have two comments about this RFC:
- Using the very generic name Vector without any prefix/namespace seems dangerous and asking for BC breaks.
- I don't like that this class is final. The reasons given in https://wiki.php.net/rfc/vector#final_class seem unconvincing to me and restrict the usage of Vector in a way which makes me question the usefulness to a big enough part of the PHP community.
These two reasons combined would make me reject the RFC at the current stage.
- Chris
Good afternoon Christian,
On Fri, Sep 17, 2021 at 3:07 PM Christian Schneider
cschneid@cschneid.com wrote:
On 17.09.2021 at 04:09, tyson andre tysonandre775@hotmail.com wrote:
I've created a new RFC https://wiki.php.net/rfc/vector proposing to add
final class Vector
to PHP.
First of all: I don't have a strong opinion on a Vector class being useful or necessary.
But I have two comments about this RFC:
- Using the very generic name Vector without any prefix/namespace seems dangerous and asking for BC breaks.
- I don't like that this class is final. The reasons given in https://wiki.php.net/rfc/vector#final_class seem unconvincing to me and restrict the usage of Vector in a way which makes me question the usefulness to a big enough part of the PHP community.
These two reasons combined would make me reject the RFC at the current stage.
I think it is more in a draft stage for discussions.
To be more precise with my earlier reply, I only see such additions as
useful if it is an actual Vector as known in other languages and
widely used the last years in ML and other similar areas like data or
image processing.
To me a vector is useful if it allows vectorized operations, as in
SIMD, AltiVec, CUDA etc. Some refs:
https://users.ece.cmu.edu/~franzf/teaching/slides-18-645-simd.pdf
https://indico.cern.ch/event/238763/attachments/401939/558861/HP-intel_mic_optimization.pdf
These two refer to Intel architecture, but ARM (especially v9 with Neon),
MIPS, PPC and maybe soon RISC-V support such operations as well.
It is amazingly well suited for raw performance increase. I can
imagine having annotations and/or specific optimization for vectors of
scalar processing. It requires a bit of (re)thinking, but it is
totally worth it.
Generic multiple data types vectors are less useful for such things
and SplFixedArray does it already, if I understand the RFC, as it
stands now, correctly.
best,
Pierre
@pierrejoye | http://www.libgd.org
Hi Christian Schneider,
First of all: I don't have a strong opinion on a Vector class being useful or necessary.
But I have two comments about this RFC:
- Using the very generic name Vector without any prefix/namespace seems dangerous and asking for BC breaks.
I downloaded the top 400 composer packages with https://github.com/nikic/popular-package-analysis/ and didn't find any classes named Vector.
- Only php-cs-fixer extends SplFixedArray in one class. It can continue to do so.
- I don't see other classes called Vector. Just stubs for \Ds\Vector.
There are tradeoffs and objections to any possible choice of name I could make, including this or alternates.
- Too likely to have conflicts
- Excessively long
- Open to adopting namespace but objecting to migrating existing classes (or not doing so)
- Objecting to a specific choice
- I don't like that this class is final. The reasons given in https://wiki.php.net/rfc/vector#final_class seem unconvincing to me and restrict the usage of Vector in a way which makes me question the usefulness to a big enough part of the PHP community.
These two reasons combined would make me reject the RFC at the current stage.
There are alternatives such as making all/almost all of the methods final (especially for reading and modifying array elements or basic properties of the vector), but allowing extending the class.
- Still, I don't think that'd be very useful, and would make future final method additions to Vector backwards incompatible.
- Trying to do everything (e.g. be extensible and handle all edge cases of extension) has often resulted in many spl data structures not doing anything very well (efficiently, correctly, or in a way that makes it possible to make universal assumptions about or optimize in the future with opcache/the jit).
While it is possible to extend ArrayObject and SplFixedArray, very few things do that, and it'd generally lead to worse API design except in a few cases.
(E.g. UserList extends \Vector wouldn't be able to enforce that inserted values are actually users with final methods)
Thanks,
Tyson
On 17/09/2021 at 04:09, tyson andre wrote:
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/vector proposing to add
final class Vector
to PHP.
Hello,
That's nice, and I like it, but like many people I will argue about the
API itself.
One thing is that there's many methods in there that would totally fit
generic collection common interfaces, and in that regard, I'd be very
sad that it would be merged as is.
I think it's taking the problem backwards, I would personally prefer that:
 - This RFC introduces the vector into a new Collection namespace, or
any other collection/iterable/enumerable related namespace, that'd
probably become the birth of a later to be standard collection API.
 - Start thinking about a common API even if it's for one or two
methods, and propose something that later would give the impetus for
adding new collection types and extending this in order to become
something that looks like a really coherent collection API.
If this goes in without regarding the greater plan, it will induce
inconsistencies in the future, when people will try to make something
greater. I'd love having something like DS and nikic/iter fused
altogether into PHP core, as a whole, in a consistent, performant, with
a nice and comprehensive API (and that doesn't require to install
userland dependencies).
I know this vector proposal is not about that, but nevertheless, in my
opinion, it must start preparing the terrain for this, or all other RFC
in the future will only create new isolated data structures and make the
SPL even more inconsistent.
Regards,
--
Pierre
Hi Pierre,
That's nice, and I like it, but like many people I will argue about the
API itself.
One thing is that there's many methods in there that would totally fit
generic collection common interfaces, and in that regard, I'd be very
sad that it would be merged as is.
It isn't an interface, but my previous attempts at introducing common functionality for working with iterables have failed,
with a preference for userland implementations and the scope being too small among the reasons given.
https://wiki.php.net/rfc/any_all_on_iterable#straw_poll
Until there's a Set type or a Map type, adding generic functionality such as contains()
to all spl data structures is harder.
I haven't seen any recent additions of utility methods to existing spl datastructures in years other than when filling an urgent need,
(e.g. SplHeap->isCorrupted())
and have been pessimistic about that succeeding, but may be mistaken.
I think it's taking the problem backwards, I would personally prefer that:
 - This RFC introduces the vector into a new Collection namespace, or
any other collection/iterable/enumerable related namespace, that'd
probably become the birth of a later to be standard collection API.
 - Start thinking about a common API even if it's for one or two
methods, and propose something that later would give the impetus for
adding new collection types and extending this in order to become
something that looks like a really coherent collection API.
If this goes in without regarding the greater plan, it will induce
inconsistencies in the future, when people will try to make something
greater. I'd love having something like DS and nikic/iter fused
altogether into PHP core, as a whole, in a consistent, performant, with
a nice and comprehensive API (and that doesn't require to install
userland dependencies).
Aside: https://github.com/TysonAndre/pecl-teds#iterable-functions
starts doing that, but evaluating eagerly instead of using generators.
I still don't think there's enough functionality yet to re-propose that.
I know this vector proposal is not about that, but nevertheless, in my
opinion, it must start preparing the terrain for this, or all other RFC
in the future will only create new isolated data structures and make the
SPL even more inconsistent.
It's possible, but I don't know what others think.
- https://www.php.net/manual/en/class.ds-collection.php actually seems fairly universal, but out of scope, and I don't know if people would json encode a SplMaxHeap. Right now that isn't implemented and the value is always {}
- add($value)/remove($value)/contains[Value]($value) is limited to some structures
- Only containsValue() would apply to ArrayObject/SplObjectStorage. The others wouldn't work since you'd need to know the keys as well.
Also,
- Union type/intersection type support exists, so allowing any generic collection interface is less urgent.
- equals() may work, though infinite recursion (or the way it is or isn't detected) in circular data structures is a potential objection, especially with lack of stack overflow detection - php just crashes/segfaults without a useful method when it runs out of stack space.
For the ones that are universal, Countable/ArrayAccess/IteratorAggregate/Traversable already exist.
Also, as you said, this RFC is not about that.
Requiring that anyone systematically overhaul existing data structures before adding any new data structures
seems like it would significantly delay or discourage any future additions of data structures.
In the immediate future, an RFC only doing that would not have much short-term benefit to users - it would also have short-term drawbacks for what I consider not enough benefit,
if adopting that interface made libraries drop support for older php versions.
Thanks,
Tyson
On 17/09/2021 at 14:54, tyson andre wrote:
Aside: https://github.com/TysonAndre/pecl-teds#iterable-functions
starts doing that, but evaluating eagerly instead of using generators.
I still don't think there's enough functionality yet to re-propose that.
Nice to know this, I wasn't aware it even existed, I'll have a look.
I know this vector proposal is not about that, but nevertheless, in my
opinion, it must start preparing the terrain for this, or all other RFC
in the future will only create new isolated data structures and make the
SPL even more inconsistent.
It's possible, but I don't know what others think.
- https://www.php.net/manual/en/class.ds-collection.php actually seems fairly universal, but out of scope, and I don't know if people would json encode a SplMaxHeap. Right now that isn't implemented and the value is always {}
- add($value)/remove($value)/contains[Value]($value) is limited to some structures
- Only containsValue() would apply to ArrayObject/SplObjectStorage. The others wouldn't work since you'd need to know the keys as well.
That's true, vector is a bit aside of what we'd expect from a full blown
collection API, it's a very basic structure in the end so it can
probably live on its own.
Also,
- Union type/intersection type support exists, so allowing any generic collection interface is less urgent.
That's right, but I don't think that union/intersection types solve the
generic collection problem, you'd still have to match for specific class
names or interfaces if methods are not rationalised in a single API.
- equals() may work, though infinite recursion (or the way it is or isn't detected) in circular data structures is a potential objection, especially with lack of stack overflow detection - php just crashes/segfaults without a useful method when it runs out of stack space.
For the ones that are universal, Countable/ArrayAccess/IteratorAggregate/Traversable already exist.
Yes, they exist, but I wouldn't place IteratorAggregate as being part of
the interface, it's about implementation, but right. Anyway altogether
they form a very poor API covering a very small surface and I'd imagine
those becoming a legacy thing if a new API was introduced.
Also, as you said, this RFC is not about that.
Requiring that anyone systematically overhaul existing data structures before adding any new data structures
seems like it would significantly delay or discourage any future additions of data structures.
In the immediate future, an RFC only doing that would not have much short-term benefit to users - it would also have short-term drawbacks for what I consider not enough benefit,
if adopting that interface made libraries drop support for older php versions.
I think your point is legit, and a part of me agrees with you, probably
having some data structure before thinking about rationalisation is
something that would make people move forward. Nevertheless it's always
very difficult to change things once they're here, and that's the whole problem.
I crave so deeply for a complete, easy to use, well documented and
standard collection API that I always jump on such RFCs to tell people
"stop using DS, stop using Doctrine Collections, stop using [name it
here] collection, please everyone, let's design, implement and use the
single and same one", so that we will never have to support them all
(them being the 1,000 existing duplicated libraries in userland) in our
framework or business code.
Thanks a lot for your answer and your time. Despite the fact I still
think that designing a collection first can still be done, having the
vector type/class in core is a great idea.
Regards,
--
Pierre
On Fri, Sep 17, 2021 at 5:10 AM tyson andre tysonandre775@hotmail.com
wrote:
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/vector proposing to add
final class Vector
to PHP.
Thank you so much, Tyson - I love your proposal. Since Ds was mentioned,
I've added it to your benchmark (code and complete results at
https://gist.github.com/MaxSem/d0ea0755d6deabaf88c9ef26039b2f27):
Appending to array:         n= 1048576 iterations= 20 memory=33558608 bytes, create+destroy time=0.369 read time = 0.210 result=10995105792000
Appending to Vector:        n= 1048576 iterations= 20 memory=16777304 bytes, create+destroy time=0.270 read time = 0.270 result=10995105792000
Appending to SplStack:      n= 1048576 iterations= 20 memory=33554584 bytes, create+destroy time=0.893 read time = 0.397 result=10995105792000
Appending to SplFixedArray: n= 1048576 iterations= 20 memory=16777304 bytes, create+destroy time=2.475 read time = 0.340 result=10995105792000
Appending to Ds\Vector:     n= 1048576 iterations= 20 memory=24129632 bytes, create+destroy time=0.389 read time = 0.305 result=10995105792000
Another comparison with Ds, I wonder if an interface akin to Ds\Sequence[1]
could be added, to have something in common with other future containers.
[1] http://docs.php.net/manual/en/class.ds-sequence.php
--
Best regards,
Max Semenik
Hi Max Semenik,
Since Ds was mentioned, I've added it to your benchmark (code and complete results at https://gist.github.com/MaxSem/d0ea0755d6deabaf88c9ef26039b2f27):
Appending to array:         n= 1048576 iterations= 20 memory=33558608 bytes, create+destroy time=0.369 read time = 0.210 result=10995105792000
Appending to Vector:        n= 1048576 iterations= 20 memory=16777304 bytes, create+destroy time=0.270 read time = 0.270 result=10995105792000
Appending to SplStack:      n= 1048576 iterations= 20 memory=33554584 bytes, create+destroy time=0.893 read time = 0.397 result=10995105792000
Appending to SplFixedArray: n= 1048576 iterations= 20 memory=16777304 bytes, create+destroy time=2.475 read time = 0.340 result=10995105792000
Appending to Ds\Vector:     n= 1048576 iterations= 20 memory=24129632 bytes, create+destroy time=0.389 read time = 0.305 result=10995105792000
Another comparison with Ds, I wonder if an interface akin to Ds\Sequence[1] could be added, to have something in common with other future containers.
It's worth noting that the first 4 data structures all start with initial sizes that are powers of 2 and continue doubling (and not mattering for SplStack, a doubly linked list),
but according to Ds\Vector's documentation,
it starts with a minimum size of 10. So it's an unfair comparison. http://docs.php.net/manual/en/class.ds-vector.php#ds-vector.constants.min-capacity
So there are probably larger copies done in Ds\Vector - Ds\Vector might do better for other sizes or use less memory under other circumstances.
(for reasons mentioned in https://externals.io/message/116048#116054 , I haven't checked the resizing strategy used by Ds\Vector - doubling is a common choice in vector implementations in other languages, others use other multiples of old capacity, etc)
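The effect of the starting capacity can be illustrated with a small calculation. The sketch below assumes pure doubling and 16 bytes per stored value (the size of a zval on 64-bit builds); capacity_after_doubling and storage_bytes are hypothetical helper names, and doubling from 10 is only an illustration, not a claim about what Ds\Vector actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Capacity reached by doubling `initial` until it can hold n elements. */
size_t capacity_after_doubling(size_t initial, size_t n) {
    assert(initial > 0);
    size_t capacity = initial;
    while (capacity < n) {
        capacity *= 2;
    }
    return capacity;
}

/* Bytes needed for the element storage alone (no object header). */
size_t storage_bytes(size_t capacity, size_t element_size) {
    return capacity * element_size;
}
```

For n = 1048576, doubling from 8 lands exactly on 1048576 slots, i.e. 16777216 bytes of storage, which is within 100 bytes of the 16777304 reported for Vector and SplFixedArray. Doubling from 10 would overshoot to 1310720 slots (20971520 bytes). That still does not match the 24129632 bytes measured for Ds\Vector, so its real growth strategy must differ from this illustration.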
Regards,
- Tyson
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/vector proposing to add
final class Vector
to PHP.
PHP's native array type is rare among programming languages in that it
is used as an associative map of values, but also needs to support
lists of values.
In order to support both use cases while also providing a consistent
internal array HashTable API to PHP's internals and PECLs,
additional memory is needed to track keys
(https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - around twice as much as is needed to just store the values due to needing space both for the string pointer and int key in a Bucket, for non-reference counted values).
Additionally, creating non-constant arrays will allocate space for at
least 8 elements to make the initial resizing more efficient,
potentially wasting memory.
It would be useful to have an efficient variable-length container in
the standard library for the following reasons:
- To save memory in applications or libraries that may need to store
many lists of values and/or run as a CLI or embedded process for long
periods of time
(in modules identified as using the most memory or potentially
exceeding memory limits in the worst case)
(both in userland and in native code written in php-src/PECLs)
- To provide a better alternative to ArrayObject and SplFixedArray for use cases
where objects are easier to use than arrays - e.g. variable sized
collections (for lists of values) that can be passed by value to be
read and modified.
- To give users the option of stronger runtime guarantees that
property, parameter, or return values really contain a list of values
without gaps, that array modifications don't introduce gaps or
unexpected indexes, etc.
Thoughts on Vector?
P.S. The functionality in this proposal can be tested/tried out at
https://pecl.php.net/teds (under the class name \Teds\Vector instead
of \Vector).
(That is a PECL I created earlier this year for future versions of
iterable proposals, common data structures such as Vector/Deque, and
less commonly used data structures that may be of use in future work on
implementing other data structures)
Improving collection/set operations in PHP is something near and dear to my heart, so I'm in favor of adding a Vector class or similar to the stdlib.
However, I am not a fan of this particular design.
- As Levi noted, this being a mutable object that passes by handle is asking for trouble. It should either be some by-value internal type, or an immutable object with evolver methods on it. (E.g., add($val): Vector). Making it a mutable object is creating spooky action at a distance problems. An immutable object seems likely easier to implement than a new type, but both are beyond my capabilities so I defer to those who could do so.
- The methods around size control are seemingly pointless from a user POV. I understand the memory optimization value they have, but that's not something PHP developers are at all used to dealing with. That makes it less of a convenient drop-in replacement for array and more just another user-space collection object, but in C with internals endorsement. If such logic needs to be included, it should be kept as minimalist as possible for usability, even at the cost of a little memory usage in some cases.
- There is no reason to preserve keys. A Vector or list type should not have user-defined keys. It should just be a linear list. If you populate it from an existing array/iterable, the keys should be entirely ignored. If you care about keys you want a HashMap or Dictionary or similar (which we also desperately need in the stdlib, but that's a separate thing).
- Whether or not contains() needs a comparison callback in my mind depends mainly on whether or not the operator overloading RFC passes. If it does, then contains() can/should use the __compareTo() method on objects. If it doesn't, then there needs to be some other way to compare non-identical objects or else that method becomes mostly useless.
- To echo Pierre, a Vector needs to be of a single guaranteed type. Yes, this gets us back to the generics conversation again, but I presume (perhaps naively?) there are ways to address this question without getting into full-blown generics. But really, a non-type-guaranteed Vector/List construct is of fairly little use to me in practice, and that's before we even get into the potential performance optimizations for map() and filter() from type guarantees. I can write a type-guaranteed user-space class that does what I need in under 10 minutes, and for most low cardinality sets that's adequately performant. A built-in needs to be better than that.
I very much appreciate the chicken-and-egg challenge of wanting to get something useful in despite the absence of a larger plan, and also the challenge of getting buy-in on a larger plan. Really. :-) This is an area where PHP's current dev process is very lacking. Still, I also agree with others that we need to be thinking holistically about this problem space, which will inform what the steps are. The approach we took for enums could be a model to consider (multiple RFCs clustered together under an RFC "epic".) That would allow for a long-term design, and the influence that offers, while still having milestones along the way that offer value unto themselves. (I'm happy to help with that, since that's about all I'm good for around here. :-) )
So big +1 to improving the in-C collection story; -1 to the current proposal.
--Larry Garfield
- Whether or not contains() needs a comparison callback in my mind depends mainly on whether or not the operator overloading RFC passes. If it does, then contains() can/should use the __compareTo() method on objects. If it doesn't, then there needs to be some other way to compare non-identical objects or else that method becomes mostly useless.
This is only partly true. Let's say we have a vector of some complex
type A. There are legitimate reasons for using different ways of
comparing As, such as when projecting sub-fields (for example, sorting
by each member's name this time, but next time sorting by each
member's location).
Of course, if it passes, then using a type's built-in comparison
overloading is a sensible default, but it doesn't remove the need of
having a custom comparator.
I was tired when I originally pointed out the comparator/equatable
stuff, but Tyson was rightly saying that any() solves this need, e.g.
<?php
if ($vec->contains($value, $eq)) {/**/}
// translates to
if ($vec->any(fn ($x) => $eq($x, $value))) {/**/}
?>
However, it's not as clear what to do for indexOf, where you care
about the index it was found at.
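One possible shape for the indexOf case, sketched in C with a function pointer standing in for the PHP closure, is to pass the comparator to the search itself so the index is still available. This is only an illustration and not something the thread settled on; eq_fn, index_of, contains and same_last_digit are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical equality callback: true when a and b count as equal. */
typedef bool (*eq_fn)(int a, int b);

/* Index of the first element equal to needle under eq, or -1 if absent. */
long index_of(const int *values, size_t n, int needle, eq_fn eq) {
    for (size_t i = 0; i < n; i++) {
        if (eq(values[i], needle)) {
            return (long)i;
        }
    }
    return -1;
}

/* contains() only needs a yes/no answer, so it can be derived from
 * index_of(), mirroring how any() covers contains() in the PHP example. */
bool contains(const int *values, size_t n, int needle, eq_fn eq) {
    return index_of(values, n, needle, eq) >= 0;
}

/* Example comparator: equal when the last decimal digits match
 * (intended for non-negative inputs). */
bool same_last_digit(int a, int b) {
    return a % 10 == b % 10;
}
```

With the values 12, 25, 37 and same_last_digit as the comparator, searching for 7 finds index 2, while searching for 9 reports no match.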
Improving collection/set operations in PHP is something near and dear to my heart,
so I'm in favor of adding a Vector class or similar to the stdlib.
However, I am not a fan of this particular design.
As Levi noted, this being a mutable object that passes by handle is asking for trouble.
It should either be some by-value internal type, or an immutable object with evolver methods on it.
(E.g., add($val): Vector). Making it a mutable object is creating spooky action at a distance problems.
An immutable object seems likely easier to implement than a new type,
but both are beyond my capabilities so I defer to those who could do so.
https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec discusses why I'm doubtful of is_vec getting implemented or passing.
Especially with add() taking linear time to copy all elements of the existing value if you mean an array rather than a linked list-like structure, and any referenced copies taking a lot more memory than an imperative version would.
PHP's end users and internals members come from a wide variety of backgrounds,
and I assume most beginning or experienced PHP programmers would tend towards imperative&mutable programming rather than functional&immutable programming.
PHP provides tools such as clone, private visibility, etc to deal with that.
The lack of any immutable object datastructures in core and the lack of immutable focused extensions in PECL https://pecl.php.net/package-search.php?pkg_name=immutable
https://www.php.net/manual-lookup.php?pattern=immutable&scope=quickref
(other than DateTimeImmutable)
heavily discourage me from proposing anything immutable.
(Technically, https://github.com/TysonAndre/pecl-teds has minimal implementations of immutable data structures, but the api is still being revised and Vector is the primary focus, followed by iterable functions. e.g. there's no ImmutableSequence::add($value): ImmutableSequence method.)
The methods around size control are seemingly pointless from a user POV.
setSize is useful in allocating exactly the variable amount of memory needed while using less memory than a PHP array.
- setSize($newSize, 0) would be much more efficient and concise in initializing the value.
- Or in quickly reducing the size of the array rather than repeatedly calling pop in a loop.
And while methods around capacity control exist in many other programming languages, they aren't used by most users and most users are fine with functionality they don't use existing.
The applications or libraries that do have a good use case to reduce memory will take advantage of them and end users of those applications/libraries would benefit from the memory usage reduction.
I understand the memory optimization value they have, but that's not something PHP developers are at all used to dealing with.
That makes it less of a convenient drop-in replacement for array and more just another user-space collection object, but in C with internals endorsement.
If such logic needs to be included, it should be kept as minimalist as possible for usability,
even at the cost of a little memory usage in some cases.
If the functionality was just a drop-in replacement for array, others may say "why not just use array and the array libraries?" (or Vector).
With the strategy of doubling capacity, it can be up to 99% more memory than needed in some cases (Even more wastage after shrinking from the maximum size).
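The "up to 99%" figure follows from the worst case for doubling, which is a size one past a power of two. A small sketch of the arithmetic, assuming capacities are always powers of two; pow2_capacity and overhead_percent are hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power-of-two capacity (at least 1) that holds n elements. */
size_t pow2_capacity(size_t n) {
    size_t capacity = 1;
    while (capacity < n) {
        capacity *= 2;
    }
    return capacity;
}

/* Unused slots as a whole-number percentage of the slots in use,
 * rounded down. */
size_t overhead_percent(size_t n) {
    assert(n > 0);
    return (pow2_capacity(n) - n) * 100 / n;
}
```

With 1024 elements nothing is wasted, but with 1025 elements the capacity jumps to 2048, leaving 1023 unused slots, which is just under 100% extra. The overhead stays below 100% for any size, which is why it rounds down to 99.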
There is no reason to preserve keys.
A Vector or list type should not have user-defined keys.
It should just be a linear list. If you populate it from an existing array/iterable, the keys should be entirely ignored.
If you care about keys you want a HashMap or Dictionary or similar (which we also desperately need in the stdlib, but that's a separate thing).
The behavior is similar to https://www.php.net/manual/en/splfixedarray.fromarray.php
It tries to preserve the keys, and fills in gaps with null.
- There's the consistency with existing functionality such as SplFixedArray::fromArray, or existing constructors preserving keys.
- And I'd imagined a last minute objection of "Wait, new SplFixedArray([1 => 'second', 0 => 'first']) does what by default? Isn't this using the keys 0 and 1?", and the same for gaps
I was considering only having the no-param constructor, and adding the static method fromValues(iterable $it) to make it clearer keys are ignored.
Whether or not contains() needs a comparison callback in my mind depends mainly on whether or not the operator overloading RFC passes.
If it does, then contains() can/should use the __compareTo() method on objects.
If it doesn't, then there needs to be some other way to compare non-identical objects or else that method becomes mostly useless.
There's a distinction between needs and very nice to have - a contains check for some predicate on a Vector can be done with a userland helper method and a foreach.
Also, you're requesting functionality that I don't believe is currently available for arrays, either.
To echo Pierre, a Vector needs to be of a single guaranteed type.
Yes, this gets us back to the generics conversation again, but I presume (perhaps naively?) there are ways to address this question without getting into full-blown generics.
Yep, as you said, this type is mixed, just like the SplFixedArray, ArrayObject, values of SplObjectStorage/WeakMap, etc.
Generic support is something that's been brought up before, investigated, then abandoned.
My concerns with adding StringVector, MixedVector, IntVector, FloatVector, BoolVector, ArrayVector (confusing), ObjectVector, etc are that
- I doubt many people would agree that there's a wide use case for any
  specific one of them compared to a vector of any type.
  This would be even harder to argue for than just a single Vector type.
- Mixes of null and type T might make sense in many cases (e.g. optional objects, statistics that failed to get computed, etc) but would be forbidden by that
- It would be a bad choice if generic support did get added in the future.
I'm not sure if we're thinking of the same thing.
Could you provide more details on how that would be implemented? Have other PECLs done something similar?
But really, a non-type-guaranteed Vector/List construct is of fairly little use to me in practice, and that's before we even get into the potential performance optimizations for map() and filter() from type guarantees.
See earlier comments on vec/Generics not being actively worked on right now and probably being a far way away from an implementation that would pass a vote.
As for optimizations, opcache currently doesn't optimize individual global functions (let alone methods), it optimizes opcodes.
Even array_map()/array_filter() aren't optimized, they call callbacks in an ordinary way.
E.g. https://github.com/php/php-src/pull/5588 or https://externals.io/message/109847 regarding ordinary methods.
Aside: In the long term, I think the opcache core team had a long-term plan of changing the intermediate representation to make these types of optimizations feasible without workarounds like the one I proposed in 5588
I can write a type-guaranteed user-space class that does what I need in under 10 minutes, and for most low cardinality sets that's adequately performant. A built-in needs to be better than that.
I very much appreciate the chicken-and-egg challenge of wanting to get something useful in despite the absence of a larger plan, and also the challenge of getting buy-in on a larger plan.
Really. :-) This is an area where PHP's current dev process is very lacking.
Still, I also agree with others that we need to be thinking holistically about this problem space, which will inform what the steps are.
The approach we took for enums could be a model to consider (multiple RFCs clustered together under an RFC "epic".)
That would allow for a long-term design, and the influence that offers, while still having milestones along the way that offer value unto themselves. (I'm happy to help with that, since that's about all I'm good for around here. :-) )
Enums were extensions of existing class types (is_object(Suit::Hearts) is true) rather than adding a whole separate type to the type system and don't need to support generics or contain anything other than an int/string.
I don't think the choice of "epic" widely influenced the vote.
Regards,
Tyson
Good morning,
Not sure you care or read my reply but I had to jump in one more time here :)
setSize is useful in allocating exactly the variable amount of memory needed while using less memory than a PHP array.
setSize($newSize, 0) would be much more efficient and concise in initializing the value.
- Or in quickly reducing the size of the array rather than repeatedly calling pop in a loop.
I would rather not reduce it at all, but use the vector_size and keep
it. Userland sets its max size, but a realloc/free should not be
necessary and is counterproductive from a perf point of view. If one
uses it in a daemon, it can always be destroyed as needed.
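For readers who want to see the resizing behaviour under discussion, here is a minimal sketch using SplFixedArray::setSize() from core PHP as a stand-in. It is an assumption that this is a fair analogue: the proposed Vector::setSize($newSize, 0) also accepts a fill value, while SplFixedArray::setSize() takes only a size and fills new slots with null.

```php
<?php
// Stand-in sketch: core SplFixedArray, not the proposed Vector class.
$list = new SplFixedArray(2);
$list[0] = 'a';
$list[1] = 'b';

// Growing in one call: the new slots are filled with null.
$list->setSize(5);
$grown = $list->toArray();   // ['a', 'b', null, null, null]

// Shrinking in one call instead of popping in a loop.
$list->setSize(1);
$shrunk = $list->toArray();  // ['a']
```

Whether shrinking should release memory or keep the allocation is exactly the point the two authors disagree on; the sketch only shows the visible size change.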
To echo Pierre, a Vector needs to be of a single guaranteed type.
Yes, this gets us back to the generics conversation again, but I presume (perhaps naively?) there are ways to address this question without getting into full-blown generics.
Yep, as you said, this type is mixed, just like the SplFixedArray, ArrayObject, values of SplObjectStorage/WeakMap, etc.
Generic support is something that's been brought up before, investigated, then abandoned.
My concerns with adding StringVector, MixedVector, IntVector, FloatVector, BoolVector, ArrayVector (confusing), ObjectVector, etc is that
- I doubt many people would agree that there's a wide use case for any
specific one of them compared to a vector of any type.
I am lost here. This is the main usage of Vector. For linear
arithmetic like dot product, masking, add/sub/mul/div of vector etc. I
do not see any other usage per se for all the things I have
implemented or seen out there. Additionally, f.e., a string is a vector
already on its own, I am not sure a vector of vectors makes sense ;).
This would be even harder to argue for than just a single Vector type.
- Mixes of null and type T might make sense in many cases (e.g. optional objects, statistics that failed to get computed, etc) but would be forbidden by that.
- It would be a bad choice if generic support did get added in the future.
These are special cases for general purposes of vectors. Implementing
vectors focusing on these special cases rather than the general
purpose (vectorization) would be a strategic mistake. I mentioned it
before, but please take a look at the numpy's Vector f.e., with
python's operator overload, what has been done there is simply
amazing, bringing vector processing/arithmetic a huge boost in
performance, even with millions of entries (14 to 400x speed boost
compared to classic array, even fixed).
But really, a non-type-guaranteed Vector/List construct is of fairly little use to me in practice, and that's before we even get into the potential performance optimizations for map() and filter() from type guarantees.
See earlier comments on vec/Generics not being actively worked on right now and probably being a long way away from an implementation that would pass a vote.
Generics!=Vector. But I hope that's not the way we are heading here :)
As for optimizations, opcache currently doesn't optimize individual global functions (let alone methods), it optimizes opcodes.
Even array_map()/array_filter() aren't optimized, they call callbacks in an ordinary way.
E.g. https://github.com/php/php-src/pull/5588 or https://externals.io/message/109847 regarding ordinary methods.
Aside: In the long term, I think the opcache core team had a long-term plan of changing the intermediate representation to make these types of optimizations feasible without workarounds like the one I proposed in 5588
You are fully correct here, I see a lack of engine devs'
involvement (not complaining, just the state of affairs :) in such
RFCs, where this kind of feature could greatly benefit PHP. Well
planned, this is a huge addition to PHP.
It is also why I am convinced that doing it right for Vectors (as a
start) and thinking forwards to JIT and ops overloading (internally or
userland) to allow smooth and nice vectorization (as some parts use
them already internally f.e.) will bring PHP up to speed with the
competition. If we don't, we just have something that would be similar
to what anyone could do in userland with more flexibility.
Best,
Pierre
@pierrejoye | http://www.libgd.org
Hi Pierre Joye,
I have no plans to change the direction of this RFC in those directions and no personal interest in working on generics (where others have attempted and failed) or operator overloading for array operations.
Adding anything like numpy's operator overloading or generics is entirely out of the scope of my proposal and not the goal of my proposal.
Both of those are massive projects compared to adding a small number of data structures.
See (https://github.com/Danack/RfcCodex/blob/master/etiquette/rfc_etiquette.md#dont-volunteer-other-people-for-huge-amounts-of-work)
numpy.Vector is not part of a standard library. It is a data structure within a library dedicated to numeric processing/array computing - it is not a general-purpose standard library datastructure.
If you have a desire to see a similar project for use cases you have for PHP, working on a PECL would be an analogous approach.
The typed arrays like numpy would be impossible to check at runtime - you'd only have the property/return type hint/
This proposal already has a fixed-sized type - that type is mixed (or zval internally), like ArrayObject, WeakMap, etc. already have in their values.
(Similar to how basic Java collections (e.g. ArrayList<String>) are all collections of Object after generic type erasure.)
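As a small illustration of what "that type is mixed" means in practice, the following sketch uses the existing core ArrayObject (not the proposed Vector) to show that nothing is checked per element. The variable names are only for illustration.

```php
<?php
// Existing core containers hold values of any type (mixed).
$values = new ArrayObject();
$values[] = 42;
$values[] = 'forty-two';
$values[] = null;
$values[] = [4, 2];

// get_debug_type() (PHP 8.0+) reports the runtime type of each element.
$types = array_map('get_debug_type', $values->getArrayCopy());
// ['int', 'string', 'null', 'array']
```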
Regards,
Tyson
Hi Tyson,
On Sat, Sep 18, 2021, 10:21 AM tyson andre tysonandre775@hotmail.com
wrote:
This proposal already has a fixed-sized type - that type is mixed (or zval
internally), like ArrayObject, WeakMap, etc. already have in their values.
(Similar to how basic Java collections (e.g. ArrayList<String>) are all
collections of Object after generic type erasure.)
Thanks for the clarification.
So the name of this proposal is misleading. These are not vectors.
I am not sure php needs another type of fixed array at this stage. So -1
here overall.
best,
Pierre
On Sat, 18 Sept 2021 at 02:49, tyson andre tysonandre775@hotmail.com
wrote:
To echo Pierre, a Vector needs to be of a single guaranteed type.
Yes, this gets us back to the generics conversation again, but I presume
(perhaps naively?) there are ways to address this question without getting
into full-blown generics.
Yep, as you said, this type is mixed, just like the SplFixedArray,
ArrayObject, values of SplObjectStorage/WeakMap, etc.
Please rename your proposal as the use of the term "Vector" is confusing
for people who use them in other languages. Much of the discussion so far
has been around whether it's a Vector or what it should be; changing the
proposed name will allow the discussion to focus on what you're proposing
to add, not what others (myself included) would like to see added to PHP :)
Peter
Hi Peter Bowyer,
Many of php's names are based on the naming choices in libraries made in C/C++.
So using https://cplusplus.com/reference/vector/vector/ for my RFC https://wiki.php.net/rfc/vector
seems like the most natural naming choice,
and would make it easier for people with backgrounds in that family of languages to find the functionality they're looking for.
PHP already has a SplStack, SplQueue, etc, like C++'s stack, queue, etc.
I expect having a second Stack would be confusing and make it hard to remember which is the efficient one.
(Especially since stacks typically don't include specialized resizing methods)
No alternative names have been suggested by you or them so far, as far as I remember, and 2 of those responders seem to be saying they would vote no regardless of the choice of name (for reasons such as wanting generic-like functionality, wanting immutability or built-in types, etc.).
PHP's already using List to refer to linked lists, and array in PHP already refers to a hash table (including in ArrayObject).
So I expect a stronger objection to alternative names that I can think of.
Also, your comment is ambiguous. Are you saying that you personally object to the name,
or that you're fine with the name but think that the comments by Larry/Chris/Pierre in this email thread are representative of voters?
- People who wouldn't find the name surprising wouldn't bother writing an email to indicate a lack of surprise.
Thanks,
-Tyson
Hi Tyson,
On Sat, Sep 18, 2021, 10:46 PM tyson andre tysonandre775@hotmail.com
wrote:
Hi Peter Bowyer,
Many of php's names are based on the naming choices in libraries made in
C/C++.
So using https://cplusplus.com/reference/vector/vector/ for my RFC
https://wiki.php.net/rfc/vector
seems like the most natural naming choice,
and would make it easier for people with backgrounds in that family of
languages to find the functionality they're looking for.
I do, and as mentioned before, it makes it confusing and harder because of
the use of zval and not a specific type like in C++. A zval is not a PHP
userland type.
C++ initializes a vector using the type:
vector<int> myvalues;
then all elements must be of type int.
This proposal breaks this already.
I expect having a second Stack would be confusing and make it hard to
remember which is the efficient one.
(Especially since stacks typically don't include specialized resizing
methods)
Yes, as it isn't a stack either ;)
I have no idea what you could name it, sorry, but definitely not a vector.
Also, your comment is ambiguous. Are you saying that you personally object
to the name,
or that you're fine with the name but think that the comments by
Larry/Chris/Pierre in this email thread are representative of voters.
Maybe representative of what vectors are and how they are used.
Vectorization is key to programming now, like multithreading or
parallelism a few years back. Vectorization is an old principle; as
processors have gained power in recent years, this is the way raw performance
scales, even more so with added cores.
Now, having a fixed array named Vector in php would be a big mistake and
actually very confusing.
best,
Pierre
Hi Tyson,
On Sat, 18 Sept 2021 at 16:46, tyson andre tysonandre775@hotmail.com
wrote:
Many of php's names are based on the naming choices in libraries made in
C/C++.
So using https://cplusplus.com/reference/vector/vector/ for my RFC
https://wiki.php.net/rfc/vector
seems like the most natural naming choice,
and would make it easier for people with backgrounds in that family of
languages to find the functionality they're looking for.
PHP already has a SplStack, SplQueue, etc, like C++'s stack, queue, etc.
That is a fair point. Vector is an overloaded and common word. For me a
vector will always default to an entity characterized by a magnitude and a
direction, because that's what I learned and used for years. The next
definition I learned was the Numpy one.
That for me is the sticking point if this Vector allows mixed types which
include arrays or vectors. Store them inside a Vector and then you end up
with a matrix, a tensor and so-on in something identified as a Vector,
which is nonsense. Yes C++ does that [1]. Yes with generics it sort-of
makes sense. Numpy gets round it by calling the type ndarray and a vector
is a specialised one-dimensional array.
If it's a high-performance array and that's the goal, call it hparray. Call
it a tuple. Call it a dictionary.
Also, your comment is ambiguous. Are you saying that you personally object
to the name,
or that you're fine with the name but think that the comments by
Larry/Chris/Pierre in this email thread are representative of voters.
Both.
I object to the name for what's being proposed, but am not necessarily
against what's being proposed if it looks more useful than the Spl* stuff.
I'm fine with the name but for something other than what's being proposed.
HTH
Peter
Hi Peter Bowyer,
- hparray: I think putting high performance in any class name in core is a mistake,
  and generally poor naming choice, and will mislead users now or in the future.
  (unless it is literally an API client for a database or server that includes high performance in the server software's name)
  Benchmarks currently show it using less memory but some more time than array,
  and those benchmarks will change as opcache's internals or PHP's representation
  of objects or arrays change.
  Which choice of data structure is highest performance would depend on the benchmark or needs of the application/library.
- tuple: In mathematics, most references I've heard of to tuples are generally
  fixed sizes (n-tuples). In programming, python and C++ and various other languages
  use tuple to refer to a fixed-size (and immutable) data structure,
  making this naming choice extremely confusing.
  https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
  https://en.cppreference.com/w/cpp/utility/tuple
  (In C++) Class template std::tuple is a fixed-size collection of heterogeneous values.
- dictionary: Wikipedia refers to this as an associative array https://en.wikipedia.org/wiki/Associative_array
  which is the exact opposite of what my Vector RFC is proposing.
So I don't consider any of those proposed names appropriate alternatives,
and expect much, much stronger opposition to an RFC using that naming choice for this functionality.
I expect opposition to any naming choice I propose; Vector is what I expect to have the least opposition.
Thanks,
Tyson
Rather than go point by point, I'm going to respond globally here.
I am frequently on-record hating on PHP arrays, and stating that I want something better. The problems with PHP arrays include:
- They're badly performing (because they cannot be optimized)
- They're not type safe
- They're mutable
- They mix sequences (true arrays) with dictionaries/hashmaps, making everything uglier
- People keep using them as structs, when they're not
- The API around them is procedural, inconsistent, and overall gross
- They lack a lot of native shorthand operations found in other languages (eg, slicing)
- Their error handling is crap
Any new native/stdlib alternative to arrays needs to address at least half of those issues, preferably most/all.
This proposal addresses the first point and... that's it. Point 5 is sort of covered by virtue of being out of scope, so maybe this covers 1.5 out of 8. That's insufficient to be worth the effort to support and deal with in code. That makes this approach a strong -1 for me.
"Fancy algorithms are slow when n is small, and n is usually small." -- Rob Pike
That some of the design choices here mirror existing poor implementations is not an endorsement of them. I don't think I've seen anyone on this list say anything good about SPL beyond iterators and autoloading, so it's not really a good model to emulate.
Additionally, please don't play into the trope about procedural/mutable code being more beginner friendly. That's not the case, beyond being a self-fulfilling prophecy. (If we teach procedural/mutable code first, then most beginners will be most proficient in procedural/mutable code.) I would argue that, on the whole, immutable values make code easier to reason about and write once you get past trivially small sizes. We do new developers a gross disservice by treating immutability as an "advanced" technique, when it should really be the default, beginner technique taught from day one.
I am not aware of any PECL implementations of lists that have type safety, because I don't use many PECL packages. However, in user space it's quite simple to do:
https://presentations.garfieldtech.com/slides-never-use-arrays/phpkonf2021/#/5/2
See that slide and scroll down for additional examples. Every one of those examples took me less than 5 minutes to write. If we want to have a better alternative in core, it needs to be at least as capable as what I can throw together in 5 minutes. The proposal as-is is not even as capable as those examples.
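To make the kind of userland class being described concrete, here is a minimal sketch of a type-checked, immutable list. It is not the code from the linked slides; the class name TypedList and the method with() are hypothetical names chosen for illustration, and it assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical userland sketch; not the code from the linked slides.
final class TypedList implements IteratorAggregate, Countable
{
    /** @var list<object> */
    private array $items = [];

    public function __construct(private string $class) {}

    // Returns a new list with $item appended; the original is left unchanged.
    public function with(object $item): static
    {
        if (!$item instanceof $this->class) {
            throw new TypeError("Expected {$this->class}, got " . get_debug_type($item));
        }
        $copy = clone $this;
        $copy->items[] = $item;
        return $copy;
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->items);
    }
}

$empty = new TypedList(DateTimeImmutable::class);
$one   = $empty->with(new DateTimeImmutable('2021-09-18'));

$rejected = false;
try {
    $one->with(new stdClass());   // wrong element type
} catch (TypeError $e) {
    $rejected = true;
}
```

The element type is enforced only when with() runs, so a mismatch surfaces at runtime rather than in a signature.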
--Larry Garfield
Hi Larry Garfield,
Yes, you can implement those immutable and typed data structures in userland.
You are doing that by adding userland code hiding the internal implementations of the mutable array
to solve the needs of your library/application (e.g. those 8).
Adding a mutable Vector gives another way to internally represent those userland data structures when you need those userland data structures to share data internally without using PHP references (not as part of the public api), e.g. appending to a list of error objects, performing a depth-first search or breadth-first search, etc.
As for your example, it's impossible to type hint without generics, and nobody's working on generics.
If I have your userland TypedArray::forType(MyClass::class), I can pass it to any parameter/return value/property expecting a TypedArray, but that will then throw an Error at runtime with no warning ahead of time if I pass it to a function expecting a TypedArray of OtherClass.
Static analyzers exist separately from php that could analyze that, but
- Many developers wouldn't have static analyzers set up.
- The TypedArrays may be created from unserialization from apcu/memcache/redis and be impractical to analyze (e.g. from an older release of a library or application)
- Voters may object to this additional way to write PHP code that could error at runtime.
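To make the runtime failure described above concrete, here is a minimal sketch of a userland typed list. Only the name TypedArray::forType() comes from this thread; the implementation, the with() and type() methods, and the processOthers() function are assumptions for illustration and may differ from the code on the linked slides.

```php
<?php
declare(strict_types=1);

// Hypothetical userland typed list; only the name TypedArray::forType()
// appears in this thread, everything else is an illustrative assumption.
final class TypedArray implements IteratorAggregate, Countable
{
    /** @var list<object> */
    private array $values = [];

    private function __construct(private string $type) {}

    public static function forType(string $type): self
    {
        return new self($type);
    }

    public function type(): string
    {
        return $this->type;
    }

    // Returns a new instance instead of mutating; the element type is
    // checked at runtime because the language cannot express it statically.
    public function with(object $value): self
    {
        if (!$value instanceof $this->type) {
            throw new TypeError("Expected {$this->type}, got " . get_class($value));
        }
        $copy = clone $this;
        $copy->values[] = $value;
        return $copy;
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->values);
    }

    public function count(): int
    {
        return count($this->values);
    }
}

final class MyClass {}
final class OtherClass {}

// The parameter type can only say "TypedArray", not "TypedArray of OtherClass",
// so the element type has to be verified by hand inside the function.
function processOthers(TypedArray $items): int
{
    if ($items->type() !== OtherClass::class) {
        throw new TypeError('Expected a TypedArray of ' . OtherClass::class);
    }
    return count($items);
}

$mine = TypedArray::forType(MyClass::class)->with(new MyClass());

try {
    processOthers($mine); // accepted by the signature, rejected at runtime
} catch (TypeError $e) {
    echo $e->getMessage(), "\n";
}
```

The call compiles and passes the parameter type check; the mismatch only surfaces when processOthers() runs, which is the failure mode a static analyzer would have to catch in the absence of generics.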
What data structures do you want in core? Do you want them to eagerly evaluate generators or lazily evaluate them? Is TypedArray or TypedSequence something you think should have an RFC or plan to create an RFC for?
Even if immutable data structures are proposed, there's a further division between programmers who want lazy or eager immutables (e.g. whether their constructors or factory methods eagerly or lazily evaluate iterable values), and there may be enough objections to either choice (for the specific data structure proposed) to make the vote fail when it is actually time to vote (in addition to other objections that come up in any new proposal for core data structures).
This discourages me from proposing immutable data structures.
I'd agree on the utility of Set/Map/sorted data structures (though the hashable vs not hashable, comparator vs no comparator, how to hash, etc. is a discussion that I believe would be time consuming and hasn't happened yet).
Because PHP has no stack overflow/recursion depth detection and just segfaults at runtime, I believe strict === equality like that example and like JavaScript's Map https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map would be the most appropriate for a proposal made long after PHP was released.
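As a small illustration of the difference between loose and strict comparison that this point relies on (a sketch only; it does not show how any proposed Map would be implemented):

```php
<?php
declare(strict_types=1);

$a = new stdClass();
$a->id = 1;
$b = new stdClass();
$b->id = 1;

var_dump($a == $b);  // bool(true): loose comparison walks the properties recursively
var_dump($a === $b); // bool(false): strict comparison only checks object identity
var_dump(1 === '1'); // bool(false): no type juggling between int and string either
```

Because === on objects never descends into their properties, it cannot recurse without bound on deeply nested or cyclic structures.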
But while I'm open to adding immutable data structures, the mutable data structures should also exist in many cases, and I believe the immutable ones should be named or namespaced that way.
Thanks,
Tyson
On Sat, 18 Sept 2021 at 12:04, Larry Garfield larry@garfieldtech.com wrote:
I am frequently on-record hating on PHP arrays, and stating that I want something better. The problems with PHP arrays include:
- They're badly performing (because they cannot be optimized)
- They're not type safe
- They're mutable
- They mix sequences (true arrays) with dictionaries/hashmaps, making everything uglier
- People keep using them as structs, when they're not
- The API around them is procedural, inconsistent, and overall gross
- They lack a lot of native shorthand operations found in other languages (eg, slicing)
- Their error handling is crap
Any new native/stdlib alternative to arrays needs to address at least half
of those issues, preferably most/all.
Hey Larry,
I believe 1. and 2. are an impossible standard for any PHP-based proposal
to meet. If you want it to be (runtime) type-safe, that assumes the
existence of runtime type checks which can quickly become a performance
bottleneck.
For 3, having explored immutability in depth with Psalm, arrays don't
present any sort of challenge due to their copy-on-write behavior. There's
a chunk of Psalm's codebase that makes heavy use of arrays, and it's
still provably pure.
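A minimal sketch of the copy-on-write behaviour referred to above (the function name withExtra is hypothetical):

```php
<?php
declare(strict_types=1);

// Hypothetical helper: arrays are passed by value, so this append only
// affects the function's own copy, which is made lazily on the first write.
function withExtra(array $items, string $extra): array
{
    $items[] = $extra;
    return $items;
}

$original = ['a', 'b'];
$extended = withExtra($original, 'c');

var_dump($original === ['a', 'b']);      // bool(true): the caller's array is unchanged
var_dump($extended === ['a', 'b', 'c']); // bool(true)
```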
5: they're used as makeshift structs, but there's nothing preventing people
using constructor property promotion and named parameters to model the same
data. I think this is effectively a solved problem.
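For example, a struct-like array can be modelled like this (a sketch using a hypothetical Point class; constructor property promotion and named arguments both require PHP 8.0 or later):

```php
<?php
declare(strict_types=1);

// Hypothetical value object replacing an ad-hoc ['x' => ..., 'y' => ...] array.
final class Point
{
    public function __construct(
        public int $x = 0,
        public int $y = 0,
    ) {}
}

// Named arguments keep the call site as readable as array keys would be,
// while the property types are enforced by the engine.
$p = new Point(y: 3, x: 4);

echo $p->x + $p->y, "\n"; // 7
```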
No real debates about 4, 6, 7 and 8, but I'm radically opposed to
throwing out the baby with the bathwater here. Languages that have solved
these problems exist, but demanding a proposal pass a purity test means the
PHP project is more likely to stay the way it is.
Best wishes,
Matt
Hi Larry,
Rather than go point by point, I'm going to respond globally here.
I am frequently on-record hating on PHP arrays, and stating that I want something better. The problems with PHP arrays include:
- They're badly performing (because they cannot be optimized)
- They're not type safe
- They're mutable
- They mix sequences (true arrays) with dictionaries/hashmaps, making everything uglier
- People keep using them as structs, when they're not
- The API around them is procedural, inconsistent, and overall gross
- They lack a lot of native shorthand operations found in other languages (eg, slicing)
- Their error handling is crap
Would you mind elaborating on points #3 and #8?
It is not clear to me what you are getting at with those points.
-Mike
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/vector proposing to add final class Vector to PHP.
PHP's native array type is rare among programming languages in that it is used as an associative map of values, but also needs to support lists of values.
In order to support both use cases while also providing a consistent internal array HashTable API to PHP's internals and PECLs, additional memory is needed to track keys (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - around twice as much as is needed to just store the values due to needing space both for the string pointer and int key in a Bucket, for non-reference counted values).
Additionally, creating non-constant arrays will allocate space for at least 8 elements to make the initial resizing more efficient, potentially wasting memory.
It would be useful to have an efficient variable-length container in the standard library for the following reasons:
- To save memory in applications or libraries that may need to store many lists of values and/or run as a CLI or embedded process for long periods of time (in modules identified as using the most memory or potentially exceeding memory limits in the worst case) (both in userland and in native code written in php-src/PECLs)
- To provide a better alternative to ArrayObject and SplFixedArray for use cases where objects are easier to use than arrays - e.g. variable sized collections (for lists of values) that can be passed by value to be read and modified.
- To give users the option of stronger runtime guarantees that property, parameter, or return values really contain a list of values without gaps, that array modifications don't introduce gaps or unexpected indexes, etc.
Thoughts on Vector?
Given there seems to be a lot of concern about the approach the RFC proposes would it not address the concerns about memory usage and performance if several methods were added to SplFixedArray instead (as well as functions like indexOf(), contains(), map(), filter(), JSONSerialize(), etc., or similar):
===============
setCapacity(int) - Sets the Capacity, i.e. the maximum Size before resize
getCapacity():int - Gets the current Capacity.
setGrowthFactor(float) - Sets the Growth Factor for push(). Defaults to 2
getGrowthFactor():float - Gets the current Growth Factor
pop([shrink]):mixed - Returns [Size] then subtracts 1 from Size. If (bool)shrink passed then call shrink().
push(mixed) - Sets [Size]=mixed, then Size++, unless Size=Capacity then setSize(n) where n=round(Size*GrowthFactor,0) before Size++.
grow([new_capacity]) - Increases memory allocated. Sets Capacity to Size*GrowthFactor or new_capacity.
shrink([new_capacity]) - Reduces memory allocated. Sets Capacity to current Size or new_capacity.
===============
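As a rough userland approximation of the behaviour listed above, the following sketch wraps the real SplFixedArray (using its existing setSize() and getSize() methods) by containment. The class name GrowableArray and its exact semantics are assumptions made for illustration; it interprets pop() as returning the last stored element, and it is not the RFC's Vector.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of push()/pop() with a growth factor, built by
// containment on top of SplFixedArray. Not an existing class.
final class GrowableArray
{
    private SplFixedArray $storage;
    private int $size = 0;

    public function __construct(int $capacity = 4, private float $growthFactor = 2.0)
    {
        if ($capacity < 1 || $growthFactor <= 1.0) {
            throw new InvalidArgumentException('capacity must be >= 1 and growth factor > 1');
        }
        $this->storage = new SplFixedArray($capacity);
    }

    public function getCapacity(): int
    {
        return $this->storage->getSize();
    }

    public function count(): int
    {
        return $this->size;
    }

    public function push(mixed $value): void
    {
        if ($this->size === $this->getCapacity()) {
            // Grow by the factor, but always by at least one slot.
            $newCapacity = max($this->size + 1, (int) round($this->size * $this->growthFactor));
            $this->storage->setSize($newCapacity);
        }
        $this->storage[$this->size++] = $value;
    }

    public function pop(bool $shrink = false): mixed
    {
        if ($this->size === 0) {
            throw new UnderflowException('Cannot pop from an empty collection');
        }
        $value = $this->storage[--$this->size];
        $this->storage[$this->size] = null; // drop the stored reference
        if ($shrink) {
            $this->storage->setSize(max(1, $this->size));
        }
        return $value;
    }
}

$v = new GrowableArray(capacity: 2, growthFactor: 1.5);
foreach ([10, 20, 30, 40] as $n) {
    $v->push($n);
}
echo $v->count(), ' of ', $v->getCapacity(), "\n"; // 4 of 5
echo $v->pop(), "\n";                              // 40
```

With a starting capacity of 2 and a factor of 1.5, the capacity grows 2, 3, 5 over the four pushes, so one slot is left unused at the end.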
If you had these methods then I think you would get the memory and performance improvements you want, and if you really want a final Vector class for your own uses you could roll your own using inheritance or containment.
Would this not work?
-Mike
P.S. I also think asking for new methods on SplFixedArray has a much greater chance of success than an RFC for Vector. #jmtcw
Hi Mike Schinkel,
Given there seems to be a lot of concern about the approach the RFC proposes would it not address the concerns about memory usage and performance if several methods were added to SplFixedArray instead (as well as functions like indexOf(), contains(), map(), filter(), JSONSerialize(), etc., or similar):
===============
setCapacity(int) - Sets the Capacity, i.e. the maximum Size before resize
getCapacity():int - Gets the current Capacity.
setGrowthFactor(float) - Sets the Growth Factor for push(). Defaults to 2
getGrowthFactor():float - Gets the current Growth Factor
pop([shrink]):mixed - Returns [Size] then subtracts 1 from Size. If (bool)shrink passed then call shrink().
push(mixed) - Sets [Size]=mixed, then Size++, unless Size=Capacity then setSize(n) where n=round(Size*GrowthFactor,0) before Size++.
grow([new_capacity]) - Increases memory allocated. Sets Capacity to Size*GrowthFactor or new_capacity.
shrink([new_capacity]) - Reduces memory allocated. Sets Capacity to current Size or new_capacity.
===============
If you had these methods then I think you would get the memory and performance improvements you want, and if you really want a final Vector class for your own uses you could roll your own using inheritance or containment.
I asked 8 months ago about push/pop in SplFixedArray. The few responses were unanimously opposed to SplFixedArray being repurposed like a vector; the setSize functionality was treated more like an escape hatch, and the class was conceptually for fixed-size data.
I also believe adding a configurable growth factor would be excessive for a high level language.
Thanks,
Tyson
Hi Tyson,
Thanks for the reply.
I asked 8 months ago about push/pop in SplFixedArray. The few responses were unanimously opposed to SplFixedArray being repurposed like a vector, the setSize functionality was treated more like an escape hatch and it was conceptually for fixed-size data.
Hmm. I must have missed that thread as I was definitely following the list at that time.
But I found the thread, which only had three (3) comments from others:
https://externals.io/message/112639
From Levi Morrison it seems his objection was to adding push() and pop() to a class including the name "Fixed." Levi suggested soft-deprecating SplStack because it was implemented as a linked-list, but he proposed adding Spl\ArrayStack or similar, so it seems he was open to iterating on the Spl classes in general (no pun intended.)
From Nikita it seemed that he did not object so much as comment on Levi's suggestion of adding Spl\ArrayStack and suggested instead an SplDeque that would handle queue usage more efficiently than plain PHP arrays.
So I think those responses were promising, but that you did not follow up on them. I mean no disrespect (we all get busy, our priorities change, and things fall off our radar), but it feels to me like you might have more success pursuing your use-cases related to the Spl classes than via a pure Vector class. Maybe propose an SplVector class that extends SplFixedArray, or something similar that addresses the use-case and with a name that people can agree on?
BTW, here are two other somewhat-related threads:
I also believe adding a configurable growth factor would be excessive for a high level language.
I wavered on whether or not to propose a configurable growth factor, but ironically I did so to head off the potential complaint from anyone who cares deeply about memory usage (isn't that you?) that not allowing the growth factor to be configurable would mean that either the class would use too much memory for some use-cases, or would need to reallocate more memory too frequently for other use-cases, depending on what the default growth factor would be.
That said, I don't see how a configurable growth factor should be problematic for PHP? For those who don't need/care to optimize memory usage or reallocation frequency they can simply ignore it; no harm done. But for those who do care, it would give them the ability to fine tune their memory usage, which for selected use-cases could mean the difference between being able to implement something in PHP, or not.
Note that someone could easily argue that adding a memory-optimized data structure when we already have a perfectly flexible data structure with PHP arrays that can be used for the same algorithms is "excessive for a high-level language." But then I don't think you would make that argument, so why make it for a configurable growth factor? #honestquestion
This has been asked about multiple times in threads on unrelated proposals (https://externals.io/message/112639#112641 and https://externals.io/message/93301#93301 years ago) throughout the years, but the maintainer of php-ds had a long term goal of developing it separately from php's release cycle (and was still focusing on the PECL when I'd asked on the GitHub issue in the link almost a year ago).
And finally I think when you conveyed the intent of the author of ext-ds you omitted part of his full statement. When seen in full I believe his statement conveys a different interest than the partial one implies:
https://github.com/php-ds/ext-ds/issues/156
While he did say "My long-term intention has been to not merge this extension into php-src" he immediately also said "I would like to see it become available as a default extension at the distribution level."
Based on his full statement I assume that an RFC that would propose adding an uncommented extension=ext-ds.so in the default php.ini would have the author of ext-ds' backing. Assuming 2/3rd of voters would agree, that seems like a really easy lift, implementation-wise?
Adding an apparently well-respected extension to default php.ini and mentioning in the release notes so that userland PHP developers would become aware of it, start using it, writing blog posts about it, and asking questions on StackOverflow about it would be a net plus. And those who use managed PHP hosts that stick with the officially-blessed extensions would actually finally have access to it; those who use WordPress managed hosts, for example.
Of course including it would not preclude adding new data structures into core in the future. Heck, with more people using ext-ds there will likely be greater awareness of such functionality and better recognition of its short-comings (assuming it has them) and thus facilitate more interest in adding better data structures to PHP core later on.
Also, I noticed in that 5-year old link you referenced that a few vocal members on the list bikeshedded over some of the finer details of the ext-ds API. If an RFC to include ext-ds in php.ini were to be submitted I would implore those people and others to consider that this is the inclusion of an extension to php.ini and not a feature in PHP core, and thus to please not let the perfect be the enemy of the good.
=====
Given the above, I think you have one of two (2) potential directions to pursue (or both) that each might bring more fruit than the RFC discussed on this thread:
- Propose an additional Spl class.
- Propose addition of ext-ds to the default php.ini
-Mike
Hi Mike Schinkel,
I asked 8 months ago about push/pop in SplFixedArray. The few responses were unanimously opposed to SplFixedArray being repurposed like a vector, the setSize functionality was treated more like an escape hatch and it was conceptually for fixed-size data.
Hmm. I must have missed that thread as I was definitely following the list at that time.
But I found the thread, which only had three (3) comments from others:
https://externals.io/message/112639
From Levi Morrison it seems his objection was to adding push() and pop() to a class including the name "Fixed." Levi suggested soft-deprecating SplStack because it was implemented as a linked-list, but he proposed adding Spl\ArrayStack or similar, so it seems he was open to iterating on the Spl classes in general (no pun intended.)
From Nikita it seemed that he did not object so much as comment on Levi's suggestion of adding Spl\ArrayStack and suggested instead an SplDeque that would handle queue usage more efficiently than plain PHP arrays.
So I think those responses were promising, but that you did not follow up on them. I mean no disrespect (we all get busy, our priorities change, and things fall off our radar), but it feels to me like you might have more success pursuing your use-cases related to the Spl classes than via a pure Vector class.
Experience in past RFCs gave me the impression that if:
- All of the responses are suggesting using a different approach (php-ds, arrays),
- Other comments are negative or uninterested.
- None of the feedback on the original idea is positive or interested in it.
When feedback was like that, voting would typically have mostly "no" results.
Some of the feedback such as *Deque was interesting, but not related to extending SplFixedArray.
Maybe propose an SplVector class that extends SplFixedArray, or something similar that addresses the use-case and with a name that people can agree on?
I'd be stuck with all of the features in SplFixedArray that get introduced later and its design decisions.
BTW, here are two other somewhat-related threads:
I also believe adding a configurable growth factor would be excessive for a high level language.
I wavered on whether or not to propose a configurable growth factor, but ironically I did so to head off the potential complaint from anyone who cares deeply about memory usage (isn't that you?) that not allowing the growth factor to be configurable would mean that either the class would use too much memory for some use-cases, or would need to reallocate more memory too frequently for other use-cases, depending on what the default growth factor would be.
That said, I don't see how a configurable growth factor should be problematic for PHP? For those who don't need/care to optimize memory usage or reallocation frequency they can simply ignore it; no harm done. But for those who do care, it would give them the ability to fine tune their memory usage, which for selected use-cases could mean the difference between being able to implement something in PHP, or not.
Note that someone could easily argue that adding a memory-optimized data structure when we already have a perfectly flexible data structure with PHP arrays that can be used for the same algorithms is "excessive for a high-level language." But then I don't think you would make that argument, so why make it for a configurable growth factor? #honestquestion
The growth factor is even lower level than shrinkToFit/reserve, and requires extra memory to store the float, extra cpu time to do floating point multiplication rather than doubling,
and additional API methods for something that 99% of applications wouldn't use.
I consider it more suitable for a low level language.
And if we discover a different resizing strategy is better, it prevents us from changing it.
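To put rough numbers on the trade-off being debated here (memory overhead versus reallocation frequency), the following sketch simulates appending elements one at a time under different growth factors. The function simulateGrowth and its rounding rule are assumptions for illustration; they do not describe how the RFC's Vector or PHP arrays actually resize.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: simulates appending $n elements one at a time and
// reports how many reallocations occur and the final capacity.
function simulateGrowth(int $n, float $factor, int $initial = 4): array
{
    $capacity = $initial;
    $reallocations = 0;
    for ($size = 0; $size < $n; $size++) {
        if ($size === $capacity) {
            // Round up, and always grow by at least one slot.
            $capacity = max($capacity + 1, (int) ceil($capacity * $factor));
            $reallocations++;
        }
    }
    return ['reallocations' => $reallocations, 'capacity' => $capacity];
}

foreach ([1.5, 2.0] as $factor) {
    $r = simulateGrowth(1000, $factor);
    printf(
        "factor %.1f: %d reallocations, final capacity %d\n",
        $factor,
        $r['reallocations'],
        $r['capacity']
    );
}
```

Doubling from 4 reaches 1024 slots after 8 reallocations; a smaller factor reallocates more often, and how much unused capacity either choice leaves depends on where the final size happens to fall.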
This has been asked about multiple times in threads on unrelated proposals (https://externals.io/message/112639#112641 and https://externals.io/message/93301#93301 years ago) throughout the years, but the maintainer of php-ds had a long term goal of developing it separately from php's release cycle (and was still focusing on the PECL when I'd asked on the GitHub issue in the link almost a year ago).
And finally I think when you conveyed the intent of the author of ext-ds you omitted part of his full statement. When seen in full I believe his statement conveys a different interest than the partial one implies:
https://github.com/php-ds/ext-ds/issues/156
While he did say "My long-term intention has been to not merge this extension into php-src" he immediately also said "I would like to see it become available as a default extension at the distribution level."
Based on his full statement I assume that an RFC that would propose adding an uncommented extension=ext-ds.so in the default php.ini would have the author of ext-ds' backing. Assuming 2/3rd of voters would agree, that seems like a really easy lift, implementation-wise?
Adding an apparently well-respected extension to default php.ini and mentioning in the release notes so that userland PHP developers would become aware of it, start using it, writing blog posts about it, and asking questions on StackOverflow about it would be a net plus. And those who use managed PHP hosts that stick with the officially-blessed extensions would actually finally have access to it; those who use WordPress managed hosts, for example.
Do you mean "commented" (with ;extension=ext-ds) or "uncommented"?
I read the response in a totally different way.
See https://externals.io/message/116048#116054 for more details. I've been busy answering emails and haven't had time to collect all of the feedback and update this RFC with it, but I'd planned to.
There have been no proposals from the maintainer to do that so far; that was what the maintainer mentioned as a long term plan.
I personally doubt that having it developed separately from php's release cycle would be accepted by voters (e.g. if unpopular decisions couldn't be voted against), I'm unsure how backwards compatibility would be handled in that model, and I had other concerns (e.g. API debates such as https://externals.io/message/93301#93301).
With php-ds itself getting merged anytime soon seeming unlikely to me,
I decided to start independently working on efficient data structure implementations.
If you look at the bottom of the thread, the maintainer had closed the request
and Benjamin Morel had asked about reconsidering 8 months ago https://github.com/php-ds/ext-ds/issues/156#issuecomment-752179461
Since the maintainer hadn't responded since then (and due to above points),
I don't see a point in repeating the same request to reconsider when nothing has changed.
(Also, they're working on a v2 major release, and there's no timeline for that that I know of. It could be several years.)
I don't want to encourage comments that don't introduce any new or convincing arguments (e.g. "I want this too") on their GitHub issues pages, which is why I hadn't linked the original GitHub issue, but yes, I can quote more of the response in that section.
Of course including it would not preclude adding new data structures into core in the future. Heck, with more people using ext-ds there will likely be greater awareness of such functionality and better recognition of its short-comings (assuming it has them) and thus facilitate more interest in adding better data structures to PHP core later on.
Also, I noticed in that 5-year old link you referenced that a few vocal members on the list bikeshedded over some of the finer details of the ext-ds API. If an RFC to include ext-ds in php.ini were to be submitted I would implore those people and others to consider that this is the inclusion of an extension to php.ini and not a feature in PHP core, and thus to please not let the perfect be the enemy of the good.
=====
Given the above, I think you have one of two (2) potential directions to pursue (or both) that each might bring more fruit than the RFC discussed on this thread:
- Propose an additional Spl class.
This is an additional class in Spl. Nothing is forcing all future functionality to use Spl as a prefix: ArrayObject already exists without a prefix (Iterators also exist without an Spl prefix), and as an end user, my personal preference is short names.
And functionality has moved from Spl to core before (e.g. Iterator originated in Spl and moved to core).
Those data structures were all added in php 5.3.
Since then, PHP has had significantly stricter discussion and voting threshold requirements for the introduction of new functionality, performance and memory usage improvements, etc., and using a different naming pattern for new data structures that fill in missing functionality or add better functionality is something I feel is worth proposing to distinguish new additions from the old data structures.
I checked out and grepped the top 400 composer packages - none of them seemed to be declaring a class/trait/interface named Vector.
- Propose addition of ext-ds to the default php.ini
I feel like it would be inappropriate for someone who isn't a maintainer of ext-ds to propose that,
especially when I'm unclear about the exact form of their long-term goals, project plans, or when ext-ds 2.0 would be out.
Thanks,
Tyson
Hi Mike Schinkel,
Hmm. I must have missed that thread as I was definitely following the list at that time.
But I found the thread, which only had three (3) comments from others:
https://externals.io/message/112639
From Levi Morrison it seems his objection was to adding push() and pop() to a class including the name "Fixed." Levi suggested soft-deprecating SplStack because it was implemented as a linked-list, but he proposed adding Spl\ArrayStack or similar, so it seems he was open to iterating on the Spl classes in general (no pun intended.)
From Nikita it seemed that he did not object so much as comment on Levi's suggestion of adding Spl\ArrayStack and suggested instead an SplDeque that would handle queue usage more efficiently than plain PHP arrays.
So I think those responses were promising, but that you did not follow up on them. I mean no disrespect (we all get busy, our priorities change, and things fall off our radar), but it feels to me like you might have more success pursuing your use-cases related to the Spl classes than via a pure Vector class.
Experience in past RFCs gave me the impression that if:
- All of the responses are suggesting using a different approach (php-ds, arrays),
- Other comments are negative or uninterested.
- None of the feedback on the original idea is positive or interested in it.
When feedback was like that, voting would typically have mostly "no" results.
Understood, but for clarity I was implying that wanting to change SplFixedArray was an "XY problem" and that maybe the way to address your actual use-cases was to pursue other approaches that people were suggesting, which is what you did yesterday. :-)
Maybe propose an SplVector class that extends SplFixedArray, or something similar that addresses the use-case and with a name that people can agree on?
I'd be stuck with all of the features in SplFixedArray that get introduced later and its design decisions.
You wouldn't be stuck with all the features of SplFixedArray if you did "something similar."
(I make this point only as it seems you have dismissed one aspect of my suggestion while not acknowledging the alternatives I present. Twice now, at least.)
I wavered on whether or not to propose a configurable growth factor, but ironically I did so to head off the potential complaint from anyone who cares deeply about memory usage (isn't that you?) that not allowing the growth factor to be configurable would mean that either the class would use too much memory for some use-cases, or would need to reallocate more memory too frequently for other use-cases, depending on what the default growth factor would be.
That said, I don't see how a configurable growth factor should be problematic for PHP? For those who don't need/care to optimize memory usage or reallocation frequency they can simply ignore it; no harm done. But for those who do care, it would give them the ability to fine tune their memory usage, which for selected use-cases could mean the difference between being able to implement something in PHP, or not.
Note that someone could easily argue that adding a memory-optimized data structure when we already have a perfectly flexible data structure with PHP arrays that can be used for the same algorithms is "excessive for a high-level language." But then I don't think you would make that argument, so why make it for a configurable growth factor? #honestquestion
The growth factor is even lower level than shrinkToFit/reserve, and requires extra memory to store the float,
extra cpu time to do floating point multiplication rather than doubling,
and additional API methods for something that 99% of applications wouldn't use.
I consider it more suitable for a low level language.
I respect your points here, but disagree.
And if we discover a different resizing strategy is better, it prevents us from changing it.
This is not true.
We could easily no-op the GrowthFactor method and it would not break anything in 99.9...% of use-cases.
The relevant question here should be, what is the likelihood of us discovering a better resizing strategy that would not benefit at all from a growth factor? Is there evidence of one anywhere else? I know that Go - designed to be performant to the extent it does not add complexity - uses a growth factor.
And finally I think when you conveyed the intent of the author of ext-ds you omitted part of his full statement. When seen in full I believe his statement conveys a different interest than the partial one implies: https://github.com/php-ds/ext-ds/issues/156
While he did say "My long-term intention has been to not merge this extension into php-src" he immediately also said "I would like to see it become available as a default extension at the distribution level."
Based on his full statement I assume that an RFC that would propose adding an uncommented extension=ext-ds.so in the default php.ini would have the author of ext-ds' backing. Assuming 2/3rd of voters would agree, that seems like a really easy lift, implementation-wise?
Adding an apparently well-respected extension to default php.ini and mentioning in the release notes so that userland PHP developers would become aware of it, start using it, writing blog posts about it, and asking questions on StackOverflow about it would be a net plus. And those who use managed PHP hosts that stick with the officially-blessed extensions would actually finally have access to it; those who use WordPress managed hosts, for example.
Do you mean "commented" (with ;extension=ext-ds) or "uncommented"?
Definitely uncommented.
Otherwise it would not be available in a default install and thus not available for most web hosts.
I read the response in a totally different way.
Which is why it is helpful to have multiple people interpret online communication when the communication is from someone not currently participating in the discussion. :-)
See https://externals.io/message/116048#116054 for more details. I've been busy answering emails and haven't had time to collect all of the feedback and update this RFC with that, but I'd planned to.
There have been no proposals from the maintainer to do that so far,
that was what the maintainer mentioned as a long term plan.
I personally doubt having it developed separately from php's release cycle would be accepted by voters
(e.g. if unpopular decisions couldn't be voted against), or how backwards compatibility would be handled in that model, and had other concerns.
(e.g. API debates such as https://externals.io/message/93301#93301)
With php-ds itself getting merged anytime soon seeming unlikely to me,
I decided to start independently working on efficient data structure implementations.
If you look at the bottom of the thread, the maintainer had closed the request
and Benjamin Morel had asked about reconsidering 8 months ago https://github.com/php-ds/ext-ds/issues/156#issuecomment-752179461
Yes, I had read that before sending my reply.
Since the maintainer hadn't responded since then (and due to above points),
I don't see a point in repeating the same request to reconsider when nothing has changed.
(Also, they're working on a v2 major release, and there's no timeline for that that I know of. It could be several years.)
Maybe I am missing something, but since he published it as open-source and said he wants it to be included in distribution I would not think it would require the maintainer to be the one to do the work to get it into PHP. I think it would be perfectly acceptable for supporters of his library to do the work to get it into PHP.
Why would that not be an option?
And rather than make a wrong assumption, you could minimally submit an issue on his repo asking the specific question of whether or not he would support adding ext-ds to php.ini. Then we would know explicitly.
(#fwiw I don't plan to do that myself because I don't currently have a strong need for these data structures so am not one to write then champion such an RFC.)
Of course including it would not preclude adding new data structures into core in the future. Heck, with more people using ext-ds there will likely be greater awareness of such functionality and better recognition of its short-comings - assuming it has them - and thus facilitate more interest in adding better data structures to PHP core later on.
Also, I noticed in that 5-year old link you referenced that a few vocal members on the list bikeshedded over some of the finer details of the ext-ds API. If an RFC to include ext-ds in php.ini were to be submitted I would implore those people and others to consider that this is the inclusion of an extension to php.ini and not a feature in PHP core, and thus to please not let the perfect be the enemy of the good.
=====
Given the above, I think you have one of two (2) potential directions to pursue (or both) that each might bring more fruit than the RFC discussed on this thread:
- Propose an additional Spl class.
This is an additional class in Spl. Nothing is forcing all future functionality to use Spl as a prefix,
ArrayObject already exists without a prefix (Iterators also exist without an Spl prefix),
and as an end user, my personal preference is short names.
And functionality has moved from Spl to core before (e.g. Iterator originated in Spl and moved to core)
Those data structures were all added in php 5.3.
PHP has had significantly stricter discussion and voting threshold requirements for the introduction of new functionality since then,
performance and memory usage improvements, etc., and using a different naming pattern for new data structures that fill in missing functionality
or add better functionality is something I feel is worth proposing to distinguish new additions from the old data structures.
Yes, all true - and I dislike Spl too - but my point was you'd probably see success quicker if you proposed in Spl.
But if that's not something you would do, then it's a moot point.
- Propose addition of ext-ds to the default php.ini
I feel like it would be inappropriate for someone who isn't a maintainer of ext-ds to propose that,
especially when I'm unclear about the exact form of their long-term goals, project plans, or when ext-ds 2.0 would be out.
I already commented on that above so won't repeat it here.
-Mike
Hi Mike Shinkel,
Hmm. I must have missed that thread as I was definitely following the list at that time.
But I found the thread, which only had three (3) comments from others:
https://externals.io/message/112639
From Levi Morrison it seems his objection was to adding push() and pop() to a class including the name "Fixed." Levi suggested soft-deprecating SplStack because it was implemented as a linked-list, but he proposed adding Spl\ArrayStack or similar, so it seems he was open to iterating on the Spl classes in general (no pun intended.)
From Nikita it seemed that he did not object so much as comment on Levi's suggestion of adding Spl\ArrayStack and suggested instead an SplDeque that would handle queue usage more efficiently than plain PHP arrays.
So I think those responses were promising, but that you did not follow up on them. I mean no disrespect - we all get busy, our priorities change, and things fall off our radar
I said that in response to you suggesting adding functionality to SplFixedArray, as the reason why I am not adding functionality to SplFixedArray.
Those responses were promising for functionality that is not about SplFixedArray.
The Vector is a name choice. SplArrayStack and a Vector would have very similar performance characteristics and probably identical internal representations.
However, a more expansive standard library such as contains, map, filter, reduce, etc. makes more sense on a List/Vector than a Stack if you're solely going based on the name - when you hear Stack, you mostly think of pushing or popping from it.
As you said also below in your followup response, I am following up on the suggestion for a Deque.
- but it feels to me like you might have more success pursuing your use-cases related to the Spl classes than via a pure Vector class.
It's hard to know which approach (namespaces such as Collection, SplXyz, or short names) will succeed without actually creating an RFC.
I'd mentioned my personal reasons for expecting Spl not to be the best choice.
Any email discussion only has comments from a handful of people with different arguments and preferences,
and many times more people might vote on the final RFC.
Experience in past RFCs gave me the impression that if:
- All of the responses are suggesting using a different approach (php-ds, arrays),
- Other comments are negative or uninterested,
- None of the feedback on the original idea is positive or interested in it,
then voting would typically have mostly "no" results.
Understood, but for clarity I was implying that wanting to change SplFixedArray was an "XY problem" and that maybe the way to address your actual use-cases was to pursue other approaches that people were suggesting, which is what you did yesterday. :-)
Maybe propose an SplVector class that extends SplFixedArray, or something similar that addresses the use-case and with a name that people can agree on?
I'd be stuck with all of the features in SplFixedArray that get introduced later and its design decisions.
You wouldn't be stuck with all the features of SplFixedArray if you did "something similar."
(I make this point only as it seems you have dismissed one aspect of my suggestion while not acknowledging the alternatives I present. Twice now, at least.)
I'm not sure which of the multiple suggestions you brought up you're referring to as "something similar".
Your original suggestion I responded to was to modify "SplFixedArray", so
I assumed you were suggesting that I change my RFC to focus on SplFixedArray.
I had the impression after feedback that those changes to SplFixedArray would overwhelmingly fail, especially due to being named "Fixed".
I don't consider making it a subclass of SplFixedArray a good design decision.
It's possible to invoke methods on base classes with ReflectionMethod, so I can't make Vector a subclass of SplFixedArray with an entirely different implementation.
So any memory SplFixedArray wastes (e.g. to mitigate bugs already found or found in the future) is also wasted in any subclass of SplFixedArray.
- Additionally, this has the same problem as SplDoublyLinkedList and its subclasses.
It prevents changing the internal representation or adding certain types of functionality if that wouldn't work with the base class.
- Additionally, this would make deprecating and removing SplFixedArray significantly harder or impractical,
if it ever seemed appropriate in the future due to lack of use.
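The ReflectionMethod point above can be illustrated with a small sketch. HypotheticalVector is a made-up class used only for illustration (it is not part of the RFC); the sketch assumes that ReflectionMethod::invoke() runs the reflected base-class method rather than dispatching to the override, so a subclass cannot fully hide SplFixedArray's own storage:

```php
<?php
declare(strict_types=1);

// Hypothetical subclass, for illustration only: it tries to replace
// SplFixedArray's notion of size with its own value.
final class HypotheticalVector extends SplFixedArray
{
    public function getSize(): int
    {
        return 100;
    }
}

$vector = new HypotheticalVector(3);

// Normal dispatch reaches the override.
$viaOverride = $vector->getSize();

// Reflecting the method on the base class and invoking it on the subclass
// instance runs SplFixedArray's own implementation, bypassing the override.
$baseGetSize = new ReflectionMethod(SplFixedArray::class, 'getSize');
$viaBase = $baseGetSize->invoke($vector);

var_dump($viaOverride, $viaBase);
```

Because the base implementation stays reachable this way, the subclass still has to keep SplFixedArray's internal state valid, which is the memory and design cost described above.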
Additionally, I'm proposing a final class: SplFixedArray already exists and can't be converted to a final class because code already extends it.
See https://wiki.php.net/rfc/deque#final_class for the reasons why I am opposed to making this final, and why others have also been opposed to making it final.
I wavered on whether or not to propose a configurable growth factor, but ironically I did so to head off the potential complaint from anyone who cares deeply about memory usage (isn't that you?) that not allowing the growth factor to be configurable would mean that either the class would use too much memory for some use-cases, or would need to reallocate more memory too frequently for other use-cases, depending on what the default growth factor would be.
That said, I don't see how a configurable growth factor should be problematic for PHP? For those who don't need/care to optimize memory usage or reallocation frequency they can simply ignore it; no harm done. But for those who do care, it would give them the ability to fine tune their memory usage, which for selected use-cases could mean the difference between being able to implement something in PHP, or not.
Note that someone could easily argue that adding a memory-optimized data structure when we already have a perfectly flexible data structure with PHP arrays that can be used for the same algorithms is "excessive for a high-level language." But then I don't think you would make that argument, so why make it for a configurable growth factor? #honestquestion
The growth factor is even lower level than shrinkToFit/reserve, and requires extra memory to store the float,
extra cpu time to do floating point multiplication rather than doubling,
and additional API methods for something that 99% of applications wouldn't use.
I consider it more suitable for a low level language.
I respect your points here, but disagree.
And if we discover a different resizing strategy is better, it prevents us from changing it.
This is not true.
We could easily no-op the GrowthFactor method and it would not break anything in 99.9...% of use-cases.
I respectfully continue to disagree and don't see any good use cases for making the growth factor fine tunable for the general case.
Even in years of using C++ or Java for various reasons, I haven't had a need to override it in those languages,
and expect use cases to be less common in PHP.
If use cases are discovered that do benefit from it, anyone can write an RFC to add this to Vector or SplFixedArray if it's approved, but I don't have plans on doing that myself.
Exposing the capacity in this RFC seems like it's a mistake - it's useful in unit testing or bug reporting while initially implementing the extension, but I'll likely end up hiding it until others come up with a convincing use case for ordinary use.
You also haven't brought up any use cases you have for a growth factor, just that they may exist under some circumstances.
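For readers following the growth factor debate, here is a minimal userland sketch of the trade-off being discussed. It is not how Vector or any php-src structure is implemented; simulateGrowth() and its initial capacity of 4 are assumptions made up for this illustration. It counts how many reallocations a buffer would need while appending n elements, and what capacity it ends up with, for a given factor:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: simulate appending $n elements to a buffer that
 * multiplies its capacity by $factor whenever it is full.
 *
 * @return array{0: int, 1: int} [number of reallocations, final capacity]
 */
function simulateGrowth(int $n, float $factor, int $initialCapacity = 4): array
{
    $capacity = $initialCapacity;
    $reallocations = 0;
    for ($size = 0; $size < $n; $size++) {
        if ($size === $capacity) {
            // Always grow by at least one slot so small capacities make progress.
            $capacity = max($capacity + 1, (int) ceil($capacity * $factor));
            $reallocations++;
        }
    }
    return [$reallocations, $capacity];
}

[$doublingReallocs, $doublingCapacity] = simulateGrowth(1000, 2.0);
[$slowerReallocs, $slowerCapacity] = simulateGrowth(1000, 1.5);

printf("factor 2.0: %d reallocations, capacity %d\n", $doublingReallocs, $doublingCapacity);
printf("factor 1.5: %d reallocations, capacity %d\n", $slowerReallocs, $slowerCapacity);
```

A smaller factor reallocates more often, and whether it leaves less unused capacity depends on where n falls between growth steps, so the benefit of tuning it is workload-specific.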
The relevant question here should be, what is the likelihood of us discovering a better resizing strategy that would not benefit at all from a growth factor? Is there evidence of one anywhere else? I know that Go - designed to be performant to the extent it does not add complexity - uses a growth factor.
And finally I think when you conveyed the intent of the author of ext-ds you omitted part of his full statement. When seen in full I believe his statement conveys a different interest than the partial one implies: https://github.com/php-ds/ext-ds/issues/156
While he did say "My long-term intention has been to not merge this extension into php-src" he immediately also said "I would like to see it become available as a default extension at the distribution level."
Based on his full statement I assume that an RFC that would propose adding an uncommented extension=ext-ds.so in the default php.ini would have the author of ext-ds' backing. Assuming 2/3rd of voters would agree, that seems like a really easy lift, implementation-wise?
Adding an apparently well-respected extension to default php.ini and mentioning in the release notes so that userland PHP developers would become aware of it, start using it, writing blog posts about it, and asking questions on StackOverflow about it would be a net plus. And those who use managed PHP hosts that stick with the officially-blessed extensions would actually finally have access to it; those who use WordPress managed hosts, for example.
Do you mean "commented" (with ;extension=ext-ds) or "uncommented"?
Definitely uncommented.
Otherwise it would not be available in a default install and thus not available for most web hosts.
Doing that and nothing else would result in many users seeing
"PHP Warning: PHP Startup: Unable to load dynamic library 'asgbn' (tried: (various paths)) in %s on line %d",
which is something I'd want to avoid.
I'd elaborated a bit more on my concerns with an extension being always-on but not maintained by php internals in https://wiki.php.net/rfc/deque#perceived_issues_and_uncertainties_about_php
I read the response in a totally different way.
Which is why it is helpful to have multiple people interpret online communication when the communication is from someone not currently participating in the discussion. :-)
At the time, I felt it would be unproductive and disrespectful of the maintainer's time to repeat the request that was already made
if nothing at all had changed since the previous time I asked.
It's been 8 months and I managed to create independent reimplementations of the core functionality of Queue and Vector data structures,
so it's definitely worth checking if their position has changed.
See https://externals.io/message/116048#116054 for more details. I've been busy answering emails and haven't had time to collect all of the feedback and update this RFC with that, but I'd planned to.
There have been no proposals from the maintainer to do that so far,
that was what the maintainer mentioned as a long term plan.
I personally doubt having it developed separately from php's release cycle would be accepted by voters
(e.g. if unpopular decisions couldn't be voted against), or how backwards compatibility would be handled in that model, and had other concerns.
(e.g. API debates such as https://externals.io/message/93301#93301)
With php-ds itself getting merged anytime soon seeming unlikely to me,
I decided to start independently working on efficient data structure implementations.
If you look at the bottom of the thread, the maintainer had closed the request
and Benjamin Morel had asked about reconsidering 8 months ago https://github.com/php-ds/ext-ds/issues/156#issuecomment-752179461
Yes, I had read that before sending my reply.
Oh. So you read that but don't share my concerns about https://wiki.php.net/rfc/deque#perceived_issues_and_uncertainties_about_php-ds_distribution_plans
I don't know how you'd actually get approval for extension=ds in the default php.ini.
Furthermore, we don't control the contents of php.ini in OS distributions - there's a different default php.ini for Homebrew,
deb/rpm (and so on) packages for various linux distributions, docker images, etc., maintained by multiple distinct groups of people.
Third, my goal is a statically compiled functionality (i.e. always on), not a shared library that can be disabled through an ini setting.
Since the maintainer hadn't responded since then (and due to above points),
I don't see a point in repeating the same request to reconsider when nothing has changed.
(Also, they're working on a v2 major release, and there's no timeline for that that I know of. It could be several years.)
Maybe I am missing something, but since he published it as open-source and said he wants it to be included in distribution I would not think it would require the maintainer to be the one to do the work to get it into PHP. I think it would be perfectly acceptable for supporters of his library to do the work to get it into PHP.
Why would that not be an option?
If he permitted and requested that the supporters of his library do that, he wouldn't have to do the work, yes.
It's the problems in https://wiki.php.net/rfc/deque#perceived_issues_and_uncertainties_about_php-ds_distribution_plans that I'm concerned about.
If he didn't permit or request it, it'd be a huge problem if he or significant contributors to the package objected to the inclusion of it in php.
And rather than make a wrong assumption, you could minimally submit an issue on his repo asking the specific question of whether or not he would support adding ext-ds to php.ini. Then we would know explicitly.
(#fwiw I don't plan to do that myself because I don't currently have a strong need for these data structures so am not one to write then champion such an RFC.)
For the reasons stated above I don't think adding it to php.ini is a working solution.
Of course including it would not preclude adding new data structures into core in the future. Heck, with more people using ext-ds there will likely be greater awareness of such functionality and better recognition of its short-comings - assuming it has them - and thus facilitate more interest in adding better data structures to PHP core later on.
Also, I noticed in that 5-year old link you referenced that a few vocal members on the list bikeshedded over some of the finer details of the ext-ds API. If an RFC to include ext-ds in php.ini were to be submitted I would implore those people and others to consider that this is the inclusion of an extension to php.ini and not a feature in PHP core, and thus to please not let the perfect be the enemy of the good.
=====
Given the above, I think you have one of two (2) potential directions to pursue (or both) that each might bring more fruit than the RFC discussed on this thread:
- Propose an additional Spl class.
This is an additional class in Spl. Nothing is forcing all future functionality to use Spl as a prefix,
ArrayObject already exists without a prefix (Iterators also exist without an Spl prefix),
and as an end user, my personal preference is short names.
And functionality has moved from Spl to core before (e.g. Iterator originated in Spl and moved to core)
Those data structures were all added in php 5.3.
PHP has had significantly stricter discussion and voting threshold requirements for the introduction of new functionality since then,
performance and memory usage improvements, etc., and using a different naming pattern for new data structures that fill in missing functionality
or add better functionality is something I feel is worth proposing to distinguish new additions from the old data structures.
Yes, all true - and I dislike Spl too - but my point was you'd probably see success quicker if you proposed in Spl.
But if that's not something you would do, then it's a moot point.
If the majority of the community was in favor of it, I would switch to SplQueue. I'm starting a straw poll on that shortly: https://externals.io/message/116112
I didn't expect the majority of end users or internals to like the name SplDeque, based on feedback on previous RFCs,
especially with SplDoublyLinkedList already existing with the performance and memory issues mentioned in the Deque RFC.
As for adoption of namespaces - I didn't have any good ideas for namespaces at the time.
I like the recently mentioned name use Collections\Deque;, but expected I'd end up with strong opposition to whatever namespace I could think up.
- Propose addition of ext-ds to the default php.ini
I feel like it would be inappropriate for someone who isn't a maintainer of ext-ds to propose that,
especially when I'm unclear about the exact form of their long-term goals, project plans, or when ext-ds 2.0 would be out.
I already commented on that above so won't repeat it here.
I already responded and won't repeat it here.
Thanks,
Tyson