Hi internals,
I've created a new RFC https://wiki.php.net/rfc/deque to add a final class Deque
This is based on the Teds\Deque
implementation I've worked on
for the https://github.com/TysonAndre/pecl-teds PECL.
While SplDoublyLinkedList
and its subclass SplQueue
/SplStack
exist in the SPL, they have several drawbacks
that are addressed by this RFC to add a Deque
class (to use instead of those):
-
SplDoublyLinkedList
is internally represented by a doubly linked list,
making it use roughly twice as much memory as the proposedDeque
-
push
/pop
/unshift
/shift
fromSplDoublyLinkedList
are slower due to
needing to allocate or free the linked list nodes. - Reading values in the middle of the
SplDoublyLinkedList
is proportional to the length of the list,
due to needing to traverse the linked list nodes. -
foreach
Iteration behavior cannot be understood without knowing what constructed the
SplDoublyLinkedList
instance or set the flags.
It would be useful to have an efficient Deque
container in the standard library
to provide an alternative without those drawbacks,
as well as for the following reasons:
- To save memory in applications or libraries that may need to store many lists of values or run for long periods of time.
Notably, PHP'sarray
type will never release allocated capacity.
See https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - To provide a better alternative to
SplDoublyLinkedList
,SplStack
, andSplQueue
for use cases that require stacks or queues. - As a more efficient option than
array
andSplDoublyLinkedList
as a queue orDeque
, especially forunshift
.
A Deque
is more efficient than an array
when used as a queue, more readable, and easier to use correctly.
While it is possible to efficiently remove elements from the start of an array
(in terms of insertion order),
it is very inefficient to prepend elements to the start of a large array
due to needing to either copy the array
or move all elements in the internal array representation,
and an array
would use much more memory than a Deque
when used that way (and be slower).
There are also several pitfalls to using an array as a queue for larger queue sizes,
some of which are not obvious and discovered while writing the benchmarks.
(Having a better (double-ended) queue datastructure (Deque
) than the SplDoublyLinkedList
would save users from needing to write code with these pitfalls):
-
array_key_first()
takes time proportional to the number of elementsunset
from the start of an array,
causing it to unexpectedly be extremely slow (quadratic time) after unsetting many elements at the start of the queue.
(when the array infrequently runs out of capacity, buckets are moved to the front) -
reset()
orend()
will convert a variable to a reference,
and php is significantly less efficient at reading or writing to reference.
Opcache is also less efficient at optimizing uses of variables using references. - More obviously,
array_unshift
andarray_shift
will take time proportional to the number of elements in the array
(to reindex and move existing/remaining elements).
After the discussion period ends, I currently plan to start voting on the Deque
RFC and await the results
to determine next steps for the Vector
RFC.
The thread for my other open proposal final class Vector
(https://externals.io/message/116048) has prior discussion on implementation choices,
naming choices, and motivation for adding datastructures to php-src.
- E.g. the question of why we should add general-purpose datastructures
to php-src itself rather than have users rely on PECLs,
and why this proposal doesn't and can't usephp-ds
/ext-ds
.
Thanks,
Tyson
Le 20/09/2021 à 02:03, tyson andre a écrit :
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/deque to add a
final class Deque
Hello,
It seems that you are writing more than one RFC to add many data
structures. I love that you're doing that, but I suggest that you'd
normalize them all and place all new classes in a single new dedicated
namespace.
Regards,
--
Pierre
Am 20.09.2021 um 10:36 schrieb Pierre pierre-php@processus.org:
Le 20/09/2021 à 02:03, tyson andre a écrit :
I've created a new RFC https://wiki.php.net/rfc/deque to add a
final class Deque
It seems that you are writing more than one RFC to add many data structures. I love that you're doing that, but I suggest that you'd normalize them all and place all new classes in a single new dedicated namespace.
+1
- Chris
Hi Pierre,
It seems that you are writing more than one RFC to add many data
structures. I love that you're doing that, but I suggest that you'd
normalize them all
I'm not certain what you mean by "normalize".
https://www.merriam-webster.com/dictionary/normalize mentions
- "to make conform to or reduce to a norm or standard"
- https://www.freetext.org/Introduction_to_Linear_Algebra/Basic_Vector_Operations/Normalization/
(no pun intended) - "to bring or restore to a normal condition"
If you mean to make Vector and Queue's APIs consistent with each other,
I plan to make changes to Vector (e.g. remove $preserveKeys, add isEmpty), but the Vector RFC is currently on hold.
If you also mean all datastructure RFCs should be combined into a single RFC,
I'd considered combining the Vector RFC with https://wiki.php.net/rfc/deque,
but decided against combining the RFCs in this instance, because of:
- Current discussion about whether or not to choose an alternate name for a
Vector
- The fact that
Deque
has much better performance for various queue workloads
on both time and memory usage thanarray
(and significantly better performance thanSplDoublyLinkedList
).
Still, I may consider the approach for future RFCs, given that
-
Many developers in internals have expressed a desire for having a significantly
larger data structure library in core along the lines of what php-ds provides,
but may be uninterested in some of the individual datastructures or design choices.E.g. if 60% of developers were in favor of a sorted set and its proposed API/name
(along the lines of https://cplusplus.com/reference/set/set/),
60% were in favor of an immutable sequence and its proposed API/name (of values) (similar to https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences),
then with the 2/3 voting threshold,
neither of those RFCs would pass but a proposal combining those two would pass,
despite ~95% of developers wanting some type of improved datastructures added to core in general (I would guess). -
This would allow seeing how datastructures compare to each other.
Combining RFCs has the drawback of significantly increasing the implementation, discussion, review,
delays, and time involvement for the volunteer RFC authors and voters,
and may lead to a larger number of last-minute concerns raised after voting has started when more time
is spent trying out the new code and looking at the RFC.
and place all new classes in a single new dedicated
namespace.
My rationale for deciding against a dedicated namespace is in https://wiki.php.net/rfc/deque#global_namespace
which I've recently expanded on.
The Deque
proposal is normalized with respect to the namespace choice of data structures that already exist.
The choice of global namespace maintains consistency with the namespace used for general-purpose collections already in the SPL
(as well as relatively recent additions such as ''WeakReference'' (PHP 7.4) and ''WeakMap'' (PHP 8.0)).
Other recent additions to PHP such as ''ReflectionIntersectionType'' in PHP 8.1 have
also continued to use the global namespace when adding classes with functionality related to other classes.
Additionally, prior polls for namespacing choices of other datastructure functionality showed preferences
for namespacing and not namespacing were evenly split in a straw poll for a new iterable type
(https://wiki.php.net/rfc/cachediterable_straw_poll#namespace_choices)
Introducing a new namespace for data structures would also raise the question of whether existing datastructures
should be moved to that new namespace (for consistency), and that process would:
- Raise the amount of work needed for end users or library/framework/application authors to migrate to new PHP versions.
- Cause confusion and inconvenience for years about which namespace can or should be used in an application
(''SplObjectStorage'' vs ''Xyz\SplObjectStorage''), especially for developers working on projects supporting different php version ranges. - Prevent applications/libraries from easily supporting as wide of a range of php versions as they otherwise could.
- Cause serialization/unserialization issues when migrating to different php versions,
if the old or new class name in the serialized data did not exist in the other php version and was not aliased.
For example, if the older PHP version could not ''unserialize()'' ''Xyz\SplObjectStorage''
and would silently create a__PHP_Incomplete_Class_Name
(see https://www.php.net/manual/en/language.oop5.serialization.php#language.oop5.serialization)
without any warnings or notices.
Thanks,
Tyson
Le 20/09/2021 à 15:46, tyson andre a écrit :
Hi Pierre,
I'm not certain what you mean by "normalize".
https://www.merriam-webster.com/dictionary/normalize mentions
At least please try to make it serious, I think you understood what I
meant. I'm in no place in arguing about technical details about how it
should be implemented because the C language is not a place where I
shine and I trust people like you for doing it best.
Nevertheless, my only point was, please put all data structures
altogether, and please, just don't throw them in global namespace.
Otherwise, as we said in french "ça va être un sacré bordel là dedans"
(actually, it already is "un sacré bordel").
I'll let someone better in english than me translate, I'm not a native
english speaker I could get it wrong.
If you also mean all datastructure RFCs should be combined into a single RFC,
I'd considered combining the Vector RFC with https://wiki.php.net/rfc/deque,
but decided against combining the RFCs in this instance, because of:
No, not necessarily, they don't need to be in the same RFC, having one
per data structure is probably the way to go you'll maximize chances
that each one pass vote. Nevertheless, many RFC's exist and if there's
many other to come, no matter in which order they'll happen and no
matter at which pace, they still can be seen as a "whole", and a
namespace is a in my opinion still a good idea.
I won't debate on the rest, because you are much more both involved and
technically competent than I am, and as being someone stupid, I need
things to be well-organized to find them easily, that's the only
constructive argument I have to bring to this discussion.
Regards,
--
Pierre
The choice of global namespace maintains consistency with the namespace used for general-purpose collections already in the SPL
I find this argument unconvincing. If the intention is for this to fit
with existing classes in the SPL, it should be called "SplDeque", or
more consistently "SplDoubleEndedQueue", and the RFC should talk about
how the design aligns with those existing classes.
If it is intended to be the first of a new set of data structures which
are not aligned with the existing SPL types, then putting it in a new
namespace would make most sense.
In the RFC and the list you've mentioned a few comparisons, but I don't
think any of them hold:
- ArrayObject, WeakReference, and WeakMap are all classes for binding to
specific engine behaviour, not generic data structures - Iterators all have an "Iterator" suffix (leading to some quite awkward
names) - Reflection classes all have a "Reflection" prefix
- Having both "Queue" and "SplQueue", or both "Stack" and "SplStack"
would be a terrible idea, and is a pretty strong argument not to add
data structures with such plain names
Regards,
--
Rowan Tommins
[IMSoP]
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/deque to add a
final class Deque
This is based on the
Teds\Deque
implementation I've worked on
for the https://github.com/TysonAndre/pecl-teds PECL.
With one caveat, this is a much stronger RFC than the Vector one. Good job!
However...
The choice of global namespace maintains consistency with the namespace used for general-purpose collections already in the SPL
I find this argument unconvincing. If the intention is for this to fit with existing classes in the SPL, it should be called "SplDeque", or more consistently "SplDoubleEndedQueue", and the RFC should talk about how the design aligns with those existing classes.
If it is intended to be the first of a new set of data structures which are not aligned with the existing SPL types, then putting it in a new namespace would make most sense.
In the RFC and the list you've mentioned a few comparisons, but I don't think any of them hold:
- ArrayObject, WeakReference, and WeakMap are all classes for binding to specific engine behaviour, not generic data structures
- Iterators all have an "Iterator" suffix (leading to some quite awkward names)
- Reflection classes all have a "Reflection" prefix
- Having both "Queue" and "SplQueue", or both "Stack" and "SplStack" would be a terrible idea, and is a pretty strong argument not to add data structures with such plain names
I am in complete agreement with Rowan.
Honestly, at first I confused Deque
with Dequeue
and was wondering why we would name a class with a verb? It wasn't until Rowan's comment that I realized Deque
is an abbreviation.
Which begs the question: how many other PHP developers will know computer science terms like this well enough to know Deque
is a noun when they see it, and more importantly how many PHP developers will think to search for Deque
when they need a queue?
So here is a straw man argument; name the class one of:
- DataStruct\DoubleEndedQueue, or
- DataStruct\DE_Queue
Let the bike-shedding begin!
(Or is that "Let it continue?")
-Mike
P.S. BTW re: https://github.com/TysonAndre/pecl-teds https://github.com/TysonAndre/pecl-teds, who is Ted? (pun intended)
Honestly, at first I confused
Deque
withDequeue
and was wondering why we would name a class with a verb? It wasn't until Rowan's comment that I realizedDeque
is an abbreviation.Which begs the question: how many other PHP developers will know computer science terms like this well enough to know
Deque
is a noun when they see it, and more importantly how many PHP developers will think to search forDeque
when they need a queue?
Unlike the Vector name which is really confusing as it is not a
vector, Deque is actually relatively known for anyone needing a double
ended queue. It even comes first in google search, wikipedia or java
implementations, which matches this implementation. We expect our
users to know all weird names of different patterns, I trust them to
be able to learn what a Deque is. :)
best,
Pierre
@pierrejoye | http://www.libgd.org
Honestly, at first I confused
Deque
withDequeue
and was wondering why we would name a class with a verb? It wasn't until Rowan's comment that I realizedDeque
is an abbreviation.Which begs the question: how many other PHP developers will know computer science terms like this well enough to know
Deque
is a noun when they see it, and more importantly how many PHP developers will think to search forDeque
when they need a queue?Unlike the Vector name which is really confusing as it is not a
vector, Deque is actually relatively known for anyone needing a double
ended queue. It even comes first in google search, wikipedia or java
implementations, which matches this implementation.
Being able to google what Deque means is one thing, but thinking to search for it when you don't know it exists and you don't know that Deque is the term you would need to be searching for.
We expect our
users to know all weird names of different patterns, I trust them to
be able to learn what a Deque is. :)
So we expect PHP developers to have a computer science background? Maybe you are assuming everyone minimally has your expertise? Have we as a group decided to forsake beginners?
-Mike
On Tue, Sep 21, 2021 at 11:21 AM Mike Schinkel mike@newclarity.net
wrote:Honestly, at first I confused
Deque
withDequeue
and was wondering
why we would name a class with a verb? It wasn't until Rowan's comment
that I realizedDeque
is an abbreviation.Which begs the question: how many other PHP developers will know
computer science terms like this well enough to knowDeque
is a noun when
they see it, and more importantly how many PHP developers will think to
search forDeque
when they need a queue?Unlike the Vector name which is really confusing as it is not a
vector, Deque is actually relatively known for anyone needing a double
ended queue. It even comes first in google search, wikipedia or java
implementations, which matches this implementation.Being able to google what Deque means is one thing, but thinking to search
for it when you don't know it exists and you don't know that Deque is the
term you would need to be searching for.We expect our
users to know all weird names of different patterns, I trust them to
be able to learn what a Deque is. :)So we expect PHP developers to have a computer science background? Maybe
you are assuming everyone minimally has your expertise? Have we as a group
decided to forsake beginners?
It is not what I was trying to explain.
However if one looks at the php documentation looking for specific things
like these, the Deque will be there.
So I am not expecting anyone to have a CS background (btw, I don't),
however I do have a minimal trust in users reading the docs and doing a
minimum of research.
best,
Honestly, at first I confused
Deque
withDequeue
and was wondering why we would name a class with a verb? It wasn't until Rowan's comment that I realizedDeque
is an abbreviation.Which begs the question: how many other PHP developers will know computer science terms like this well enough to know
Deque
is a noun when they see it, and more importantly how many PHP developers will think to search forDeque
when they need a queue?Unlike the Vector name which is really confusing as it is not a
vector, Deque is actually relatively known for anyone needing a double
ended queue. It even comes first in google search, wikipedia or java
implementations, which matches this implementation.Being able to google what Deque means is one thing, but thinking to search for it when you don't know it exists and you don't know that Deque is the term you would need to be searching for.
We expect our
users to know all weird names of different patterns, I trust them to
be able to learn what a Deque is. :)So we expect PHP developers to have a computer science background? Maybe you are assuming everyone minimally has your expertise? Have we as a group decided to forsake beginners?
It is not what I was trying to explain.
However if one looks at the php documentation looking for specific things like these, the Deque will be there.
So I am not expecting anyone to have a CS background (btw, I don't), however I do have a minimal trust in users reading the docs and doing a minimum of research.
And what I was trying to explain is that given my past experience working with other PHP developers, I believe that trust is misplaced for many.
And why make it harder than it needs to be? For PHP, jargon is Bad(tm), IMHO anyway.
-Mike
The name "deque" is used in the standard library of these languages:
- C++: std::deque
- Java: java.util.Deque (interface)
- Python: collections.deque
- Swift: Collections.Deque (not standard lib, apparently, but Apple
official? Don't know Swift) - Rust: std::collections::VecDeque
And these don't have it in the standard library:
- Go
- C#
- C
- JavaScript
Anyway, it's pretty decent evidence that:
- This functionality is pretty widely used across languages.
- This functionality should have "deque" be in the name, or be the
complete name.
Discussion naming for "vector" I can understand, as it's less widely
used or sometimes means something specific to numbers, but there's
little point in discussing the name "deque." It's well established in
programming, whether PHP programmers are aware of it or not.
As I see it, the discussion should be centered around:
- The API Deque provides.
- The package that provides it.
- Whether Deque's API is consistent with other structures in the same package.
- Whether this should be included in php-src or left to extensions.
I suggest that we try to make PHP as homogenous in each bullet as we
can with other languages:
- Name it "Deque."
- Put it in the namespace "Collections" (and if included in core,
then "ext/collections"). - Support common operations on Deque (pushing and popping items to
both front and back, subscript operator works, iteration, size, etc)
and add little else (don't be novel here). To me, the biggest thing
left to discuss is a notion of a maximum size, which in my own
experience is common for circular buffers (the implementation chosen
for Deque) but not all languages have this. - This is less clear, but I'm in favor as long as we can provide a
few other data structures at the same time. Obviously, this a voter
judgement call on the yes/no.
Given that I've suggested the most common options for these across
many languages, it shouldn't be very controversial. The worst bit
seems like picking the namespace "Collections" as this will break at
least one package: https://github.com/danielgsims/php-collections. We
should suggest that they vendor it anyway, as "collections" is common
e.g. "Ramsey\Collections", "Doctrine\Common\Collections". I don't see
a good alternative here to "Collections", as we haven't agreed on very
much on past namespace proposals.
Anyway, that's what I suggest.
Hi Levi Morrison,
The name "deque" is used in the standard library of these languages:
- C++: std::deque
- Java: java.util.Deque (interface)
- Python: collections.deque
- Swift: Collections.Deque (not standard lib, apparently, but Apple
official? Don't know Swift)- Rust: std::collections::VecDeque
And these don't have it in the standard library:
- Go
- C#
- C
- JavaScript
Anyway, it's pretty decent evidence that:
- This functionality is pretty widely used across languages.
- This functionality should have "deque" be in the name, or be the
complete name.
Thanks for putting that together.
For anyone wondering about the languages that don't have it:
The first 3 are compiled languages so there's the same performance for a standard library
and third-party library code written in those libraries.
For C and Go, the standard libraries of C and Go are written in C and Go.
C and Go also have a very minimal standard library as a design goal.
Everything in C and Go is a "native library", so a third-party library and a standard library would have the same performance.
(The standard library of C and Go use or allow embedding assembly for platform-dependent functionality or optimizations.
Third party libraries in C or Go can also use inline assembly https://stackoverflow.com/a/23796420)
(Not familiar with C#'s standard library, I assume it's similar?)
And browser vendors have put immense amounts of effort on optimizing JavaScript
and the JIT compilers for JavaScript for low memory usage,
supporting inlining, working efficiently on various platforms, etc.
Discussion naming for "vector" I can understand, as it's less widely
used or sometimes means something specific to numbers, but there's
little point in discussing the name "deque." It's well established in
programming, whether PHP programmers are aware of it or not.
Yes, I'd agree.
It's a well established term in programming, and using a non-standard or more verbose term
would make it harder to find/remember this functionality for users moving to php from
other programming languages,
or for users moving from php to other programming languages.
Learning programming inevitably has unfamiliar terms such as array
, printf
, etc.
that are not commonly used in mathematics or daily life.
As I see it, the discussion should be centered around:
- The API Deque provides.
- The package that provides it.
- Whether Deque's API is consistent with other structures in the same package.
- Whether this should be included in php-src or left to extensions.
I suggest that we try to make PHP as homogenous in each bullet as we
can with other languages:
- Name it "Deque."
- Put it in the namespace "Collections" (and if included in core,
then "ext/collections").- Support common operations on Deque (pushing and popping items to
both front and back, subscript operator works, iteration, size, etc)
and add little else (don't be novel here). To me, the biggest thing
left to discuss is a notion of a maximum size, which in my own
experience is common for circular buffers (the implementation chosen
for Deque) but not all languages have this.- This is less clear, but I'm in favor as long as we can provide a
few other data structures at the same time. Obviously, this a voter
judgement call on the yes/no.
I had planned on a minimal API for Deque.
"Maximum size" shouldn't be an issue for Deque
specifically, it's a size_t
. It isn't an issue for SplFixedArray and ArrayObject.
(or for
PHP would just encounter a fatal error due to either
- Hitting the
memory_limit
or available memory limit
while constructing or appending to theDeque
- Seeing a fatal error due to integer overflow, which would only matter on 32-bit php.
(TheDeque
API doesn't have a setSize)
Thesafe_erealloc
macros allow you to do that.
For additions such as Vector
and Deque
, because we can choose a size_t
(type large enough to store any size of memory) (and because 32-bit systems are increasingly rare),
I currently think always emitting an uncatchable fatal error (and exiting immediately) for impossibly large values
would be the most consistent (e.g. applications wouldn't reach catch (Throwable $t)
blocks meant for a different purpose in an unexpected way, if they failed to validate inputs).
This is already done in array and SPL types.
php > var_dump(new SplFixedArray(PHP_INT_MAX));
Fatal error: Possible integer overflow in memory allocation (9223372036854775807 * 16 + 0) in php shell code on line 1
php > var_dump(array_fill(0, 2**31 - 2, null));
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 68719476744 bytes) in php shell code on line 1
php > var_dump(array_fill(0, 2**31, null));
Warning: Uncaught ValueError: `array_fill()`: Argument #2 ($count) is too large in php shell code:1
...
// Without memory_limit
php > ini_set('memory_limit', -1);
php > $x = array_fill(0, 2**31 - 1, null);
mmap() failed: [12] Cannot allocate memory
mmap() failed: [12] Cannot allocate memory
Fatal error: Out of memory (allocated 2097152) (tried to allocate 68719476744 bytes) in php shell code on line 1
(If anyone's wondering about the error handling, php arrays have a size of signed int32_t
(i.e. at most ~2 billion elements)
to support common use cases and many array instances, in a memory efficient way, while still being efficient in the engine's internals)
- If someone's application or library unexpectedly has users that end up needing to process more than 2 billion values
(64GB of RAM in php 7.0-8.1), e.g. in reading/computing statistics from large binary files,
the addition ofDeque
andVector
which don't have arbitrary size limits would be useful to them,
and would reduce complaints or feature requests related to the ValueError.
Given that I've suggested the most common options for these across
many languages, it shouldn't be very controversial. The worst bit
seems like picking the namespace "Collections" as this will break at
least one package: https://github.com/danielgsims/php-collections. We
should suggest that they vendor it anyway, as "collections" is common
e.g. "Ramsey\Collections", "Doctrine\Common\Collections". I don't see
a good alternative here to "Collections", as we haven't agreed on very
much on past namespace proposals.
I don't plan to use any of the class names currently found in https://github.com/danielgsims/php-collections/tree/2.X/src
for future proposals, so that's less of an issue.
I also don't see good alternatives to "Collections". On a side note, other programming languages linked keep referring to collections packages as the plural Collections\
,
and I also keep using the name Collections
when thinking about it and find use Collection\Deque;
unnatural for a reason I'm not sure about,
so the plural Collections\
makes more sense.
I'd announced https://externals.io/message/116112 in case anyone had any ideas
-
DataStructures
was mentioned but I consider it too broad.
https://en.m.wikipedia.org/wiki/List_of_data_structures is a massive list of which collections of objects is just a small part.
It'd make more sense if everything was distributed as a single package such as a PECL, though
I also misread https://wiki.php.net/rfc/namespaces_in_bundled_extensions#proposal .
Sub-namespaces are allowed and could be used for functions/classlikes that were not
collections or ancestor types of collections in the future, if needed (Collections\Xyz\function_dealing_with_collections()
).
Thanks,
Tyson
"Maximum size" shouldn't be an issue for
Deque
specifically, it's asize_t
. It isn't an issue for SplFixedArray and ArrayObject.
(or for
PHP would just encounter a fatal error due to either
You wrote a lot, but unfortunately it was based on a misunderstanding.
In some languages you can set the maximum allowed number of items a
specific Deque can hold. For example (pseudo-code):
let deque = new Deque<int>(max_capacity: 3)
deque.push_back(1)
deque.push_back(2)
deque.push_back(3) # okay, it's now full
deque.push_back(4) # !
In this condition, they either error or remove the earliest.
It's okay if the proposed Deque doesn't add this capability, but it's
the only remaining major functionality which some languages have but
not the others; it should at least be discussed, I think.
Hi Levi Morrison,
"Maximum size" shouldn't be an issue for
Deque
specifically, it's asize_t
. It isn't an issue for SplFixedArray and ArrayObject.
(or for
PHP would just encounter a fatal error due to eitherYou wrote a lot, but unfortunately it was based on a misunderstanding.
In some languages you can set the maximum allowed number of items a
specific Deque can hold. For example (pseudo-code):let deque = new Deque<int>(max_capacity: 3)
deque.push_back(1)
deque.push_back(2)
deque.push_back(3) # okay, it's now fulldeque.push_back(4) # !
In this condition, they either error or remove the earliest.
It's okay if the proposed Deque doesn't add this capability, but it's
the only remaining major functionality which some languages have but
not the others; it should at least be discussed, I think.
Oh, I hadn't remembered seeing that before. That makes sense.
All of the implementations I'd remembered were unlimited.
(https://cplusplus.com/reference/deque/deque/max_size/ was a system limit, for example)
I can't think of a common use case for that in php.
If there was one, though, I strongly believe it shouldn't be the same class (and shouldn't be a subclass),
in order to ensure the behavior of push or other operations remains easy to reason about.
This can be done in userland as a userland class.
- (e.g. with a Deque instance property and runtime
count() < $this->maxCapacity
checks,
to choose their own desired return value or Throwable subclass (or manual array and circular buffer in PHP)
Thanks,
Tyson
On Tue, Sep 21, 2021 at 11:05 PM Levi Morrison via internals <
internals@lists.php.net> wrote:
The name "deque" is used in the standard library of these languages:
- C++: std::deque
- Java: java.util.Deque (interface)
- Python: collections.deque
- Swift: Collections.Deque (not standard lib, apparently, but Apple
official? Don't know Swift)- Rust: std::collections::VecDeque
And these don't have it in the standard library:
- Go
- C#
- C
- JavaScript
Anyway, it's pretty decent evidence that:
- This functionality is pretty widely used across languages.
- This functionality should have "deque" be in the name, or be the
complete name.Discussion naming for "vector" I can understand, as it's less widely
used or sometimes means something specific to numbers, but there's
little point in discussing the name "deque." It's well established in
programming, whether PHP programmers are aware of it or not.As I see it, the discussion should be centered around:
- The API Deque provides.
- The package that provides it.
- Whether Deque's API is consistent with other structures in the same
package.- Whether this should be included in php-src or left to extensions.
I suggest that we try to make PHP as homogenous in each bullet as we
can with other languages:
- Name it "Deque."
- Put it in the namespace "Collections" (and if included in core,
then "ext/collections").- Support common operations on Deque (pushing and popping items to
both front and back, subscript operator works, iteration, size, etc)
and add little else (don't be novel here). To me, the biggest thing
left to discuss is a notion of a maximum size, which in my own
experience is common for circular buffers (the implementation chosen
for Deque) but not all languages have this.- This is less clear, but I'm in favor as long as we can provide a
few other data structures at the same time. Obviously, this a voter
judgement call on the yes/no.Given that I've suggested the most common options for these across
many languages, it shouldn't be very controversial. The worst bit
seems like picking the namespace "Collections" as this will break at
least one package: https://github.com/danielgsims/php-collections. We
should suggest that they vendor it anyway, as "collections" is common
e.g. "Ramsey\Collections", "Doctrine\Common\Collections". I don't see
a good alternative here to "Collections", as we haven't agreed on very
much on past namespace proposals.Anyway, that's what I suggest.
I generally agree with everything Levi has said here. I think that adding a
deque structure generally makes sense, and that putting it into a new
ext/collections extension under the corresponding namespace would be
appropriate. I wouldn't insist on introducing it together with other
structures (I'm a lot more sceptical about your Vector proposal), but do
think there should be at least some overarching plan here, even if we only
realize a small part of it in this version.
A few questions for the particular API proposed here:
-
There would be the possibility of having an interface Deque that is
backed by a VecDeque/ArrayDeque implementation. I'm not convinced this
would actually make sense, but I wanted to throw it out there, given that
the class is final (which is without any doubt the correct decision). -
How do set() / offsetSet() work with an out of range index? Is it
possible to push an element with $deque[count($deque)] = x? I would assume
that throws, but want to make sure. On a related note, why do we need both
get()/set() and offsetGet()/offsetSet()? These APIs seem redundant. -
I believe it's pretty standard for Deque implementations to provide
insert() / remove() methods to insert at any position (these would be O(n)). -
The design of getIterator() here is rather unusual, and deserves some
additional justification. Normally, we let getIterator() see concurrent
modifications to the structure, though the precise behavior is unspecified.
I would also like to know how this will look like on a technical level (it
doesn't seem to be implemented yet?) This seems like something that will
require a check on every write operation, to potentially separate the
structure in some form. -
The shift/unshift terminology is already used by our array API, but I'm
not sure it's the most intuitive choice in the context of a deque. SplQueue
has enqueue() and dequeue(). Another popular option from other languages
(which I would personally favor) is pushFront() and popFront().
Regards,
Nikita
Hi Nikita Popov,
- There would be the possibility of having an interface Deque that is
backed by a VecDeque/ArrayDeque implementation. I'm not convinced this
would actually make sense, but I wanted to throw it out there, given that
the class is final (which is without any doubt the correct decision).
Do you mean ArrayDeque for a hardcoded max capacity?
I'm also not convinced there's a use case.
- How do set() / offsetSet() work with an out of range index? Is it
possible to push an element with $deque[count($deque)] = x? I would assume
that throws, but want to make sure. On a related note, why do we need both
get()/set() and offsetGet()/offsetSet()? These APIs seem redundant.
It isn't possible to set an out of range index - both throws \OutOfBoundsException
In order to support the ArrayAccess $deque[$offset] = $value;
or $deque[$offset]
shorthand,
the offsetGet/offsetSet needed to be implemented to follow conventions.
(because of ArrayAccess, offsetGet must accept mixed offsets)
Those aren't type safe, though, so get()/set() are provided for a type safe alternative
that will throw a TypeError for use cases that would benefit from runtime type safety.
- I believe it's pretty standard for Deque implementations to provide
insert() / remove() methods to insert at any position (these would be O(n)).
https://www.php.net/manual/en/class.splqueue.php didn't offer that functionality.
https://www.php.net/manual/en/class.ds-deque.php did, apparently.
- The design of getIterator() here is rather unusual, and deserves some
additional justification. Normally, we let getIterator() see concurrent
modifications to the structure, though the precise behavior is unspecified.
I would also like to know how this will look like on a technical level (it
doesn't seem to be implemented yet?) This seems like something that will
require a check on every write operation, to potentially separate the
structure in some form.
The original plan was to copy the entire array on iterator creation,
to imitate the immediate copy nature of foreach on arrays.
This was assuming that foreach
over a Deque without removing elements would be a rare use case.
That may have been a mistake since foreach (clone $deque as $key => $value)
can be done to explicitly do that.
There're around 4 approaches I could take with different tradeoffs
- Iterate over $offset = 0 and increment offset in calls to InternalIterator->next() until exceeding the size of the deque, not copying the deque.
That's the actual current implementation, but would be unintuitive with shift()/unshift()
This would repeat elements on unshift(), or skip over elements when shift() is called.
The current implementation of Ds\Deque
seems to do the same thing, but there's a similar discussion in its issue tracker in https://github.com/php-ds/ext-ds/issues/17
- Similar iteration behavior, but also have it relative to a uint64 indicating the number of times elements were added to the front of the deque minus the number of elements were removed from the back of the deque.
(e.g. if the current Deque internalOffset is 123, the InternalIterator returned by getOffset would start with an offset of 123 and increase it when advancing.
Calls to shift would raise the Deque internalOffset, calls to unshift would decrease it (by the number of elements).
This would be independent of the circular buffer offset.
And the InternalIterator would increase the internalOffset to catch up if the signed offset difference became negative, e.g. by calling shift() twice in a foreach block)
- Behave as if the entire Deque was copied when iteration starts (either by actually copying everything or by copy on write).
That was the planned implementation documented in the RFC, but would be inefficient for iterations that end early and use a lot of memory for large Deques.
- Throw if attempting to change the size of the
Deque
while a foreach is in progress.
The semantics of that would be annoying, e.g. handling clear()
or shift()
, and the exception getting thrown would be unexpected.
- The shift/unshift terminology is already used by our array API, but I'm
not sure it's the most intuitive choice in the context of a deque. SplQueue
has enqueue() and dequeue(). Another popular option from other languages
(which I would personally favor) is pushFront() and popFront().
My original name idea was pushBack/popBack/pushFront/popFront but I had decided against proposing brand new terms
for functionality that already had terms in php-src.
It would also be inconsistent with proposed names for Vector
if that was also added.
https://www.php.net/manual/en/class.ds-deque.php and SplDoublyLinkedList had done the same thing
enqueue and dequeue don't have a good term for enqueueing on the opposite side.
- It would make it easier to learn the language to have fewer terms and migrate code using SplDoublyLinkedList or arrays
- push()/shift() are also shorter names than pushBack/popFront()
Thanks,
Tyson
From: Nikita Popov nikita.ppv@gmail.com
Sent: October 4, 2021 8:04 AM
To: Levi Morrison levi.morrison@datadoghq.com
Cc: PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] Adding final class Deque
to PHP
On Tue, Sep 21, 2021 at 11:05 PM Levi Morrison via internals <
internals@lists.php.net> wrote:
The name "deque" is used in the standard library of these languages:
- C++: std::deque
- Java: java.util.Deque (interface)
- Python: collections.deque
- Swift: Collections.Deque (not standard lib, apparently, but Apple
official? Don't know Swift)
- Rust: std::collections::VecDequeAnd these don't have it in the standard library:
- Go
- C#
- C
- JavaScriptAnyway, it's pretty decent evidence that:
1. This functionality is pretty widely used across languages.
2. This functionality should have "deque" be in the name, or be the
complete name.Discussion naming for "vector" I can understand, as it's less widely
used or sometimes means something specific to numbers, but there's
little point in discussing the name "deque." It's well established in
programming, whether PHP programmers are aware of it or not.As I see it, the discussion should be centered around:
1. The API Deque provides.
2. The package that provides it.
3. Whether Deque's API is consistent with other structures in the same
package.
4. Whether this should be included in php-src or left to extensions.I suggest that we try to make PHP as homogenous in each bullet as we
can with other languages:
1. Name it "Deque."
2. Put it in the namespace "Collections" (and if included in core,
then "ext/collections").
3. Support common operations on Deque (pushing and popping items to
both front and back, subscript operator works, iteration, size, etc)
and add little else (don't be novel here). To me, the biggest thing
left to discuss is a notion of a maximum size, which in my own
experience is common for circular buffers (the implementation chosen
for Deque) but not all languages have this.
4. This is less clear, but I'm in favor as long as we can provide a
few other data structures at the same time. Obviously, this a voter
judgement call on the yes/no.Given that I've suggested the most common options for these across
many languages, it shouldn't be very controversial. The worst bit
seems like picking the namespace "Collections" as this will break at
least one package: https://github.com/danielgsims/php-collections. We
should suggest that they vendor it anyway, as "collections" is common
e.g. "Ramsey\Collections", "Doctrine\Common\Collections". I don't see
a good alternative here to "Collections", as we haven't agreed on very
much on past namespace proposals.Anyway, that's what I suggest.
I generally agree with everything Levi has said here. I think that adding a
deque structure generally makes sense, and that putting it into a new
ext/collections extension under the corresponding namespace would be
appropriate. I wouldn't insist on introducing it together with other
structures (I'm a lot more sceptical about your Vector proposal), but do
think there should be at least some overarching plan here, even if we only
realize a small part of it in this version.
A few questions for the particular API proposed here:
-
There would be the possibility of having an interface Deque that is
backed by a VecDeque/ArrayDeque implementation. I'm not convinced this
would actually make sense, but I wanted to throw it out there, given that
the class is final (which is without any doubt the correct decision). -
How do set() / offsetSet() work with an out of range index? Is it
possible to push an element with $deque[count($deque)] = x? I would assume
that throws, but want to make sure. On a related note, why do we need both
get()/set() and offsetGet()/offsetSet()? These APIs seem redundant. -
I believe it's pretty standard for Deque implementations to provide
insert() / remove() methods to insert at any position (these would be O(n)). -
The design of getIterator() here is rather unusual, and deserves some
additional justification. Normally, we let getIterator() see concurrent
modifications to the structure, though the precise behavior is unspecified.
I would also like to know how this will look like on a technical level (it
doesn't seem to be implemented yet?) This seems like something that will
require a check on every write operation, to potentially separate the
structure in some form. -
The shift/unshift terminology is already used by our array API, but I'm
not sure it's the most intuitive choice in the context of a deque. SplQueue
has enqueue() and dequeue(). Another popular option from other languages
(which I would personally favor) is pushFront() and popFront().
Regards,
Nikita
On Tue, Oct 5, 2021 at 12:47 AM tyson andre tysonandre775@hotmail.com
wrote:
Hi Nikita Popov,
- There would be the possibility of having an interface Deque that is
backed by a VecDeque/ArrayDeque implementation. I'm not convinced this
would actually make sense, but I wanted to throw it out there, given that
the class is final (which is without any doubt the correct decision).Do you mean ArrayDeque for a hardcoded max capacity?
I'm also not convinced there's a use case.
- How do set() / offsetSet() work with an out of range index? Is it
possible to push an element with $deque[count($deque)] = x? I would
assume
that throws, but want to make sure. On a related note, why do we need
both
get()/set() and offsetGet()/offsetSet()? These APIs seem redundant.It isn't possible to set an out of range index - both throws
\OutOfBoundsException
In order to support the ArrayAccess
$deque[$offset] = $value;
or
$deque[$offset]
shorthand,
the offsetGet/offsetSet needed to be implemented to follow conventions.
(because of ArrayAccess, offsetGet must accept mixed offsets)Those aren't type safe, though, so get()/set() are provided for a type
safe alternative
that will throw a TypeError for use cases that would benefit from runtime
type safety.
offsetGet() and offsetSet() can (and should) still throw if the offset is
not an integer. We just can't actually declare that type. This is a
weakness of the PHP type system, so nominally "violating" LSP is fine here.
- I believe it's pretty standard for Deque implementations to provide
insert() / remove() methods to insert at any position (these would be
O(n)).https://www.php.net/manual/en/class.splqueue.php didn't offer that
functionality.https://www.php.net/manual/en/class.ds-deque.php did, apparently.
- The design of getIterator() here is rather unusual, and deserves some
additional justification. Normally, we let getIterator() see concurrent
modifications to the structure, though the precise behavior is
unspecified.
I would also like to know how this will look like on a technical level
(it
doesn't seem to be implemented yet?) This seems like something that will
require a check on every write operation, to potentially separate the
structure in some form.The original plan was to copy the entire array on iterator creation,
to imitate the immediate copy nature of foreach on arrays.
There is no immediate copy with arrays though. The copy is just a refcount
increment and no actual copy will happen unless the array is modified
during iteration, which is a rare case. I'd expect the Deque implementation
(with those semantics) to do the same -- but we don't have a convenient
mechanism for it. Adding an indirection and a refcount to the underlying
storage would be possible, but also feels like overkill. The other approach
I see would be to track all the registered iterators, so we can explicitly
separate them on write.
This was assuming that
foreach
over a Deque without removing elements
would be a rare use case.That may have been a mistake since
foreach (clone $deque as $key => $value)
can be done to explicitly do that.
There're around 4 approaches I could take with different tradeoffs
Iterate over $offset = 0 and increment offset in calls to
InternalIterator->next() until exceeding the size of the deque, not copying
the deque.That's the actual current implementation, but would be unintuitive
with shift()/unshift()This would repeat elements on unshift(), or skip over elements when
shift() is called.The current implementation of
Ds\Deque
seems to do the same thing,
but there's a similar discussion in its issue tracker in
https://github.com/php-ds/ext-ds/issues/17Similar iteration behavior, but also have it relative to a uint64
indicating the number of times elements were added to the front of the
deque minus the number of elements were removed from the back of the deque.(e.g. if the current Deque internalOffset is 123, the InternalIterator
returned by getOffset would start with an offset of 123 and increase it
when advancing.
Calls to shift would raise the Deque internalOffset, calls to unshift
would decrease it (by the number of elements).
This would be independent of the circular buffer offset.
And the InternalIterator would increase the internalOffset to catch up
if the signed offset difference became negative, e.g. by calling shift()
twice in a foreach block)
Something to keep in mind here is that there might be multiple iterators at
the same time, so I don't think the suggestion from the last sentence would
work. Though I think the general idea does work, as long as we're working
with a delta between the offset in the iterator and the deque.
Behave as if the entire Deque was copied when iteration starts (either
by actually copying everything or by copy on write).That was the planned implementation documented in the RFC, but
would be inefficient for iterations that end early and use a lot of memory
for large Deques.Throw if attempting to change the size of the
Deque
while a foreach
is in progress.The semantics of that would be annoying, e.g. handling
clear()
or
shift()
, and the exception getting thrown would be unexpected.
- The shift/unshift terminology is already used by our array API, but
I'm
not sure it's the most intuitive choice in the context of a deque.
SplQueue
has enqueue() and dequeue(). Another popular option from other languages
(which I would personally favor) is pushFront() and popFront().My original name idea was pushBack/popBack/pushFront/popFront but I had
decided against proposing brand new terms
for functionality that already had terms in php-src.
It would also be inconsistent with proposed names forVector
if that was
also added.https://www.php.net/manual/en/class.ds-deque.php and SplDoublyLinkedList
had done the same thingenqueue and dequeue don't have a good term for enqueueing on the opposite
side.
- It would make it easier to learn the language to have fewer terms and
migrate code using SplDoublyLinkedList or arrays- push()/shift() are also shorter names than pushBack/popFront()
Thanks,
TysonFrom: Nikita Popov nikita.ppv@gmail.com
Sent: October 4, 2021 8:04 AM
To: Levi Morrison levi.morrison@datadoghq.com
Cc: PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] Addingfinal class Deque
to PHPOn Tue, Sep 21, 2021 at 11:05 PM Levi Morrison via internals <
internals@lists.php.net> wrote:The name "deque" is used in the standard library of these languages:
- C++: std::deque
- Java: java.util.Deque (interface)
- Python: collections.deque
- Swift: Collections.Deque (not standard lib, apparently, but Apple
official? Don't know Swift)- Rust: std::collections::VecDeque
And these don't have it in the standard library:
- Go
- C#
- C
- JavaScript
Anyway, it's pretty decent evidence that:
- This functionality is pretty widely used across languages.
- This functionality should have "deque" be in the name, or be the
complete name.Discussion naming for "vector" I can understand, as it's less widely
used or sometimes means something specific to numbers, but there's
little point in discussing the name "deque." It's well established in
programming, whether PHP programmers are aware of it or not.As I see it, the discussion should be centered around:
- The API Deque provides.
- The package that provides it.
- Whether Deque's API is consistent with other structures in the same
package.- Whether this should be included in php-src or left to extensions.
I suggest that we try to make PHP as homogenous in each bullet as we
can with other languages:
- Name it "Deque."
- Put it in the namespace "Collections" (and if included in core,
then "ext/collections").- Support common operations on Deque (pushing and popping items to
both front and back, subscript operator works, iteration, size, etc)
and add little else (don't be novel here). To me, the biggest thing
left to discuss is a notion of a maximum size, which in my own
experience is common for circular buffers (the implementation chosen
for Deque) but not all languages have this.- This is less clear, but I'm in favor as long as we can provide a
few other data structures at the same time. Obviously, this a voter
judgement call on the yes/no.Given that I've suggested the most common options for these across
many languages, it shouldn't be very controversial. The worst bit
seems like picking the namespace "Collections" as this will break at
least one package: https://github.com/danielgsims/php-collections. We
should suggest that they vendor it anyway, as "collections" is common
e.g. "Ramsey\Collections", "Doctrine\Common\Collections". I don't see
a good alternative here to "Collections", as we haven't agreed on very
much on past namespace proposals.Anyway, that's what I suggest.
I generally agree with everything Levi has said here. I think that adding a
deque structure generally makes sense, and that putting it into a new
ext/collections extension under the corresponding namespace would be
appropriate. I wouldn't insist on introducing it together with other
structures (I'm a lot more sceptical about your Vector proposal), but do
think there should be at least some overarching plan here, even if we only
realize a small part of it in this version.A few questions for the particular API proposed here:
There would be the possibility of having an interface Deque that is
backed by a VecDeque/ArrayDeque implementation. I'm not convinced this
would actually make sense, but I wanted to throw it out there, given that
the class is final (which is without any doubt the correct decision).How do set() / offsetSet() work with an out of range index? Is it
possible to push an element with $deque[count($deque)] = x? I would assume
that throws, but want to make sure. On a related note, why do we need both
get()/set() and offsetGet()/offsetSet()? These APIs seem redundant.I believe it's pretty standard for Deque implementations to provide
insert() / remove() methods to insert at any position (these would be
O(n)).The design of getIterator() here is rather unusual, and deserves some
additional justification. Normally, we let getIterator() see concurrent
modifications to the structure, though the precise behavior is unspecified.
I would also like to know how this will look like on a technical level (it
doesn't seem to be implemented yet?) This seems like something that will
require a check on every write operation, to potentially separate the
structure in some form.The shift/unshift terminology is already used by our array API, but I'm
not sure it's the most intuitive choice in the context of a deque. SplQueue
has enqueue() and dequeue(). Another popular option from other languages
(which I would personally favor) is pushFront() and popFront().Regards,
NikitaTo unsubscribe, visit: https://www.php.net/unsub.php
The name "deque" is used in the standard library of these languages:
- C++: std::deque
- Java: java.util.Deque (interface)
- Python: collections.deque
- Swift: Collections.Deque (not standard lib, apparently, but Apple
official? Don't know Swift)
- Rust: std::collections::VecDequeAnd these don't have it in the standard library:
- Go
- C#
- C
- JavaScriptAnyway, it's pretty decent evidence that:
1. This functionality is pretty widely used across languages.
2. This functionality should have "deque" be in the name, or be the
complete name.Discussion naming for "vector" I can understand, as it's less widely
used or sometimes means something specific to numbers, but there's
little point in discussing the name "deque." It's well established in
programming, whether PHP programmers are aware of it or not.As I see it, the discussion should be centered around:
1. The API Deque provides.
2. The package that provides it.
3. Whether Deque's API is consistent with other structures in the same
package.
4. Whether this should be included in php-src or left to extensions.I suggest that we try to make PHP as homogenous in each bullet as we
can with other languages:
1. Name it "Deque."
2. Put it in the namespace "Collections" (and if included in core,
then "ext/collections").
3. Support common operations on Deque (pushing and popping items to
both front and back, subscript operator works, iteration, size, etc)
and add little else (don't be novel here). To me, the biggest thing
left to discuss is a notion of a maximum size, which in my own
experience is common for circular buffers (the implementation chosen
for Deque) but not all languages have this.
4. This is less clear, but I'm in favor as long as we can provide a
few other data structures at the same time. Obviously, this a voter
judgement call on the yes/no.Given that I've suggested the most common options for these across
many languages, it shouldn't be very controversial. The worst bit
seems like picking the namespace "Collections" as this will break at
least one package: https://github.com/danielgsims/php-collections. We
should suggest that they vendor it anyway, as "collections" is common
e.g. "Ramsey\Collections", "Doctrine\Common\Collections". I don't see
a good alternative here to "Collections", as we haven't agreed on very
much on past namespace proposals.Anyway, that's what I suggest.
I generally agree with everything Levi has said here. I think that adding a
deque structure generally makes sense, and that putting it into a new
ext/collections extension under the corresponding namespace would be
appropriate. I wouldn't insist on introducing it together with other
structures (I'm a lot more sceptical about your Vector proposal), but do
think there should be at least some overarching plan here, even if we only
realize a small part of it in this version.
https://wiki.php.net/rfc/deque has been updated after https://wiki.php.net/rfc/deque_straw_poll.
It's now going in the namespace Collections\
, and a new always-enabled extension collections
is added in php-src/ext/collections
in that RFC (for end users, that would mainly affect php.net manual organization).
I plan to start voting in a few days.
Also, iteration behavior has changed to https://wiki.php.net/rfc/deque#iteration_behavior
I also set up a WebAssembly demo to make it easier to look up how it currently behaves:
https://tysonandre.github.io/php-rfc-demo/deque/
A few questions for the particular API proposed here:
- There would be the possibility of having an interface Deque that is
backed by a VecDeque/ArrayDeque implementation. I'm not convinced this
would actually make sense, but I wanted to throw it out there, given that
the class is final (which is without any doubt the correct decision).
Would ArrayDeque
be referring to something with a hardcoded maximum size?
I can't think of a strong use case for that, and it's possible in userland by
wrapping Collections\Deque
(with composition) and checking size on add operations.
- How do set() / offsetSet() work with an out of range index? Is it
possible to push an element with $deque[count($deque)] = x? I would assume
that throws, but want to make sure. On a related note, why do we need both
get()/set() and offsetGet()/offsetSet()? These APIs seem redundant.
They throw for out of range indexes. Offsets between 0 and count-1 are in range.
offsetSet/offsetGet attempt to coerce values to integers to act as a drop-in replacement
for arrays and Spl datastructures (e.g. for applications that deal with string inputs).
They throw TypeError if coercion fails, OutOfBoundsException for invalid offsets.
- I believe it's pretty standard for Deque implementations to provide
insert() / remove() methods to insert at any position (these would be O(n)).
I agree it's pretty standard and something I'd want,
I was concerned about increasing the scope too much (before seeing if new collections would succeed)
and making this harder to review (and with more methods added to the design,
more possible objections to API decisions)
- The design of getIterator() here is rather unusual, and deserves some
additional justification. Normally, we let getIterator() see concurrent
modifications to the structure, though the precise behavior is unspecified.
I would also like to know how this will look like on a technical level (it
doesn't seem to be implemented yet?) This seems like something that will
require a check on every write operation, to potentially separate the
structure in some form.
You have a good point - the proposed and implemented behavior have been updated.
It's now iterating over the original Deque, not an eager copy. https://wiki.php.net/rfc/deque#iteration_behavior
The Deque now tracks how the position of the front is moved by shift()/unshift() operations.
Iterators have their own positions that are unconditionally advanced by 1 in $iterator->next()
and the delta is
the iterator offset ($iterator->key()
).
The implemented behavior can be seen at https://tysonandre.github.io/php-rfc-demo/deque/
- The shift/unshift terminology is already used by our array API, but I'm
not sure it's the most intuitive choice in the context of a deque. SplQueue
has enqueue() and dequeue(). Another popular option from other languages
(which I would personally favor) is pushFront() and popFront().
enqueue() is the same as push(), dequeue is the same as shift().
It's still using the inherited unshift for adding to the front of the deque.
In isolation, I'd prefer pushFront/popFront as names, but introducing more than one name for common operations
seemed like it wouldn't be worth the inconvenience for the potential of introducing bugs in userland code when switching between collection objects,
making the standard library harder to learn/remember, etc.
Thanks,
Tyson
I think this RFC is in much better shape now.
The last thing I'll personally push for is dropping get
and set
.
I'm not sure about those names, and the functionality is already
provided by offsetGet
and offsetSet
, albeit through mixed
instead of int
, but I think this sort of cleanup should be done en
masse at some point, rather than having this one type doing something
different from the others.
Hi Levi Morrison,
I think this RFC is in much better shape now.
The last thing I'll personally push for is dropping
get
andset
.
I'm not sure about those names, and the functionality is already
provided byoffsetGet
andoffsetSet
, albeit throughmixed
instead ofint
, but I think this sort of cleanup should be done en
masse at some point, rather than having this one type doing something
different from the others.
The get/set methods have been removed. I've updated the RFC and implementation.
Thanks,
Tyson
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/deque to add a
final class Deque
This is based on the
Teds\Deque
implementation I've worked on
for the https://github.com/TysonAndre/pecl-teds PECL.While
SplDoublyLinkedList
and its subclassSplQueue
/SplStack
exist in the SPL, they have several drawbacks
that are addressed by this RFC to add aDeque
class (to use instead of those):
SplDoublyLinkedList
is internally represented by a doubly linked list,
making it use roughly twice as much memory as the proposedDeque
push
/pop
/unshift
/shift
fromSplDoublyLinkedList
are slower due to
needing to allocate or free the linked list nodes.- Reading values in the middle of the
SplDoublyLinkedList
is proportional to the length of the list,
due to needing to traverse the linked list nodes.foreach
Iteration behavior cannot be understood without knowing what constructed the
SplDoublyLinkedList
instance or set the flags.It would be useful to have an efficient
Deque
container in the standard library
to provide an alternative without those drawbacks,
as well as for the following reasons:
- To save memory in applications or libraries that may need to store many lists of values or run for long periods of time.
Notably, PHP'sarray
type will never release allocated capacity.
See https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html- To provide a better alternative to
SplDoublyLinkedList
,SplStack
, andSplQueue
for use cases that require stacks or queues.- As a more efficient option than
array
andSplDoublyLinkedList
as a queue orDeque
, especially forunshift
.A
Deque
is more efficient than anarray
when used as a queue, more readable, and easier to use correctly.
While it is possible to efficiently remove elements from the start of anarray
(in terms of insertion order) (though this makesreset()
/array_key_first() inefficient),
it is very inefficient to prepend elements to the start of a largearray
due to needing to either copy the array
or move all elements in the internal array representation,
and anarray
would use much more memory than aDeque
when used that way (and be slower).There are also several pitfalls to using an array as a queue for larger queue sizes,
some of which are not obvious and discovered while writing the benchmarks.
(Having a better (double-ended) queue datastructure (Deque
) than theSplDoublyLinkedList
would save users from needing to write code with these pitfalls):
array_key_first()
andreset()``takes time proportional to the number of elements
unset` from the start of an array,
causing it to unexpectedly be extremely slow (quadratic time) after unsetting many elements at the start of the queue.
(when the array infrequently runs out of capacity, buckets are moved to the front)reset()
orend()
will convert a variable to a reference,
and php is less efficient at reading or writing to reference.
Opcache is also less efficient at optimizing uses of variables using references.- More obviously,
array_unshift
andarray_shift
will take time proportional to the number of elements in the array
(to reindex and move existing/remaining elements).
I plan to start voting on https://wiki.php.net/rfc/deque on Friday, February 4th.
Several changes have been made to https://wiki.php.net/rfc/deque#changelog
after the feedback in https://externals.io/message/116100
- The class is now named
Collections\Deque
- The api documentation in https://wiki.php.net/rfc/deque#proposal was expanded for methods.
- Benchmarks were updated.
- Like other standard datastructures, iteration over the deque is now over the original object (instead of creating a copy),
and mutating the deque will be reflected in$iterator->current()
(and moving the end with push()/pop() will affect where iteration ends). - Iteration will account for calls to shift/unshift moving the start of the deque.
the offsets will be corrected and values won't be skipped or iterated over multiple times.
(no matter how many iterators were created byDeque->getIterator()
)
See https://wiki.php.net/rfc/deque#iteration_behavior - The get()/set() methods were removed, after feedback in https://externals.io/message/116100#116214
A WebAssembly demo is available at https://tysonandre.github.io/php-rfc-demo/deque/
Thanks,
Tyson
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/deque to add a
final class Deque
This is based on the
Teds\Deque
implementation I've worked on
for the https://github.com/TysonAndre/pecl-teds PECL.While
SplDoublyLinkedList
and its subclassSplQueue
/SplStack
exist in the SPL, they have several drawbacks
that are addressed by this RFC to add aDeque
class (to use instead of those):
SplDoublyLinkedList
is internally represented by a doubly linked list,
making it use roughly twice as much memory as the proposedDeque
push
/pop
/unshift
/shift
fromSplDoublyLinkedList
are slower due to
needing to allocate or free the linked list nodes.- Reading values in the middle of the
SplDoublyLinkedList
is proportional to the length of the list,
due to needing to traverse the linked list nodes.foreach
Iteration behavior cannot be understood without knowing what constructed the
SplDoublyLinkedList
instance or set the flags.It would be useful to have an efficient
Deque
container in the standard library
to provide an alternative without those drawbacks,
as well as for the following reasons:
- To save memory in applications or libraries that may need to store many lists of values or run for long periods of time.
Notably, PHP'sarray
type will never release allocated capacity.
See https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html- To provide a better alternative to
SplDoublyLinkedList
,SplStack
, andSplQueue
for use cases that require stacks or queues.- As a more efficient option than
array
andSplDoublyLinkedList
as a queue orDeque
, especially forunshift
.A
Deque
is more efficient than anarray
when used as a queue, more readable, and easier to use correctly.
While it is possible to efficiently remove elements from the start of anarray
(in terms of insertion order) (though this makesreset()
/array_key_first() inefficient),
it is very inefficient to prepend elements to the start of a largearray
due to needing to either copy the array
or move all elements in the internal array representation,
and anarray
would use much more memory than aDeque
when used that way (and be slower).There are also several pitfalls to using an array as a queue for larger queue sizes,
some of which are not obvious and discovered while writing the benchmarks.
(Having a better (double-ended) queue datastructure (Deque
) than theSplDoublyLinkedList
would save users from needing to write code with these pitfalls):
array_key_first()
andreset()``takes time proportional to the number of elements
unset` from the start of an array,
causing it to unexpectedly be extremely slow (quadratic time) after unsetting many elements at the start of the queue.
(when the array infrequently runs out of capacity, buckets are moved to the front)reset()
orend()
will convert a variable to a reference,
and php is less efficient at reading or writing to reference.
Opcache is also less efficient at optimizing uses of variables using references.- More obviously,
array_unshift
andarray_shift
will take time proportional to the number of elements in the array
(to reindex and move existing/remaining elements).I plan to start voting on https://wiki.php.net/rfc/deque on Friday, February 4th.
Several changes have been made to https://wiki.php.net/rfc/deque#changelog
after the feedback in https://externals.io/message/116100
- The class is now named
Collections\Deque
- The api documentation in https://wiki.php.net/rfc/deque#proposal was expanded for methods.
- Benchmarks were updated.
- Like other standard datastructures, iteration over the deque is now over the original object (instead of creating a copy),
and mutating the deque will be reflected in$iterator->current()
(and moving the end with push()/pop() will affect where iteration ends).- Iteration will account for calls to shift/unshift moving the start of the deque.
the offsets will be corrected and values won't be skipped or iterated over multiple times.
(no matter how many iterators were created byDeque->getIterator()
)
See https://wiki.php.net/rfc/deque#iteration_behavior- The get()/set() methods were removed, after feedback in https://externals.io/message/116100#116214
A WebAssembly demo is available at https://tysonandre.github.io/php-rfc-demo/deque/
Thanks,
Tyson--
To unsubscribe, visit: https://www.php.net/unsub.php
Hi Tyson,
As a userland dev & library author it’s nice to see some progression on basic data structures, so thank you for your efforts on this!
Two little things in the RFC:
The proposed API switches between terms front
, back
, start
and end
in comments - is there meant to be a conceptual difference between front/start and end/back ?
In the "Why use this instead of array?” Section, the 3rd point seems cut off:
Note that starting in php 8.2, array
Cheers
Stephen
The proposed API switches between terms
front
,back
,start
andend
in comments - is there meant to be a conceptual difference between front/start and end/back ?
On a similar note, why are the methods for getting the first/last value called
top()
/bottom()
? Off the top of my head, it is hard for me to imagine which side is
the top and which side is the bottom.
I would prefer if it was called something more intuitive, possibly first()
/last()
in
accordance with array_key_first()
/array_key_last()
.
Regards,
Mel
Hi Mel Dafert,
The proposed API switches between terms
front
,back
,start
andend
in comments - is there meant to be a conceptual difference between front/start and end/back ?On a similar note, why are the methods for getting the first/last value called
top()
/bottom()
? Off the top of my head, it is hard for me to imagine which side is
the top and which side is the bottom.
I would prefer if it was called something more intuitive, possiblyfirst()
/last()
in
accordance witharray_key_first()
/array_key_last()
.
I've changed it to first/last in https://wiki.php.net/rfc/deque
I'd forgotten about first/last when looking for existing names php used for methods/global functions.
The closest I'd seen was in SplDoublyLinkedList, I'd forgotten about https://php.net/array_key_last getting added in php 7.3
(due to it existing for keys but not values)
Thanks,
Tyson
Hi Stephen,
As a userland dev & library author it’s nice to see some progression on basic data structures, so thank you for your efforts on this!
Two little things in the RFC:
The proposed API switches between terms
front
,back
,start
andend
in comments - is there meant to be a conceptual difference between front/start and end/back ?
Good point. I've changed the method names to first()/last() and also made the wording in https://wiki.php.net/rfc/deque more consistently use first/last to avoid confusion.
No, They're the same. front=start=bottom=first. Bottom was from SplDoublyLinkedList/SplStack, e.g. the bottom of the stack, top is where push()
acts, etc.
Front was how I was referring to iteration order.
In the "Why use this instead of array?” Section, the 3rd point seems cut off:
Note that starting in php 8.2, array
That should say "Note that starting in php 8.2, arrays that are lists (with no/few gaps) are represented in a more memory efficient way than associative arrays.".
I've updated the RFC.
Thanks,
Tyson
I plan to start voting on https://wiki.php.net/rfc/deque on Friday,
February 4th.Several changes have been made to https://wiki.php.net/rfc/deque#changelog
after the feedback in https://externals.io/message/116100
- The class is now named
Collections\Deque
- The api documentation in https://wiki.php.net/rfc/deque#proposal was
expanded for methods.- Benchmarks were updated.
- Like other standard datastructures, iteration over the deque is now
over the original object (instead of creating a copy),
and mutating the deque will be reflected in$iterator->current()
(and moving the end with push()/pop() will affect where iteration ends).- Iteration will account for calls to shift/unshift moving the start of
the deque.
the offsets will be corrected and values won't be skipped or iterated
over multiple times.
(no matter how many iterators were created byDeque->getIterator()
)
See https://wiki.php.net/rfc/deque#iteration_behavior- The get()/set() methods were removed, after feedback in
https://externals.io/message/116100#116214A WebAssembly demo is available at
https://tysonandre.github.io/php-rfc-demo/deque/
Request:
push() and unshift() currently return void. That's not helpful. It would be vastly more useful if they both returned $this. Not as much for chaining, but more so that you can add a value to a queue and pass it as an argument to another call (often recursive, but not necessarily) in a single operation.
Example: I was doing last year's Advent of Code in functional PHP, and had a stack walker that looked like this:
function parse(string $line, $pos = 0, array $stack = []): Result|string
{
$next = $line[$pos] ?? null;
$head = $stack[0] ?? null;
return match ($next) {
// Opening brace, push an "expected" onto the stack.
'{' => parse($line, $pos + 1, ['}', ...$stack]),
'<' => parse($line, $pos + 1, ['>', ...$stack]),
'(' => parse($line, $pos + 1, [')', ...$stack]),
'[' => parse($line, $pos + 1, [']', ...$stack]),
'}', '>', ')', ']' => $next === $head ? parse($line, $pos + 1, array_slice($stack, 1)) : $next,
null => count($stack) ? Result::Incomplete : Result::OK,
};
}
The interesting part is the ['<', ...$stack], to pass on a modified version of an array-as-stack. That's of course annoyingly slow with arrays right now, and a Deque would be better, but only if it could be "modified and passed" like that. If not, it would be incompatible with single-expression usages (match statements, short lambdas, etc.)
Returning $this would resolve that, I think. (Making it return a new, immutable copy of the Deque would be even nicer, but I realize that's probably not an argument I'm going to win at this point on this RFC.)
Also, typo:
"By introducing a data structure (Deque) that's even faster and more memory usage than an array for use as a double-ended queue, even more applications would benefit from it. "
I think you mean "less memory usage", or possibly "more memory efficient", or something like that.
--Larry Garfield
Hi Larry Garfield,
Request:
push() and unshift() currently return void. That's not helpful. It would be vastly more useful if they both returned $this. Not as much for chaining, but more so that you can add a value to a queue and pass it as an argument to another call (often recursive, but not necessarily) in a single operation.
Example: I was doing last year's Advent of Code in functional PHP, and had a stack walker that looked like this:
function parse(string $line, $pos = 0, array $stack = []): Result|string
{
$next = $line[$pos] ?? null;
$head = $stack[0] ?? null;return match ($next) { // Opening brace, push an "expected" onto the stack. '{' => parse($line, $pos + 1, ['}', ...$stack]), '<' => parse($line, $pos + 1, ['>', ...$stack]), '(' => parse($line, $pos + 1, [')', ...$stack]), '[' => parse($line, $pos + 1, [']', ...$stack]), '}', '>', ')', ']' => $next === $head ? parse($line, $pos + 1, array_slice($stack, 1)) : $next, null => count($stack) ? Result::Incomplete : Result::OK, };
}
The interesting part is the ['<', ...$stack], to pass on a modified version of an array-as-stack. That's of course annoyingly slow with arrays right now, and a Deque would be better, but only if it could be "modified and passed" like that. If not, it would be incompatible with single-expression usages (match statements, short lambdas, etc.)
Returning $this would resolve that, I think. (Making it return a new, immutable copy of the Deque would be even nicer, but I realize that's probably not an argument I'm going to win at this point on this RFC.)
Technically, you still can have single-expression usages in readable/unreadable ways
-
[$deque->shift('value'), $deque][1]
, or -
($deque->shift('value') ?: $deque)
, or -
my_userland_helper_shift_and_return($deque, 'value')
My personal preference is against making this fluent.
I'd rather expose an efficient datastructure that's consistent with the rest of PHP's functionality to the extent possible,
which userland can use to write their own fluent/non-fluent classes.
There's drawbacks to returning $this
, including:
-
Inconsistency with existing APIs making remembering what does what harder. Barely anything in php that I remember returns $this.
https://www.php.net/manual/en/arrayobject.append.php returns void.
https://www.php.net/manual/en/function.array-push returns an int.
-
Inconsistency with possible new datastructures/methods
If a
Map
/Set
function were to be added, then methods for add/remove would return booleans (or the old value), not $this -
Slight additional performance overhead for functionality I assume will be used relatively infrequently
(php has to increment reference counts and opcache can't eliminate the opcode to decrease reference counts and possibly free the return value of
$deque->shift()
with the return type info being an object) -
Returning $this makes code easier to write at some cost to readability - Developers new to php or using
Collections\Deque
for the first time would not immediately know what the code they're reading is doing.
(less of a problem with a good IDE, typechecker, and a typed codebase, but this isn't universal)
Having it return void,return $deque->push()
would be less common and this would force the meaning to be clear.
Developers might have several guesses/assumptions based on their experience with other methods in php/elsewhere
- It returns the new count (JavaScript Array.push, array_push)
- It returns $this (Ruby)
- It returns a lazy copy, like you'd wanted, not modifying the original
- It's returning void and the code in question is shorthand for
return null
.
(Python, C++ https://www.cplusplus.com/reference/vector/vector/push_back/ , offsetSet and spl push()/shift() methods)
Also, typo:
"By introducing a data structure (Deque) that's even faster and more memory usage than an array for use as a double-ended queue, even more applications would benefit from it. "
I think you mean "less memory usage", or possibly "more memory efficient", or something like that.
Thanks, I've fixed that.
Thanks,
Tyson
Hi Larry Garfield,
Returning $this would resolve that, I think. (Making it return a new, immutable copy of the Deque would be even nicer, but I realize that's probably not an argument I'm going to win at this point on this RFC.)
Technically, you still can have single-expression usages in
readable/unreadable ways
[$deque->shift('value'), $deque][1]
, or($deque->shift('value') ?: $deque)
, ormy_userland_helper_shift_and_return($deque, 'value')
... Those are all grotesque.
My personal preference is against making this fluent.
I'd rather expose an efficient datastructure that's consistent with the
rest of PHP's functionality to the extent possible,
which userland can use to write their own fluent/non-fluent classes.
There's drawbacks to returning$this
, including:
Inconsistency with existing APIs making remembering what does what
harder. Barely anything in php that I remember returns $this.https://www.php.net/manual/en/arrayobject.append.php returns void.
https://www.php.net/manual/en/function.array-push returns an int.
So it's already inconsistent, and frankly neither of those returns is useful. If the existing convention is "inconsistently unhelpful", then let's go ahead and make it helpful since "it's lame but at least it's consistent" is not an argument here. (Sometimes that is a valid argument, but not in this case.)
Inconsistency with possible new datastructures/methods
If a
Map
/Set
function were to be added, then methods for
add/remove would return booleans (or the old value), not $this
Removal methods need to return the value being removed, sure. That makes sense across the board. But I don't see why Map::add($k, $v) or Set::add($v) shouldn't also return $this. I would make the exact same ask there, for the exact same reason: To aid functional code, which (when the syntax is conducive to it) can be more readable in many cases than procedural approaches.
- Slight additional performance overhead for functionality I assume
will be used relatively infrequently
I would assume the opposite, at least in my own code. I'm writing a lot of recursive or reduction-based routines these days, so if I had reason to use a queue/stack in them in the first place, they'd almost certainly get used in this fashion.
(php has to increment reference counts and opcache can't eliminate
the opcode to decrease reference counts and possibly free the return
value of$deque->shift()
with the return type info being an object)
I cannot speak to this one; how much of a difference is it in practice?
- Returning $this makes code easier to write at some cost to
readability - Developers new to php or usingCollections\Deque
for
the first time would not immediately know what the code they're reading
is doing.
(less of a problem with a good IDE, typechecker, and a typed
codebase, but this isn't universal)
Having it return void,return $deque->push()
would be less common
and this would force the meaning to be clear.
I disagree that it's less readable. Compare it with the alternatives you showed at the start of your reply. None of those would qualify as "readable", IMO.
Developers might have several guesses/assumptions based on their
experience with other methods in php/elsewhere
- It returns the new count (JavaScript Array.push, array_push)
- It returns $this (Ruby)
- It returns a lazy copy, like you'd wanted, not modifying the
original- It's returning void and the code in question is shorthand for
return null
.
(Python, C++
https://www.cplusplus.com/reference/vector/vector/push_back/ ,
offsetSet and spl push()/shift() methods)
Once again, the lack of a consistent pattern means we don't need to follow the pattern. As with the first point, "it's silly but at least it's consistent" is a valid argument in many cases, and we've changed the language accordingly before. (Ternary associativity order comes to mind.) That's not the case here. There is no broad-based ingrained training that people have that would lead them to expect a given return, so we can and should go for whichever one is most powerful/flexible/conducive to good code. IMO, that's returning $this. (Again, since a purely functional version is apparently off the table.)
--Larry Garfield
Hi Larry Garfield,
Returning $this would resolve that, I think. (Making it return a new, immutable copy of the Deque would be even nicer, but I realize that's probably not an argument I'm going to win at this point on this RFC.)
Technically, you still can have single-expression usages in
readable/unreadable ways
[$deque->shift('value'), $deque][1]
, or($deque->shift('value') ?: $deque)
, ormy_userland_helper_shift_and_return($deque, 'value')
... Those are all grotesque.
True, I wasn't familiar with that event and whether your goal was to write as few lines as possible or as quickly as possible vs in a readable way to people familiar with php.
That clears that up.
My personal preference is against making this fluent.
I'd rather expose an efficient datastructure that's consistent with the
rest of PHP's functionality to the extent possible,
which userland can use to write their own fluent/non-fluent classes.
There's drawbacks to returning$this
, including:
- Inconsistency with existing APIs making remembering what does what
harder. Barely anything in php that I remember returns $this.https://www.php.net/manual/en/arrayobject.append.php returns void.
https://www.php.net/manual/en/function.array-push returns an int.
So it's already inconsistent, and frankly neither of those returns is useful. If the existing convention is "inconsistently unhelpful", then let's go ahead and make it helpful since "it's lame but at least it's consistent" is not an argument here. (Sometimes that is a valid argument, but not in this case.)
I won't know if anyone is strongly against fluent interfaces belonging in core, or strongly against adopting fluent interfaces in an ad-hoc way that would make it unclear if the vote was for/against the overall functionality or the new design choice.
I'd rather not do this without a policy RFC such as the one that passed for namespaces https://wiki.php.net/rfc/namespaces_in_bundled_extensions
If you expect this to be widely considered a better design php should adopt, please create an RFC after the vote finishes (if it passes) before the feature freeze outlining which methods of Collection\Deque
would change and how.
- Inconsistency with possible new datastructures/methods
If a
Map
/Set
function were to be added, then methods for
add/remove would return booleans (or the old value), not $thisRemoval methods need to return the value being removed, sure. That makes sense across the board. But I don't see why Map::add($k, $v) or Set::add($v) shouldn't also return $this. I would make the exact same ask there, for the exact same reason: To aid functional code, which (when the syntax is conducive to it) can be more readable in many cases than procedural approaches.
- Slight additional performance overhead for functionality I assume
will be used relatively infrequentlyI would assume the opposite, at least in my own code. I'm writing a lot of recursive or reduction-based routines these days, so if I had reason to use a queue/stack in them in the first place, they'd almost certainly get used in this fashion.
(php has to increment reference counts and opcache can't eliminate
the opcode to decrease reference counts and possibly free the return
value of$deque->shift()
with the return type info being an object)I cannot speak to this one; how much of a difference is it in practice?
Adding RETURN_OBJ_COPY(Z_OBJ_P(ZEND_THIS));
and a different method with the : Deque
return type:
Around 6-8% for large arrays where push is the most frequent method call and the array reads are taken out since they'd be the same?
(and similar for shift, I'd expect)
(Intel CPU security mitigations/power management and so on have made benchmarking less convenient, I ran the functions twice to see if it was consistent)
I'd still object to this for other stated reasons even if you think 6-8% is worth it for your use cases for this or future additions to php in general.
Results for php 8.2.0-dev debug=false with opcache enabled=true
Appending to Collections\Deque []= : n= 4 iterations=10000000, create+destroy time=1.204 result=0
Appending to Collections\Deque push : n= 4 iterations=10000000, create+destroy time=1.551 result=0
Appending to Collections\Deque fluent: n= 4 iterations=10000000, create+destroy time=1.640 result=0
Appending to Collections\Deque []= : n= 4 iterations=10000000, create+destroy time=1.213 result=0
Appending to Collections\Deque push : n= 4 iterations=10000000, create+destroy time=1.549 result=0
Appending to Collections\Deque fluent: n= 4 iterations=10000000, create+destroy time=1.636 result=0
Appending to Collections\Deque []= : n= 8 iterations= 5000000, create+destroy time=0.923 result=0
Appending to Collections\Deque push : n= 8 iterations= 5000000, create+destroy time=1.263 result=0
Appending to Collections\Deque fluent: n= 8 iterations= 5000000, create+destroy time=1.335 result=0
Appending to Collections\Deque []= : n= 8 iterations= 5000000, create+destroy time=0.921 result=0
Appending to Collections\Deque push : n= 8 iterations= 5000000, create+destroy time=1.261 result=0
Appending to Collections\Deque fluent: n= 8 iterations= 5000000, create+destroy time=1.335 result=0
Appending to Collections\Deque []= : n= 1024 iterations= 100000, create+destroy time=1.246 result=0
Appending to Collections\Deque push : n= 1024 iterations= 100000, create+destroy time=1.972 result=0
Appending to Collections\Deque fluent: n= 1024 iterations= 100000, create+destroy time=2.150 result=0
Appending to Collections\Deque []= : n= 1024 iterations= 100000, create+destroy time=1.251 result=0
Appending to Collections\Deque push : n= 1024 iterations= 100000, create+destroy time=1.969 result=0
Appending to Collections\Deque fluent: n= 1024 iterations= 100000, create+destroy time=2.146 result=0
Appending to Collections\Deque []= : n= 1048576 iterations= 100, create+destroy time=2.450 result=0
Appending to Collections\Deque push : n= 1048576 iterations= 100, create+destroy time=3.172 result=0
Appending to Collections\Deque fluent: n= 1048576 iterations= 100, create+destroy time=3.371 result=0
Appending to Collections\Deque []= : n= 1048576 iterations= 100, create+destroy time=2.455 result=0
Appending to Collections\Deque push : n= 1048576 iterations= 100, create+destroy time=3.183 result=0
Appending to Collections\Deque fluent: n= 1048576 iterations= 100, create+destroy time=3.376 result=0
4. Returning $this makes code easier to write at some cost to
readability - Developers new to php or usingCollections\Deque
for
the first time would not immediately know what the code they're reading
is doing.
(less of a problem with a good IDE, typechecker, and a typed
codebase, but this isn't universal)
Having it return void,return $deque->push()
would be less common
and this would force the meaning to be clear.I disagree that it's less readable. Compare it with the alternatives you showed at the start of your reply. None of those would qualify as "readable", IMO.
True, I wasn't sure if your goal was readability (for php developers deeply familiar with libraries) or to write as few lines as possible.
Suggesting using function
and adding several lines seemed like something you were trying to avoid.
Developers might have several guesses/assumptions based on their
experience with other methods in php/elsewhere- It returns the new count (JavaScript Array.push, array_push)
- It returns $this (Ruby)
- It returns a lazy copy, like you'd wanted, not modifying the
original
- It's returning void and the code in question is shorthand for
return null
.
(Python, C++
https://www.cplusplus.com/reference/vector/vector/push_back/ ,
offsetSet and spl push()/shift() methods)Once again, the lack of a consistent pattern means we don't need to follow the pattern. As with the first point, "it's silly but at least it's consistent" is a valid argument in many cases, and we've changed the language accordingly before. (Ternary associativity order comes to mind.) That's not the case here. There is no broad-based ingrained training that people have that would lead them to expect a given return, so we can and should go for whichever one is most powerful/flexible/conducive to good code. IMO, that's returning $this.
I continue to disagree for my previously stated reasons.
(Again, since a purely functional version is apparently off the table.)
I'd be in favor of immutable datastructures in core, but I don't think I'd have much success in an RFC.
I've only heard from a few voters that are passionately for them, I'm not sure if that translates to widespread interest.
Though interestingly (and unrelated to this RFC https://wiki.php.net/rfc/deque ), they do seem possible to implement with better asymptotic performance than I'd have expected.
Google searching mentioned Relaxed Radix Binary Trees, an immutable datastructure surprisingly claiming performance near arrays in most operations.
https://hypirion.com/musings/thesis https://github.com/hyPiRion/c-rrb
Aside: An Immutable Map/Set would probably have a stronger argument than immutable sequeunces/deques/linked lists (arrays support copy on write already so the original array is immutable, but arbitrary keys are something arrays don't support),
but voters that didn't see a use case for immutables in core might be unconvinced.
Thanks,
Tyson
Hi Larry Garfield,
Returning $this would resolve that, I think. (Making it return a new, immutable copy of the Deque would be even nicer, but I realize that's probably not an argument I'm going to win at this point on this RFC.)
Technically, you still can have single-expression usages in
readable/unreadable ways
[$deque->shift('value'), $deque][1]
, or($deque->shift('value') ?: $deque)
, ormy_userland_helper_shift_and_return($deque, 'value')
... Those are all grotesque.
True, I wasn't familiar with that event and whether your goal was to
write as few lines as possible or as quickly as possible vs in a
readable way to people familiar with php.
That clears that up.
My goal with the AoC series was to 1. Demonstrate how functional-friendly PHP today is and 2. Identify places where it is still not fully FP-friendly so we can fix those.
Full writeup of my results here: https://peakd.com/hive-168588/@crell/aoc2021-review
(Assistance with any of those points raised is very welcome and requested.)
My personal preference is against making this fluent.
I'd rather expose an efficient datastructure that's consistent with the
rest of PHP's functionality to the extent possible,
which userland can use to write their own fluent/non-fluent classes.
There's drawbacks to returning$this
, including:
- Inconsistency with existing APIs making remembering what does what
harder. Barely anything in php that I remember returns $this.https://www.php.net/manual/en/arrayobject.append.php returns void.
https://www.php.net/manual/en/function.array-push returns an int.
So it's already inconsistent, and frankly neither of those returns is useful. If the existing convention is "inconsistently unhelpful", then let's go ahead and make it helpful since "it's lame but at least it's consistent" is not an argument here. (Sometimes that is a valid argument, but not in this case.)
I won't know if anyone is strongly against fluent interfaces belonging
in core, or strongly against adopting fluent interfaces in an ad-hoc
way that would make it unclear if the vote was for/against the overall
functionality or the new design choice.I'd rather not do this without a policy RFC such as the one that passed
for namespaces https://wiki.php.net/rfc/namespaces_in_bundled_extensions
I wouldn't call anything that returns $this "fluent". "Fluent" is specifically trying to make an API read like English, and most of the APIs I see that really lean into that heavily are multi-object and end up being godawful. This is simply "there's nothing else to return, so may as well return $this", which is extremely common. (Auto-generated Doctrine entities do that almost universally for setters.) In this case, doing so also has a specific additional benefit beyond "might be nice".
As far as data collection, welcome to life in RFCs. Forming consensus on something before the vote happens is basically impossible, so you have to design the best API based on limited feedback and then hope. It sucks, but that's neither here nor there.
At the moment I am leaning toward voting No if the return value remains void, because that cuts the usefulness of the object in half. (Though the performance note gives me pause; that's the only thing that does.)
If you expect this to be widely considered a better design php should
adopt, please create an RFC after the vote finishes (if it passes)
before the feature freeze outlining which methods ofCollection\Deque
would change and how.
Such an RFC would almost certainly fail without support from the original author (you).
Developers might have several guesses/assumptions based on their
experience with other methods in php/elsewhere- It returns the new count (JavaScript Array.push, array_push)
- It returns $this (Ruby)
- It returns a lazy copy, like you'd wanted, not modifying the
original
- It's returning void and the code in question is shorthand for
return null
.
(Python, C++
https://www.cplusplus.com/reference/vector/vector/push_back/ ,
offsetSet and spl push()/shift() methods)Once again, the lack of a consistent pattern means we don't need to follow the pattern. As with the first point, "it's silly but at least it's consistent" is a valid argument in many cases, and we've changed the language accordingly before. (Ternary associativity order comes to mind.) That's not the case here. There is no broad-based ingrained training that people have that would lead them to expect a given return, so we can and should go for whichever one is most powerful/flexible/conducive to good code. IMO, that's returning $this.
I continue to disagree for my previously stated reasons.
(Again, since a purely functional version is apparently off the table.)
I'd be in favor of immutable datastructures in core, but I don't think
I'd have much success in an RFC.
I've only heard from a few voters that are passionately for them, I'm
not sure if that translates to widespread interest.
"A language teaches you to not want what it doesn't provide." - Most features don't have widespread grassroots demand for them until they happen, and then they're super common. (Some do, but most don't.)
As above, we don't know if anything is going to have enough support to pass until the vote starts. This is a known design flaw in PHP's development process, independent of any particular RFC.
Though interestingly (and unrelated to this RFC
https://wiki.php.net/rfc/deque ), they do seem possible to implement
with better asymptotic performance than I'd have expected.
Google searching mentioned Relaxed Radix Binary Trees, an immutable
datastructure surprisingly claiming performance near arrays in most
operations.
https://hypirion.com/musings/thesis https://github.com/hyPiRion/c-rrb
Sounds like we should add them. :-)
--Larry Garfield
On Wed, Feb 2, 2022 at 6:59 AM tyson andre tysonandre775@hotmail.com
wrote:
- Returning $this makes code easier to write at some cost to readability
Developers new to php or using
Collections\Deque
for the first time
would not immediately know what the code they're reading is doing.
(less of a problem with a good IDE, typechecker, and a typed codebase,
but this isn't universal)
Having it return void,return $deque->push()
would be less common and
this would force the meaning to be clear.Developers might have several guesses/assumptions based on their
experience with other methods in php/elsewhere
- It returns the new count (JavaScript Array.push, array_push)
- It returns $this (Ruby)
- It returns a lazy copy, like you'd wanted, not modifying the original
- It's returning void and the code in question is shorthand for
return null
.
(Python, C++
https://www.cplusplus.com/reference/vector/vector/push_back/ , offsetSet
and spl push()/shift() methods)
I'm not sure that I buy this as a point even. Returning an immutable Deque
instance would be much more in line with modern PHP in general.
A major complaint about my operator overloads RFC was how it impacted
static analysis tools. I don't see how in one RFC we can say that creating
work for static analysis tools is a blocking problem, and in a different
RFC say that the ability to inspect the return values by the developer
can't even be assumed. If we design one feature around the idea that a
basic IDE may not even be used, but design a different feature around the
idea that we want to minimize the impact to a third party tool that
provides static analysis as part of workflow that's not even part of an
IDE... well that seems like a very inconsistent approach to me.
Either modern development tools are factored into the language design or
they are not. This seems like a "having your cake and eating it too"
situation.
Jordan
Hi Jordan,
4. Returning $this makes code easier to write at some cost to readability - Developers new to php or using
Collections\Deque
for the first time would not immediately know what the code they're reading is doing.
(less of a problem with a good IDE, typechecker, and a typed codebase, but this isn't universal)
Having it return void,return $deque->push()
would be less common and this would force the meaning to be clear.Developers might have several guesses/assumptions based on their experience with other methods in php/elsewhere
- It returns the new count (JavaScript Array.push, array_push)
- It returns $this (Ruby)
- It returns a lazy copy, like you'd wanted, not modifying the original
- It's returning void and the code in question is shorthand forreturn null
.
(Python, C++ https://www.cplusplus.com/reference/vector/vector/push_back/ , offsetSet and spl push()/shift() methods)I'm not sure that I buy this as a point even. Returning an immutable Deque instance would be much more in line with modern PHP in general.
A major complaint about my operator overloads RFC was how it impacted static analysis tools. I don't see how in one RFC we can say that creating work for static analysis tools is a blocking problem, and in a different RFC say that the ability to inspect the return values by the developer can't even be assumed. If we design one feature around the idea that a basic IDE may not even be used, but design a different feature around the idea that we want to minimize the impact to a third party tool that provides static analysis as part of workflow that's not even part of an IDE... well that seems like a very inconsistent approach to me.
There's a difference between
-
https://externals.io/message/116767#116768
One contributor to a static analyzer thinking it would not be worth the complexity it adds to support in static analysis
(and making type inference either vaguer or inaccurate for operands with a typemixed
more frequently)
Debuggability seemed to be their bigger objection.As a maintainer of a different static analysis tool, I'd disagree with that being a blocker, though. It'd be inconvenient to implement but would get implemented.
-
Aiming to "cater to the skill levels and platforms of a wide range of users", considering benefits/drawbacks to both groups of users (with/without static analyzers).
(https://wiki.php.net/rfc/template)
I don't think I'd have much success in an RFC adding immutable datastructures, though.
Though interestingly, they do seem possible to implement with better performance (https://en.wikipedia.org/wiki/Big_O_notation) than I'd have expected.
Google searching mentioned Relaxed Radix Binary Trees, an immutable datastructure surprisingly claiming performance near arrays in most operations.
https://hypirion.com/musings/thesis https://github.com/hyPiRion/c-rrb
Aside: An Immutable Map/Set would probably have a stronger argument than immutable sequeunces/deques/linked lists (arrays support copy on write already so the original array is immutable, but arbitrary keys are something arrays don't support),
but voters that didn't see a use case for immutables in core might be unconvinced.
Either modern development tools are factored into the language design or they are not. This seems like a "having your cake and eating it too" situation.
The language design is decided by what the set of voters that voted on a particular RFC would approve with a 2/3 majority.
Voters have a wide variety of backgrounds and projects/libraries that they work on, and RFC authors have to guess at what those backgrounds are and what functionality would be accepted, or why similar past RFCs were really objected.
The preferences, backgrounds, and paradigms preferred by voters vary and I likely won't know what those preferences are or how many voters there would be until a vote is started. (e.g. frequency of writing new code vs maintaining existing codebases vs working with third party libraries)
(e.g. I won't know if anyone is strongly against fluent interfaces, or specifically strongly against adopting fluent interfaces in an ad-hoc way, without a policy RFC such as the one that passed for namespaces
https://wiki.php.net/rfc/namespaces_in_bundled_extensions)
The fact that the closest look at the design/implementation for complex RFCs sometimes only happens shortly before/after the start of a vote is also inconvenient for authors, but hard to change with a volunteer-based process.
This has downsides for current or future RFC authors, but I don't have any productive ideas for how to change the process that wouldn't split development or be burdensome to voters. (and would actually be accepted)
Regards,
Tyson
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/deque to add a
final class Deque
This is based on the
Teds\Deque
implementation I've worked on
for the https://github.com/TysonAndre/pecl-teds PECL.While
SplDoublyLinkedList
and its subclassSplQueue
/SplStack
exist in the SPL, they have several drawbacks
that are addressed by this RFC to add aDeque
class (to use instead of those):
SplDoublyLinkedList
is internally represented by a doubly linked list,
making it use roughly twice as much memory as the proposedDeque
push
/pop
/unshift
/shift
fromSplDoublyLinkedList
are slower due to
needing to allocate or free the linked list nodes.- Reading values in the middle of the
SplDoublyLinkedList
is proportional to the length of the list,
due to needing to traverse the linked list nodes.foreach
Iteration behavior cannot be understood without knowing what constructed the
SplDoublyLinkedList
instance or set the flags.It would be useful to have an efficient
Deque
container in the standard library
to provide an alternative without those drawbacks,
as well as for the following reasons:
- To save memory in applications or libraries that may need to store many lists of values or run for long periods of time.
Notably, PHP'sarray
type will never release allocated capacity.
See https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html- To provide a better alternative to
SplDoublyLinkedList
,SplStack
, andSplQueue
for use cases that require stacks or queues.- As a more efficient option than
array
andSplDoublyLinkedList
as a queue orDeque
, especially forunshift
.A
Deque
is more efficient than anarray
when used as a queue, more readable, and easier to use correctly.
While it is possible to efficiently remove elements from the start of anarray
(in terms of insertion order) (though this makesreset()
/array_key_first() inefficient),
it is very inefficient to prepend elements to the start of a largearray
due to needing to either copy the array
or move all elements in the internal array representation,
and anarray
would use much more memory than aDeque
when used that way (and be slower).There are also several pitfalls to using an array as a queue for larger queue sizes,
some of which are not obvious and discovered while writing the benchmarks.
(Having a better (double-ended) queue datastructure (Deque
) than theSplDoublyLinkedList
would save users from needing to write code with these pitfalls):
array_key_first()
andreset()``takes time proportional to the number of elements
unset` from the start of an array,
causing it to unexpectedly be extremely slow (quadratic time) after unsetting many elements at the start of the queue.
(when the array infrequently runs out of capacity, buckets are moved to the front)reset()
orend()
will convert a variable to a reference,
and php is less efficient at reading or writing to reference.
Opcache is also less efficient at optimizing uses of variables using references.- More obviously,
array_unshift
andarray_shift
will take time proportional to the number of elements in the array
(to reindex and move existing/remaining elements).I plan to start voting on https://wiki.php.net/rfc/deque on Friday, February 4th.
Several changes have been made to https://wiki.php.net/rfc/deque#changelog
after the feedback in https://externals.io/message/116100
- The class is now named
Collections\Deque
- The api documentation in https://wiki.php.net/rfc/deque#proposal was expanded for methods.
- Benchmarks were updated.
- Like other standard datastructures, iteration over the deque is now over the original object (instead of creating a copy),
and mutating the deque will be reflected in$iterator->current()
(and moving the end with push()/pop() will affect where iteration ends).- Iteration will account for calls to shift/unshift moving the start of the deque.
the offsets will be corrected and values won't be skipped or iterated over multiple times.
(no matter how many iterators were created byDeque->getIterator()
)
See https://wiki.php.net/rfc/deque#iteration_behavior- The get()/set() methods were removed, after feedback in https://externals.io/message/116100#116214
A WebAssembly demo is available at https://tysonandre.github.io/php-rfc-demo/deque/
I've updated the RFC https://wiki.php.net/rfc/deque yet again (no implementation changes). I now plan to start voting on Saturday, February 5th.
I've also updated the WebAssembly demo at https://tysonandre.github.io/php-rfc-demo/deque/ to include a benchmark to illustrate the differences in performance of shift/unshift at various deque/array sizes.
A more detailed section on the performance of unshift/shift compared to array and existing data structures was added to the RFC.
I've also added a discussion section for the request to return $this
(https://externals.io/message/116100#116967), removed duplicated quotes, and clarified some parts of the RFC.
Thanks,
Tyson
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/deque to add a
final class Deque
This is based on the
Teds\Deque
implementation I've worked on
for the https://github.com/TysonAndre/pecl-teds PECL.While
SplDoublyLinkedList
and its subclassSplQueue
/SplStack
exist in the SPL, they have several drawbacks
that are addressed by this RFC to add aDeque
class (to use instead of those):
SplDoublyLinkedList
is internally represented by a doubly linked list,
making it use roughly twice as much memory as the proposedDeque
push
/pop
/unshift
/shift
fromSplDoublyLinkedList
are slower due to
needing to allocate or free the linked list nodes.- Reading values in the middle of the
SplDoublyLinkedList
is proportional to the length of the list,
due to needing to traverse the linked list nodes.foreach
Iteration behavior cannot be understood without knowing what constructed the
SplDoublyLinkedList
instance or set the flags.It would be useful to have an efficient
Deque
container in the standard library
to provide an alternative without those drawbacks,
as well as for the following reasons:
- To save memory in applications or libraries that may need to store many lists of values or run for long periods of time.
Notably, PHP'sarray
type will never release allocated capacity.
See https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html- To provide a better alternative to
SplDoublyLinkedList
,SplStack
, andSplQueue
for use cases that require stacks or queues.- As a more efficient option than
array
andSplDoublyLinkedList
as a queue orDeque
, especially forunshift
.A
Deque
is more efficient than anarray
when used as a queue, more readable, and easier to use correctly.
While it is possible to efficiently remove elements from the start of anarray
(in terms of insertion order) (though this makesreset()
/array_key_first() inefficient),
it is very inefficient to prepend elements to the start of a largearray
due to needing to either copy the array
or move all elements in the internal array representation,
and anarray
would use much more memory than aDeque
when used that way (and be slower).There are also several pitfalls to using an array as a queue for larger queue sizes,
some of which are not obvious and discovered while writing the benchmarks.
(Having a better (double-ended) queue datastructure (Deque
) than theSplDoublyLinkedList
would save users from needing to write code with these pitfalls):
array_key_first()
andreset()``takes time proportional to the number of elements
unset` from the start of an array,
causing it to unexpectedly be extremely slow (quadratic time) after unsetting many elements at the start of the queue.
(when the array infrequently runs out of capacity, buckets are moved to the front)reset()
orend()
will convert a variable to a reference,
and php is less efficient at reading or writing to reference.
Opcache is also less efficient at optimizing uses of variables using references.- More obviously,
array_unshift
andarray_shift
will take time proportional to the number of elements in the array
(to reindex and move existing/remaining elements).I plan to start voting on https://wiki.php.net/rfc/deque on Friday, February 4th.
Several changes have been made to https://wiki.php.net/rfc/deque#changelog
after the feedback in https://externals.io/message/116100
- The class is now named
Collections\Deque
- The api documentation in https://wiki.php.net/rfc/deque#proposal was expanded for methods.
- Benchmarks were updated.
- Like other standard datastructures, iteration over the deque is now over the original object (instead of creating a copy),
and mutating the deque will be reflected in$iterator->current()
(and moving the end with push()/pop() will affect where iteration ends).- Iteration will account for calls to shift/unshift moving the start of the deque.
the offsets will be corrected and values won't be skipped or iterated over multiple times.
(no matter how many iterators were created byDeque->getIterator()
)
See https://wiki.php.net/rfc/deque#iteration_behavior- The get()/set() methods were removed, after feedback in https://externals.io/message/116100#116214
A WebAssembly demo is available at https://tysonandre.github.io/php-rfc-demo/deque/
I've updated the RFC https://wiki.php.net/rfc/deque yet again (no implementation changes). I now plan to start voting on Saturday, February 5th.
I've also updated the WebAssembly demo at https://tysonandre.github.io/php-rfc-demo/deque/ to include a benchmark to illustrate the differences in performance of shift/unshift at various deque/array sizes.
A more detailed section on the performance of unshift/shift compared to array and existing data structures was added to the RFC.I've also added a discussion section for the request to
return $this
(https://externals.io/message/116100#116967), removed duplicated quotes, and clarified some parts of the RFC.
There's more changes I found this morning while working on test cases for unrelated datastructures, so I'll be postponing the vote.
-
The maximum
Collections\Deque
size is probably going to be reduced to be the same as the maximum array size.
For 64-bit php builds, that's roughly 2 billion (2147483648 elements, using at least 32 gigabytes of memory for array or Deque),
which was more memory than I'd assumed (16 bytes perzval
), and needing that much is expected to be a rare use case that can be revisited, e.g. if array size limits are raised.Collections would need to be converted to/from arrays for serialization, unserialization, debug output (var_export, json_encode, etc.), etc.,
so having a much larger maximum size didn't seem to have a compelling use case. -
I found and proposed potential improvements for detection of infinite recursion in var_export/debug_zval_dump/json_encode affecting userland classes, existing spl classes, etc.,
which seemed worth finishing discussing before starting the vote. -
Miscellaneous refactorings
Thanks,
Tyson