Newsgroups: php.internals
Date: Fri, 17 Sep 2021 15:38:57 -0500
To: "php internals"
Subject: Re: [PHP-DEV] RFC: Add `final class Vector` to PHP
From: larry@garfieldtech.com ("Larry Garfield")

On Thu, Sep 16, 2021, at 9:09 PM, tyson andre wrote:
> Hi internals,
>
> I've created a new RFC https://wiki.php.net/rfc/vector proposing to add `final class Vector` to PHP.
>
> PHP's native `array` type is rare among programming languages in that it is used as an associative map of values, but also needs to support lists of values.
> In order to support both use cases while also providing a consistent internal array HashTable API to PHP's internals and PECLs, additional memory is needed to track keys (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html; around twice as much as is needed to just store the values, due to needing space for both the string pointer and the int key in a Bucket, for non-reference-counted values).
> Additionally, creating non-constant arrays will allocate space for at least 8 elements to make the initial resizing more efficient, potentially wasting memory.
>
> It would be useful to have an efficient variable-length container in the standard library for the following reasons:
>
> 1. To save memory in applications or libraries that may need to store many lists of values and/or run as a CLI or embedded process for long periods of time
>    (in modules identified as using the most memory or potentially exceeding memory limits in the worst case)
>    (both in userland and in native code written in php-src/PECLs)
> 2. To provide a better alternative to `ArrayObject` and `SplFixedArray` for use cases where objects are easier to use than arrays, e.g. variable-sized collections (for lists of values) that can be passed by value to be read and modified.
> 3. To give users the option of stronger runtime guarantees that property, parameter, or return values really contain a list of values without gaps, that array modifications don't introduce gaps or unexpected indexes, etc.
>
> Thoughts on Vector?
>
> P.S. The functionality in this proposal can be tested/tried out at https://pecl.php.net/teds (under the class name `\Teds\Vector` instead of `\Vector`).
> (That is a PECL I created earlier this year for future versions of iterable proposals, common data structures such as Vector/Deque, and less commonly used data structures that may be of use in future work on implementing other data structures.)

Improving collection/set operations in PHP is something near and dear to my heart, so I'm in favor of adding a Vector class or similar to the stdlib. However, I am not a fan of this particular design.

* As Levi noted, a mutable object that is passed by handle is asking for trouble. It should be either a by-value internal type or an immutable object with evolver methods (e.g., add($val): Vector, which returns a new Vector rather than modifying the existing one). Making it mutable creates "spooky action at a distance" problems: any code holding a handle to the same Vector sees changes made anywhere else. An immutable object seems likely to be easier to implement than a new type, but both are beyond my capabilities, so I defer to those who could do so.

* The methods around size control seem pointless from a user's point of view. I understand their memory optimization value, but that's not something PHP developers are at all used to dealing with. It makes the class less a convenient drop-in replacement for arrays and more just another user-space collection object, only written in C and endorsed by internals. If such logic needs to be included, it should be kept as minimal as possible for usability, even at the cost of slightly higher memory usage in some cases.

* There is no reason to preserve keys. A Vector or list type should not have user-defined keys; it should just be a linear list. If you populate it from an existing array/iterable, the keys should be ignored entirely. If you care about keys, you want a HashMap or Dictionary or similar (which we also desperately need in the stdlib, but that's a separate thing).

* In my mind, whether contains() needs a comparison callback depends mainly on whether the operator overloading RFC passes. If it does, then contains() can/should use the __compareTo() method on objects.
  If it doesn't, there needs to be some other way to compare non-identical objects, or else that method becomes mostly useless.

* To echo Pierre, a Vector needs to hold a single, guaranteed type. Yes, this gets us back to the generics conversation again, but I presume (perhaps naively?) there are ways to address this question without getting into full-blown generics. But really, a Vector/List construct without a type guarantee is of fairly little use to me in practice, and that's before we even get into the performance optimizations for map() and filter() that type guarantees could enable. I can write a type-guaranteed user-space class that does what I need in under 10 minutes, and for most low-cardinality sets that's adequately performant. A built-in needs to be better than that.

I very much appreciate the chicken-and-egg challenge of wanting to get something useful in despite the absence of a larger plan, and also the challenge of getting buy-in on a larger plan. Really. :-) This is an area where PHP's current dev process is very lacking.

Still, I also agree with others that we need to think holistically about this problem space, which will inform what the individual steps should be. The approach we took for enums could be a model to consider (multiple RFCs clustered together under an RFC "epic"). That would allow for a long-term design, and the guidance it provides, while still having milestones along the way that offer value on their own. (I'm happy to help with that, since that's about all I'm good for around here. :-) )

So, a big +1 to improving the in-C collection story; -1 to the current proposal.

--Larry Garfield
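For illustration only, a minimal PHP 8.0+ sketch of two of the design points above, written under stated assumptions rather than as anything from the RFC or ext/teds. The first part shows how a mutable object passed by handle lets a callee change the caller's data; `ArrayObject` stands in for a mutable Vector. The second part shows what a small user-space list that is immutable (evolver methods return new instances), discards input keys, and enforces a single element type might look like. `appendDefault()`, `TypedList` and `Point` are hypothetical names, and the type check only handles class/interface types, not scalars.

```php
<?php
declare(strict_types=1);

// 1. Spooky action at a distance: objects are passed by handle, so a callee
//    that mutates its argument also changes the caller's object.
//    ArrayObject stands in for a hypothetical mutable Vector.
function appendDefault(ArrayObject $items): void
{
    $items[] = 'default'; // no copy is made; this modifies the caller's object
}

$shared = new ArrayObject(['a', 'b']);
appendDefault($shared);
// $shared now holds 3 elements, although the caller never assigned to it.

// 2. Hypothetical immutable list with evolver methods and a single element type.
//    Input keys are discarded; every "modification" returns a new instance.
final class TypedList implements Countable
{
    /** @var list<object> */
    private array $items = [];

    /** @param class-string $type class or interface every element must satisfy */
    public function __construct(private string $type, iterable $items = [])
    {
        foreach ($items as $item) { // keys are ignored entirely
            $this->items[] = $this->check($item);
        }
    }

    // Evolver: returns a copy with $item appended; $this is left unchanged.
    public function add(object $item): self
    {
        $new = clone $this;
        $new->items[] = $this->check($item);
        return $new;
    }

    // Evolver: returns a copy holding only matching items, re-indexed from 0.
    public function filter(callable $predicate): self
    {
        $new = clone $this;
        $new->items = array_values(array_filter($this->items, $predicate));
        return $new;
    }

    public function get(int $index): object
    {
        if (!array_key_exists($index, $this->items)) {
            throw new OutOfRangeException("Index $index is out of range");
        }
        return $this->items[$index];
    }

    public function count(): int
    {
        return count($this->items);
    }

    // Throws a TypeError unless $item is an instance of the list's element type.
    private function check(mixed $item): object
    {
        $type = $this->type;
        if (!($item instanceof $type)) {
            throw new TypeError(sprintf('Expected %s, got %s', $type, get_debug_type($item)));
        }
        return $item;
    }
}

final class Point
{
    public function __construct(public int $x, public int $y) {}
}

$a = new TypedList(Point::class, ['first' => new Point(1, 2)]);
$b = $a->add(new Point(3, 4));
// $a still holds one Point; $b holds two. The 'first' key was discarded.
```

Because `add()` and `filter()` return new instances, passing a `TypedList` to another function cannot change which elements the caller's list holds (the elements themselves are still ordinary mutable objects). The per-element `instanceof` check is the kind of runtime cost a native, type-guaranteed implementation could presumably avoid.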