Newsgroups: php.internals
Date: Sat, 18 Sep 2021 11:03:21 -0500
From: larry@garfieldtech.com ("Larry Garfield")
To: "php internals"
Subject: Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

On Fri, Sep 17, 2021, at 8:49 PM, tyson andre wrote:
>
> > Improving collection/set operations in PHP is something near and dear to my heart,
> > so I'm in favor of adding a Vector class or similar to the stdlib.
> >
> > However, I am not a fan of this particular design.
> >
> > As Levi noted, this being a mutable object that passes by handle is asking for trouble.
> > It should either be some by-value internal type, or an immutable object with evolver methods on it.
> > (E.g., add($val): Vector). Making it a mutable object is creating spooky action at a distance problems.
> > An immutable object seems likely easier to implement than a new type,
> > but both are beyond my capabilities so I defer to those who could do so.
>
> https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec
> discusses why I'm doubtful of `is_vec` getting implemented or passing.
> Especially with `add()` taking linear time to copy all elements of the
> existing value if you mean an array rather than a linked list-like
> structure, and any referenced copies taking a lot more memory than an
> imperative version would.
>
>
> PHP's end users and internals members come from a wide variety of
> backgrounds,
> and I assume most beginning or experienced PHP programmers would tend
> towards imperative&mutable programming rather than functional&immutable
> programming.
>
> PHP provides tools such as `clone`, private visibility, etc to deal with that.
>
> The lack of any immutable object datastructures in core and the lack of
> immutable focused extensions in
> PECL https://pecl.php.net/package-search.php?pkg_name=immutable
> https://www.php.net/manual-lookup.php?pattern=immutable&scope=quickref
> (other than DateTimeImmutable)
> heavily discourage me from proposing anything immutable.
>
> (Technically, https://github.com/TysonAndre/pecl-teds has minimal
> implementations of immutable data structures, but the api is still
> being revised and Vector is the primary focus, followed by iterable
> functions. e.g. there's no `ImmutableSequence::add($value):
> ImmutableSequence` method.)
>
>
> > The methods around size control are seemingly pointless from a user POV.
>
> setSize is useful in allocating exactly the variable amount of memory
> needed while using less memory than a PHP array.
> `setSize($newSize, 0)` would be much more efficient and concise in
> initializing the value.
>
> - Or in quickly reducing the size of the array rather than repeatedly
> calling pop in a loop.
>
> And while methods around capacity control exist in many other
> programming languages, they aren't used by most users and most users
> are fine with functionality they don't use existing.
> The applications or libraries that do have a good use case to reduce
> memory will take advantage of them and end users of those
> applications/libraries would benefit from the memory usage reduction.
>
> > I understand the memory optimization value they have, but that's not something PHP developers are at all used to dealing with.
> > That makes it less of a convenient drop-in replacement for array and more just another user-space collection object, but in C with internals endorsement.
> > If such logic needs to be included, it should be kept as minimalist as possible for usability,
> > even at the cost of a little memory usage in some cases.
>
> If the functionality was just a drop-in replacement for array, others
> may say "why not just use array and the array libraries?" (or Vector).
> With the strategy of doubling capacity, it can be up to 99% more memory
> than needed in some cases (Even more wastage after shrinking from the
> maximum size).
>
> > There is no reason to preserve keys.
> > A Vector or list type should not have user-defined keys.
> > It should just be a linear list. If you populate it from an existing array/iterable, the keys should be entirely ignored.
> > If you care about keys you want a HashMap or Dictionary or similar (which we also desperately need in the stdlib, but that's a separate thing).
>
> The behavior is similar to
> https://www.php.net/manual/en/splfixedarray.fromarray.php
> It tries to preserve the keys, and fills in gaps with null.
>
> 1. There's the consistency with existing functionality such as
> SplFixedArray::fromArray, or existing constructors preserving keys.
> 2. And I'd imagined that a last minute objection of "Wait, `new
> SplFixedArray([1 => 'second', 0 => 'first'])` does what by default?
> Isn't this using the keys 0 and 1?", and the same for gaps
>
>    I was considering only having the no-param constructor, and adding
> the static method fromValues(iterable $it) to make it clearer keys are
> ignored.
>
> > Whether or not contains() needs a comparison callback in my mind depends mainly on whether or not the operator overloading RFC passes.
> > If it does, then contains() can/should use the __compareTo() method on objects.
> > If it doesn't, then there needs to be some other way to compare non-identical objects or else that method becomes mostly useless.
>
> There's a distinction between needs and very nice to have - a contains
> check for some predicate on a Vector can be done with a userland helper
> method and a foreach.
>
> Also, you're requesting functionality that I don't believe is currently
> available for arrays, either.
>
> > To echo Pierre, a Vector needs to be of a single guaranteed type.
> > Yes, this gets us back to the generics conversation again, but I presume (perhaps naively?) there are ways to address this question without getting into full-blown generics.
>
> Yep, as you said, this type is mixed, just like the SplFixedArray,
> ArrayObject, values of SplObjectStorage/WeakMap, etc.
> Generic support is something that's been brought up before,
> investigated, then abandoned.
>
> My concerns with adding StringVector, MixedVector, IntVector,
> FloatVector, BoolVector, ArrayVector (confusing), ObjectVector, etc is
> that
>
> - I doubt many people would agree that there's a wide use case for any
>   specific one of them compared to a vector of any type.
>
>   This would be even harder to argue for than just a single Vector type.
> - Mixes of null and type `T` might make sense in many cases (e.g.
> optional objects, statistics that failed to get computed, etc) but
> would be forbidden by that
> - It would be a bad choice if generic support did get added in the
> future.
>
> I'm not sure if we're thinking of the same thing.
> Could you provide more details on how that would be implemented? Have
> other PECLs done something similar?
>
> > But really, a non-type-guaranteed Vector/List construct is of fairly little use to me in practice, and that's before we even get into the potential performance optimizations for map() and filter() from type guarantees.
>
> See earlier comments on `vec`/Generics not being actively worked on
> right now and probably being a far way away from an implementation that
> would pass a vote.
>
> As for optimizations, opcache currently doesn't optimize individual
> global functions (let alone methods), it optimizes opcodes.
> Even array_map()/array_filter() aren't optimized, they call callbacks
> in an ordinary way.
> E.g. https://github.com/php/php-src/pull/5588 or
> https://externals.io/message/109847 regarding ordinary methods.
>
> Aside: In the long term, I think the opcache core team had a long-term
> plan of changing the intermediate representation to make these types of
> optimizations feasible without workarounds like the one I proposed in
> 5588
>
> > I can write a type-guaranteed user-space class that does what I need in under 10 minutes, and for most low cardinality sets that's adequately performant. A built-in needs to be better than that.
> >
> > I very much appreciate the chicken-and-egg challenge of wanting to get something useful in despite the absence of a larger plan, and also the challenge of getting buy-in on a larger plan.
> > Really.
> > :-) This is an area where PHP's current dev process is very lacking.
> > Still, I also agree with others that we need to be thinking holistically about this problem space, which will inform what the steps are.
> > The approach we took for enums could be a model to consider (multiple RFCs clustered together under an RFC "epic".)
> > That would allow for a long-term design, and the influence that offers, while still having milestones along the way that offer value unto themselves. (I'm happy to help with that, since that's about all I'm good for around here. :-) )
>
> Enums were extensions of existing class types (is_object(Suit::Hearts)
> is true) rather than adding a whole separate type to the type system
> and don't need to support generics or contain anything other than an
> int/string.
> I don't think the choice of "epic" widely influenced the vote.

Rather than go point by point, I'm going to respond globally here.

I am frequently on record hating on PHP arrays, and stating that I want something better. The problems with PHP arrays include:

1. They're badly performing (because they cannot be optimized)
2. They're not type safe
3. They're mutable
4. They mix sequences (true arrays) with dictionaries/hashmaps, making everything uglier
5. People keep using them as structs, when they're not
6. The API around them is procedural, inconsistent, and overall gross
7. They lack a lot of native shorthand operations found in other languages (e.g., slicing)
8. Their error handling is crap

Any new native/stdlib alternative to arrays needs to address at least half of those issues, preferably most/all.

This proposal addresses the first point and... that's it. Point 5 is sort of covered by virtue of being out of scope, so maybe this covers 1.5 out of 8. That's insufficient to be worth the effort to support and deal with in code. That makes this approach a strong -1 for me.

"Fancy algorithms are slow when n is small, and n is usually small."
-- Rob Pike

That some of the design choices here mirror existing poor implementations is not an endorsement of them. I don't think I've seen anyone on this list say anything good about SPL beyond iterators and autoloading, so it's not really a good model to emulate.

Additionally, please don't play into the trope about procedural/mutable code being more beginner-friendly. That's not the case, beyond being a self-fulfilling prophecy. (If we teach procedural/mutable code first, then most beginners will be most proficient in procedural/mutable code.) I would argue that, on the whole, immutable values make code easier to reason about and write once you get past trivially small sizes. We do new developers a gross disservice by treating immutability as an "advanced" technique, when it should really be the default, beginner technique taught from day one.

I am not aware of any PECL implementations of lists that have type safety, because I don't use many PECL packages. However, in user space it's quite simple to do:

https://presentations.garfieldtech.com/slides-never-use-arrays/phpkonf2021/#/5/2

See that slide and scroll down for additional examples. Every one of those examples took me less than 5 minutes to write. If we want to have a better alternative in core, it needs to be *at least* as capable as what I can throw together in 5 minutes. The proposal as-is is not even as capable as those examples.

--Larry Garfield
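For illustration only, here is a minimal PHP 8.0+ sketch of the kind of type-guaranteed, immutable user-space list discussed in this thread. It is not the code from the linked slides or from the Vector RFC. The class name TypedList and its methods of(), add(), get(), contains() and map() are hypothetical.

The sketch does three things:

- It ignores incoming keys.
- It returns a new instance from add() instead of mutating (an "evolver" method).
- It checks every value against a single declared type.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical type-guaranteed, immutable list (illustrative sketch only).
 * $type is either an exact get_debug_type() name such as "int" or "string",
 * or a class/interface name (subtypes are accepted via instanceof).
 */
final class TypedList implements IteratorAggregate, Countable
{
    /** @var list<mixed> values stored as a sequential list */
    private array $values;

    private function __construct(private string $type, array $values)
    {
        foreach ($values as $value) {
            $this->assertType($value);
        }
        $this->values = array_values($values);
    }

    /** Builds a list from any iterable; source keys are discarded. */
    public static function of(string $type, iterable $items = []): self
    {
        $values = [];
        foreach ($items as $item) {
            $values[] = $item;
        }
        return new self($type, $values);
    }

    /** Evolver method: returns a new list and leaves $this unchanged. */
    public function add(mixed $value): self
    {
        $values = $this->values;
        $values[] = $value; // this append separates (copies) the array: O(n)
        return new self($this->type, $values);
    }

    public function get(int $index): mixed
    {
        if (!array_key_exists($index, $this->values)) {
            throw new OutOfRangeException("Index $index is out of range");
        }
        return $this->values[$index];
    }

    /** Predicate-based membership test: a plain foreach, no comparison operator needed. */
    public function contains(callable $predicate): bool
    {
        foreach ($this->values as $value) {
            if ($predicate($value)) {
                return true;
            }
        }
        return false;
    }

    /** Maps into a new list whose element type is declared by the caller. */
    public function map(callable $fn, string $resultType): self
    {
        return new self($resultType, array_map($fn, $this->values));
    }

    public function count(): int
    {
        return count($this->values);
    }

    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->values);
    }

    private function assertType(mixed $value): void
    {
        $type = $this->type;
        if (get_debug_type($value) !== $type && !($value instanceof $type)) {
            throw new TypeError(sprintf('Expected %s, got %s', $type, get_debug_type($value)));
        }
    }
}
```

Because add() copies the backing array and re-checks every element, each call costs O(n). That is the linear-copy trade-off raised earlier in the thread for an immutable add(). The sketch accepts that cost in exchange for having no shared mutable state.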