Newsgroups: php.internals
Xref: news.php.net php.internals:114674
Date: Mon, 31 May 2021 11:52:03 +0200
To: Mike Schinkel
Cc: "internals@lists.php.net"
Subject: Re: [PHP-DEV] A little syntactic sugar on array_* function calls?
From: nikita.ppv@gmail.com (Nikita Popov)

On Fri, May 28, 2021 at 11:01 PM Mike Schinkel wrote:

> Hi Nikita,
>
> Thank you for taking the time to explain in detail.
>
> One more question below.
>
> -Mike
>
> On May 28, 2021, at 10:31 AM, Nikita Popov wrote:
>
> On Fri, May 28, 2021 at 3:11 AM Mike Schinkel wrote:
>
>> > On May 26, 2021, at 7:44 PM, Hendra Gunawan wrote:
>> >
>> > Hello.
>> >
>> >>
>> >> Yes, but Nikita wrote this note about technical limitations at the
>> >> bottom of the repo README:
>> >>
>> >> Due to technical limitations, it is not possible to create mutable
>> >> APIs for primitive types.
>> >> Modifying $self within the methods is not possible (or
>> >> rather, will have no effect, as you'd just be changing a copy).
>> >>
>> >
>> > If it is solved, this is a great accomplishment for PHP. But I think
>> > scalar object is not going anywhere in the near future. If you are not
>> > convinced, please take a look
>> >
>> https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181.
>>
>> Nikita's comment actually causes me more questions, not fewer.
>>
>> Nikita says "We need to know that $a[$b][$c] is an array in order to
>> determine that the call should be performed by-reference. However, we
>> already need to convert $a, $a[$b] and $a[$b][$c] into references before we
>> know about that."
>>
>> How then are we able to do the following?:
>>
>> $a[$b][$c][] = 1;
>>
>
> In this case, we're clearly performing a write operation on the array. If
> you want to know the technical details, the compiler will convert this into
> a sequence of FETCH_DIM_W ops followed by ASSIGN_DIM. The "W" bit here is
> for "write", which will perform all the necessary special handling, such as
> copy-on-write separation and auto-vivification.
>
>> How also can we do this:
>>
>> byref($a[$b][$c]);
>> function byref(&$x) {
>>     $x[] = 2;
>> }
>>
>> See https://3v4l.org/aPvTD
>>
>
> This is a more complex case. In this case the compiler doesn't know in
> advance whether the argument is passed by value or by reference. What
> happens here is:
>
> 1. INIT_FCALL determines that we're calling byref().
> 2. CHECK_FUNC_ARG for the first arg determines that this argument is
> passed by-reference for this function.
> 3. FETCH_DIM_FUNC_ARG on the array will perform either a FETCH_DIM_R
> or a FETCH_DIM_W operation, depending on what CHECK_FUNC_ARG determined.
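A minimal PHP sketch of the behaviour discussed above, assuming PHP 8.x. It shows three things. A nested write auto-vivifies the missing levels. A by-value parameter only changes a copy, which is also why mutating $self in a scalar_objects method has no effect. A by-reference parameter such as byref() writes through to $a[$b][$c]. The helper appendByValue() and the 'x'/'y' keys are hypothetical, and the script does not inspect opcodes. Because the functions are declared before the calls here, the compiler may emit a direct write fetch instead of the FETCH_DIM_FUNC_ARG path described in step 3.

```php
<?php
// Sketch assuming PHP 8.x. appendByValue() and the 'x'/'y' keys are hypothetical.

// By-value parameter: the function receives a copy, so the caller's array is
// left unchanged (the same reason mutating $self in a scalar_objects method
// has no effect).
function appendByValue(array $x): array
{
    $x[] = 2;
    return $x;
}

// By-reference parameter, as in the byref() example quoted above.
function byref(&$x): void
{
    $x[] = 2;
}

$a = [];
$b = 'b';
$c = 'c';

// Write context: missing levels are created on the fly (auto-vivification).
$a[$b][$c][] = 1;

// By value: only the returned copy gets the new element.
$copy = appendByValue($a[$b][$c]);

// By reference: $a[$b][$c] is fetched for write and modified in place.
byref($a[$b][$c]);

// A by-reference argument may name elements that do not exist yet: they are
// created as null, and byref() then turns that null into an array.
byref($a['x']['y']);

echo json_encode($copy), PHP_EOL; // [1,2]
echo json_encode($a), PHP_EOL;    // {"b":{"c":[1,2]},"x":{"y":[2]}}
```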
>
>> I assume that in both my examples $a[$b][$c] would be considered an
>> "lvalue"[1] and can be a target of assignment triggered by either the
>> assignment operator or calling the function and passing to a by-ref
>> parameter.
>>
>> [1]
>> https://en.wikipedia.org/wiki/Value_(computer_science)#Assignment:_l-values_and_r-values
>>
>> So is there a reason that -> on an array could not trigger the same? Is
>> Nikita saying that the performance of those calls performed by-reference
>> would not matter because they are always being assigned, at least in the
>> former case, but to do so with array expressions would be problematic?
>> (Ignoring that there is no code in the wild that currently uses the -> operator,
>> or does that matter?)
>>
>
> Note that the byref($a[$b][$c]) case only works because we know which
> function is being called at the time the argument is passed. If you have
> $a[$b][$c]->test() we need to pass $a[$b][$c] by reference (FETCH_DIM_W) or
> by value (FETCH_DIM_R) depending on whether $a[$b][$c]->test() accepts the
> argument by-value or by-reference. But we can only know that once we have
> already evaluated $a[$b][$c] and found out that it is indeed an array.
>
> The only way around this is to *always* perform a for-write fetch of
> $a[$b][$c], even though we don't know that the end result is going to be an
> array. However, doing so would pessimize the performance of code operating
> on objects. Consider $some_huge_shared_array[0]->foo(). If we fetch
> $some_huge_shared_array for write, we'll be required to perform a full
> duplication of the array in preparation for a possible future write. If it
> turns out that $some_huge_shared_array[0] is actually an object, or that
> $some_huge_shared_array[0] is an array and the performed operation is
> by-value, then we have performed this copy unnecessarily.
>
> I don't believe this is acceptable.
>
> I ask honestly to understand, and not as a rhetorical question.
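The duplication described above is copy-on-write separation. The rough PHP sketch below assumes PHP 8.x with the default Zend memory manager. Assigning an array to a second variable shares one copy, and reading an element leaves it shared. The first write through either variable duplicates all 100,000 elements. The variable names are illustrative. The exact byte counts reported by memory_get_usage() vary by PHP version and build, so only their relative size is meaningful. The sketch shows the cost an unconditional for-write fetch would incur. It does not model the proposed -> call on arrays.

```php
<?php
// Sketch assuming PHP 8.x with the default Zend memory manager;
// byte counts differ between versions and builds.

$shared = range(1, 100000);
$alias = $shared;              // no copy yet: both variables share one array

$before = memory_get_usage();
$first = $alias[0];            // read access: the array stays shared
$readCost = memory_get_usage() - $before;

$before = memory_get_usage();
$alias[0] = 0;                 // write access: $alias is separated (full copy)
$writeCost = memory_get_usage() - $before;

printf("read: %d bytes, write: %d bytes\n", $readCost, $writeCost);
```

In this sketch, the read barely moves the counter. The single write allocates memory proportional to the whole array, even though only one element changed.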
>>
>> Additionally, if the case of updating an array variable is not a problem
>> but updating an array expression is a problem then why not just limit the
>> -> operator to only work on expressions for immutable methods and require
>> variables for mutable methods? I would think it should be easy enough to
>> throw an error for those specific "methods" that would be mutable, such as
>> shift() and unshift() if $a[$b][$c]->shift('foo') were called?
>>
>
> There are externalities associated even with the simple $x->foo() case,
> though they are less severe. They primarily involve reduced ability to
> analyze code in opcache.
>
> In either case, this limitation does not seem reasonable to me from a
> language design perspective. If $a->push($b) works, then $a[$k]->push($b)
> can reasonably be expected to work as well.
>
>> Or maybe just completely limit using the -> operator on array variables.
>> Don't work on any array expressions for consistency. There is already
>> precedent in PHP for operators that work on variables and not on
>> expressions: ++, --, and &.
>>
>> IF we can get a thumbs up from Nikita that one of these would actually be
>> possible then I think the next step should be to write up a list of
>> proposed array methods that would be implemented to support the -> operator
>> with arrays and put them in an RFC, and to flesh out any edge cases.
>>
>
> The only correct way to resolve this issue is to not support mutable
> operations.
>
> I don't think I agree that this is the only correct way, but I respect
> your position of authority on the matter.
>
> I don't think there's much need for mutable operations. sort() and
> shuffle() would be best implemented by returning a new array instead.
> array_push() is redundant with $array[]. array_shift() and array_unshift()
> should never be used.
>
> Why do you say array_shift() and array_unshift() should never be used?
>
> When I wrote the above questions the use-case I was thinking about most was
> $a->unshift($value) as I use array_unshift() more than most of the other
> array functions.
>
> Do you mean that these, if applied as "methods" to an array, should not be
> used mutably (meaning in-place is bad but returning an array value that
> has been shifted would be okay), or do you have some other reason you
> believe that shifting an array is bad? Note the reason I have used them in
> the past is when I need to pass an array to a function written by someone
> else that expects the array to be ordered.
>
> Also, what about very large arrays? I assume (which could be a bad
> assumption) that PHP internally can be more efficient about how it handles
> array_unshift() instead of just duplicating the large array so as to add an
> element at the beginning?
>

Arrays only support efficient push/pop operations. Performing an
array_shift() or array_unshift() requires going through the whole array to
reindex all the keys, even though you're only adding/removing one element.
In other words, array_shift() and array_unshift() are O(n) operations, not
O(1) as one would intuitively expect. If you use shift/unshift as common
operations, you're better off using a different data structure or
construction approach.

Regards,
Nikita
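A minimal PHP sketch of the point above, assuming PHP 8.x. It first shows array_shift() renumbering the integer keys of everything that remains. It then shows two ways to avoid a per-call O(n) operation: building with appends and reversing once, and using a queue structure. The variable names are illustrative. SplQueue, SPL's doubly-linked-list queue, is only one example of "a different data structure", since the reply does not name a specific one.

```php
<?php
// Sketch assuming PHP 8.x; variable names are illustrative.

// array_shift() renumbers every remaining integer key (string keys are kept).
$list = [5 => 'a', 9 => 'b', 12 => 'c'];
$head = array_shift($list);     // 'a'; $list is now [0 => 'b', 1 => 'c']

// Prepending with array_unshift() in a loop: each call reindexes the whole
// array, so building n items this way costs O(n^2) overall.
$prepended = [];
foreach ([1, 2, 3, 4] as $item) {
    array_unshift($prepended, $item);
}

// Same result by appending (amortized O(1) each) and reversing once (O(n)).
$appended = [];
foreach ([1, 2, 3, 4] as $item) {
    $appended[] = $item;
}
$appended = array_reverse($appended);

// FIFO processing with SplQueue (a doubly linked list) instead of
// array_push()/array_shift(): enqueue() and dequeue() do not reindex anything.
$queue = new SplQueue();
foreach (['x', 'y', 'z'] as $job) {
    $queue->enqueue($job);
}
$processed = [];
while (!$queue->isEmpty()) {
    $processed[] = $queue->dequeue();
}
```

The append-then-reverse variant keeps plain arrays, which suits code that must hand an ordered array to another function. SplQueue fits when items are consumed from the front repeatedly.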