Newsgroups: php.internals
Xref: news.php.net php.internals:120747
Date: Wed, 5 Jul 2023 12:31:20 +0300
From: dmitrystogov@gmail.com (Dmitry Stogov)
To: Ilija Tovilo
Cc: PHP internals
Subject: Re: [PHP-DEV] ??= and function calls

On Wed, Jul 5, 2023 at 1:15 AM Ilija Tovilo wrote:

> Hi everyone
>
> I recently discovered some unfortunate behavior of the coalesce
> assignment operator (??=) in combination with function calls. Here's
> the TL;DR:
>
> foo()['bar'] ??= 42;
>
> Currently, this code calls foo() twice. This seems rather unexpected.
> The technical reason as to why this happens is not straight-forward,
> but I will attempt to explain below. The behavior was not specified in
> the RFC (https://wiki.php.net/rfc/null_coalesce_equal_operator) and is
> completely untested, and as such I don't believe it is by design. My
> proposal is to change it so that foo() is only called once.
>
> This is what is happening in detail.
>
> ??= is special in that it needs to evaluate the lhs (left hand side)
> twice. At first, we need to check if the offset exists, then
> conditionally execute the rhs (right hand side), re-fetch the offset
> and assign the rhs value to it. The reason for the re-fetching of the
> offset is that the evaluation of the rhs may invalidate the offset.
> This is explained in the following blog post:
>
> https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety
> Essentially, the offset may be a pointer into an array element or
> object property. If the rhs frees the array or object, or grows the
> array causing a reallocation (meaning it is moved to some other place
> in memory), the pointer is no longer valid. For this reason, PHP makes
> sure no user code may execute between the fetching of an offset and
> the assignment to it. Normally, that just means evaluating the rhs
> before fetching the offset. In this case, we need to evaluate the lhs
> first to know if we even should evaluate the rhs.
>
> Naively evaluating the lhs again poses a problem for expressions with
> side-effects. For example:
>
> $array[$x++] ??= 42;
>
> We do not want to re-evaluate the entire expression because $x++ will
> lead to a different array offset the second time around. The way this
> is solved is by "memoizing" any compiled expression in the lhs that is
> *not* a variable, meaning not part of the offset that may be
> invalidated. Internally, a variable is considered anything that may be
> written to, i.e. local variables ($foo), properties ($foo->bar,
> Foo::$bar), array offsets ($foo['bar']), and function calls (foo(),
> $foo->bar(), Foo::bar(), $foo(), as they may return a modifiable
> reference). The fact that function calls are included in that list
> leads to the problem presented above. It is not actually necessary to
> exclude them from memoization because their result may not be
> invalidated.
>
> Another inconsistency is that function call arguments will be
> re-evaluated, but only if they are not part of some other expression.
>
> a. foo(bar())['baz'] ??= 42;
> b. foo(bar() + 0)['baz'] ??= 42;
>
> a calls both foo() and bar() twice. b however calls foo() twice but
> bar() only once. That is because the expression bar() + 0 is *not*
> considered a variable and as such gets memoized.
>

This is definitely a bug in the original implementation.
In case a function is evaluated twice and returns different values, we
check one value, but assign to another.

> I propose to unconditionally memoize calls (in all forms) when they
> appear in the lhs of a coalesce expression. This will ensure that
> calls are only executed once, including function arguments and the lhs
> of method calls. Consequently, the assignment will be performed on the
> same offset that was previously tested, even if the expression
> contains a function call with side-effects.
>
> The implementation for this change is simple:
> https://github.com/php/php-src/pull/11592
>
> Let me know if you have any concerns. I'm planning on merging this for
> master if there is consensus on the semantics.
>

+1

Thanks. Dmitry.

> Ilija
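
For anyone who wants to observe the behavior discussed in this thread, the following is a minimal sketch and not code from the pull request. The names CallLog and storage() are hypothetical and exist only for this illustration. The sketch assumes a function that returns an array by reference, so the write through ??= lands in the same array on every call. How many times storage() runs depends on whether the PHP build includes the proposed change, so the script prints the count rather than assuming it.

```php
<?php
// Hypothetical helper: counts how often storage() is invoked.
final class CallLog
{
    public static int $storageCalls = 0;
}

// Hypothetical function returning a modifiable reference to a static array,
// the kind of call the mail says is treated as a "variable".
function &storage(): array
{
    static $data = [];
    CallLog::$storageCalls++;
    return $data;
}

// Case from the mail: a function call on the lhs of ??=.
storage()['bar'] ??= 42;
$callsDuringAssignment = CallLog::$storageCalls;
echo "storage() calls during ??=: {$callsDuringAssignment}\n";

// Case from the mail: a side-effecting offset expression.
// $x++ is memoized, so it runs once: $x ends up as 1 and
// the value is stored at offset 0.
$array = [];
$x = 0;
$array[$x++] ??= 42;
echo "x = {$x}, array = ", json_encode($array), "\n";
```

If the first line of output reports two calls, the build still re-evaluates the call when it re-fetches the offset. If it reports one, the call is memoized as proposed. The second line should be the same either way, since non-variable expressions such as $x++ are memoized already.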