Newsgroups: php.internals
Xref: news.php.net php.internals:120748
Date: Wed, 5 Jul 2023 07:16:49 -0300
From: flaviohbatista@gmail.com (Flávio Heleno)
To: Dmitry Stogov
Cc: Ilija Tovilo, PHP internals
Subject: Re: [PHP-DEV] ??= and function calls

Great catch, Ilija!

Do you mind sharing how you stumbled upon it?

Thank you!

On Wed, Jul 5, 2023, 06:31 Dmitry Stogov wrote:

> On Wed, Jul 5, 2023 at 1:15 AM Ilija Tovilo wrote:
>
> > Hi everyone
> >
> > I recently discovered some unfortunate behavior of the coalesce
> > assignment operator (??=) in combination with function calls.
> > Here's the TL;DR:
> >
> > foo()['bar'] ??= 42;
> >
> > Currently, this code calls foo() twice. This seems rather unexpected.
> > The technical reason as to why this happens is not straight-forward,
> > but I will attempt to explain below. The behavior was not specified in
> > the RFC (https://wiki.php.net/rfc/null_coalesce_equal_operator) and is
> > completely untested, and as such I don't believe it is by design. My
> > proposal is to change it so that foo() is only called once.
> >
> > This is what is happening in detail.
> >
> > ??= is special in that it needs to evaluate the lhs (left hand side)
> > twice. At first, we need to check if the offset exists, then
> > conditionally execute the rhs (right hand side), re-fetch the offset
> > and assign the rhs value to it. The reason for the re-fetching of the
> > offset is that the evaluation of the rhs may invalidate the offset.
> > This is explained in the following blog post:
> >
> > https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety
> >
> > Essentially, the offset may be a pointer into an array element or
> > object property. If the rhs frees the array or object, or grows the
> > array causing a reallocation (meaning it is moved to some other place
> > in memory), the pointer is no longer valid. For this reason, PHP makes
> > sure no user code may execute between the fetching of an offset and
> > the assignment to it. Normally, that just means evaluating the rhs
> > before fetching the offset. In this case, we need to evaluate the lhs
> > first to know if we even should evaluate the rhs.
> >
> > Naively evaluating the lhs again poses a problem for expressions with
> > side-effects. For example:
> >
> > $array[$x++] ??= 42;
> >
> > We do not want to re-evaluate the entire expression because $x++ will
> > lead to a different array offset the second time around.
> > The way this is solved is by "memoizing" any compiled expression in
> > the lhs that is *not* a variable, meaning not part of the offset that
> > may be invalidated. Internally, a variable is considered anything that
> > may be written to, i.e. local variables ($foo), properties ($foo->bar,
> > Foo::$bar), array offsets ($foo['bar']), and function calls (foo(),
> > $foo->bar(), Foo::bar(), $foo(), as they may return a modifiable
> > reference). The fact that function calls are included in that list
> > leads to the problem presented above. It is not actually necessary to
> > exclude them from memoization because their result may not be
> > invalidated.
> >
> > Another inconsistency is that function call arguments will be
> > re-evaluated, but only if they are not part of some other expression.
> >
> > a. foo(bar())['baz'] ??= 42;
> > b. foo(bar() + 0)['baz'] ??= 42;
> >
> > a calls both foo() and bar() twice. b however calls foo() twice but
> > bar() only once. That is because the expression bar() + 0 is *not*
> > considered a variable and as such gets memoized.
> >
>
> This is definitely a bug in the original implementation.
> In case a function is evaluated twice and returns different values, we
> check one value, but assign to another.
>
> > I propose to unconditionally memoize calls (in all forms) when they
> > appear in the lhs of a coalesce expression. This will ensure that
> > calls are only executed once, including function arguments and the lhs
> > of method calls. Consequently, the assignment will be performed on the
> > same offset that was previously tested, even if the expression
> > contains a function call with side-effects.
> >
> > The implementation for this change is simple:
> > https://github.com/php/php-src/pull/11592
> >
> > Let me know if you have any concerns. I'm planning on merging this for
> > master if there is consensus on the semantics.
> >
>
> +1
>
> Thanks. Dmitry.
>
> > Ilija
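
For anyone who wants to observe the behavior described in the thread, here is a minimal sketch. The `Demo` class and this counting version of `foo()` are hypothetical helpers made up for illustration, not code from the RFC or the pull request. The count reported for the direct form depends on whether the PHP build you run includes the proposed change, so no expected output is given for it.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: holds a shared container and a call counter.
final class Demo
{
    public static int $calls = 0;
    public static ?ArrayObject $store = null;
}

Demo::$store = new ArrayObject();

// Hypothetical stand-in for foo() from the thread. It counts its own
// invocations and always returns a handle to the same container.
function foo(): ArrayObject
{
    Demo::$calls++;
    return Demo::$store;
}

// The form discussed in the thread: the call sits inside the lhs of ??=.
foo()['bar'] ??= 42;
$directCalls = Demo::$calls;

// Evaluating the call by hand first, then applying ??= to the local
// variable, so foo() runs exactly once regardless of how ??= is compiled.
Demo::$calls = 0;
$target = foo();
$target['baz'] ??= 42;
$manualCalls = Demo::$calls;

printf(
    "PHP %s: direct form called foo() %d time(s), manual form %d time(s)\n",
    PHP_VERSION,
    $directCalls,
    $manualCalls
);
```

Because this `foo()` always returns the same object, both forms end up storing 42 and only the call count differs. The case Dmitry points out, where the two calls return different values, would additionally test one container and write to another.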