Newsgroups: php.internals
Xref: news.php.net php.internals:120743
Date: Wed, 5 Jul 2023 00:15:25 +0200
To: PHP internals
Subject: ??= and function calls
From: tovilo.ilija@gmail.com (Ilija Tovilo)

Hi everyone

I recently discovered some unfortunate behavior of the coalesce assignment operator (??=) in combination with function calls. Here's the TL;DR:

    foo()['bar'] ??= 42;

Currently, this code calls foo() twice, which is rather unexpected. The technical reason why this happens is not straightforward, but I will attempt to explain it below. The behavior was not specified in the RFC (https://wiki.php.net/rfc/null_coalesce_equal_operator) and is completely untested, so I don't believe it is by design. My proposal is to change it so that foo() is only called once.

This is what is happening in detail. ??= is special in that it needs to evaluate the lhs (left-hand side) twice.
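One way to observe the double evaluation is to count calls. The following is a minimal sketch and not part of the original report: the CallLog class and the ArrayObject-returning foo() are hypothetical stand-ins chosen so that the write stays observable, and the reported count depends on whether the running PHP version includes the change proposed in this message.

```php
<?php
// Hypothetical helper for illustration: records how often foo() runs
// and holds the container that foo() hands out.
final class CallLog
{
    public static int $fooCalls = 0;
    public static ?ArrayObject $store = null;
}

function foo(): ArrayObject
{
    CallLog::$fooCalls++;
    if (CallLog::$store === null) {
        CallLog::$store = new ArrayObject();
    }
    return CallLog::$store;
}

// The statement from the TL;DR: 'bar' is not set yet, so 42 is assigned.
foo()['bar'] ??= 42;

// The call count is version-dependent, so no expected value is given here.
echo 'foo() was called ', CallLog::$fooCalls, " time(s)\n";
echo 'stored value: ', CallLog::$store['bar'], "\n";
```

An object is returned here because it is a handle, so the assignment is visible afterwards without foo() having to return by reference.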
First, we need to check whether the offset exists, then conditionally execute the rhs (right-hand side), re-fetch the offset, and assign the rhs value to it. The offset has to be re-fetched because evaluating the rhs may invalidate it. This is explained in the following blog post:

https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety

Essentially, the offset may be a pointer to an array element or object property. If the rhs frees the array or object, or grows the array and causes a reallocation (meaning it is moved to some other place in memory), the pointer is no longer valid. For this reason, PHP makes sure no user code can execute between the fetching of an offset and the assignment to it. Normally, that just means evaluating the rhs before fetching the offset. In this case, we need to evaluate the lhs first to know whether we should evaluate the rhs at all.

Naively evaluating the lhs again poses a problem for expressions with side effects. For example:

    $array[$x++] ??= 42;

We do not want to re-evaluate the entire expression because $x++ would lead to a different array offset the second time around. This is solved by "memoizing" any compiled expression in the lhs that is *not* a variable, meaning not part of the offset that may be invalidated. Internally, anything that may be written to is considered a variable, i.e. local variables ($foo), properties ($foo->bar, Foo::$bar), array offsets ($foo['bar']), and function calls (foo(), $foo->bar(), Foo::bar(), $foo(), as they may return a modifiable reference). The fact that function calls are included in that list leads to the problem presented above. It is not actually necessary to exclude them from memoization because their result cannot be invalidated.

Another inconsistency is that function call arguments are re-evaluated, but only if they are not part of some other expression.

    a. foo(bar())['baz'] ??= 42;
    b. foo(bar() + 0)['baz'] ??= 42;

a calls both foo() and bar() twice. b, however, calls foo() twice but bar() only once. That is because the expression bar() + 0 is *not* considered a variable and as such gets memoized.

I propose to unconditionally memoize calls (in all forms) when they appear in the lhs of a coalesce assignment. This will ensure that calls are only executed once, including function arguments and the lhs of method calls. Consequently, the assignment will be performed on the same offset that was previously tested, even if the expression contains a function call with side effects.

The implementation for this change is simple:

https://github.com/php/php-src/pull/11592

Let me know if you have any concerns. I'm planning on merging this into master if there is consensus on the semantics.

Ilija
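The cases discussed above can be compared side by side with a small script. This is only a sketch under stated assumptions: the CallCounter class and the bodies of foo() and bar() are hypothetical and exist purely to count calls. According to the description above, a PHP version without the proposed change should report two calls each for case a, and two foo() calls but one bar() call for case b; with the change, each function should run once. The $x++ case behaves the same either way.

```php
<?php
// Hypothetical call counter for illustration only.
final class CallCounter
{
    public static array $count = ['foo' => 0, 'bar' => 0];

    // Returns the counts seen so far and starts over at zero.
    public static function reset(): array
    {
        $seen = self::$count;
        self::$count = ['foo' => 0, 'bar' => 0];
        return $seen;
    }
}

function bar(): int
{
    CallCounter::$count['bar']++;
    return 1;
}

function foo(int $n): ArrayObject
{
    CallCounter::$count['foo']++;
    return new ArrayObject();
}

// Side effect inside the offset: $x++ is memoized, so it runs only once.
$array = [];
$x = 0;
$array[$x++] ??= 42;
var_dump($x, $array); // $x is 1, $array is [0 => 42]

// Case a: the argument is itself a call.
foo(bar())['baz'] ??= 42;
$a = CallCounter::reset();

// Case b: the argument is a compound expression.
foo(bar() + 0)['baz'] ??= 42;
$b = CallCounter::reset();

printf("a: foo=%d bar=%d\n", $a['foo'], $a['bar']);
printf("b: foo=%d bar=%d\n", $b['foo'], $b['bar']);
```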