Newsgroups: php.internals
Xref: news.php.net php.internals:115178
From: craig@craigfrancis.co.uk (Craig Francis)
To: PHP internals
Date: Mon, 28 Jun 2021 17:21:09 +0100
Subject: is_literal() compile-time awkwardness

Hi Internals,

There's an awkward hitch with removing integer support. Fair warning, we're about to get into some under-the-hood stuff:

During compilation (before any user data is involved), PHP makes tiny optimisations to code, e.g. `"a" . 1` becomes `"a1"`. Where this happens, the developer-defined values (strings, integers, floats and booleans) are all treated as literal values, so the concatenated string gets flagged as a literal. That is technically accurate, but it is the wrong result now that we aren't doing full 'developer-defined' support.

So, for example, a developer may wonder why $b is seen as a non-literal, whereas $c is flagged as a literal:

    $a = 1;
    $b = "Hello " . $a; // Non-Literal
    $c = "Hello " . 1; // Flagged Literal

This is because $b involves concatenation at runtime, and because $a is an integer, the result is seen as a non-literal. Whereas $c has its value optimised by the compiler into a single literal string, so it's marked as a literal.

Or, for a second example, where the compiler can "coerce" an integer into a string:

    $a = "Hello ";
    $b = $a . 2;

The compiler cannot do an optimisation based on the contents of $a, but it can see that $a will be concatenated with the integer 2. To optimise this, the compiler will coerce that integer into the string "2", to make the concatenation faster at runtime, so $b will be seen as a literal, which the developer may find odd, due to the presence of the developer-defined integer.

Now these aren't security issues, and it doesn't work the other way round: `is_literal()` doesn't incorrectly report any user (non-literal) data as a literal. But nonetheless it is still technically inconsistent, as it looks like it accepts 'literal integers' at some points and not others.

(This is why it was fine when we were including integer support: it simply accepted integers too, so it was all consistent. As it is, the list feedback was to not include them, as it's not possible to put a flag on integers to say whether they are developer-defined, in the same way we can with strings.)

OPcache adds its own similar twist if it's enabled, but with the added fun that, unlike PHP's own optimisation processes, OPcache is by its nature inconsistent when it runs, changing what it optimises and when based on a number of factors (e.g. available memory), and so isn't guaranteed to optimise the code every time.

As to the specific issue, if you have:

    $a = implode("-", [1, 2, 3]);

The variable $a will be set to "1-2-3", as a non-literal (because of the use of integers).
However, if OPcache is enabled, it can make its own optimisation, performing the implode() call early and storing the literal string "1-2-3" in the OPcache, so the implode() function does not get used at runtime. In effect the compiled script becomes:

    $a = "1-2-3"; // Flagged as literal

(Which, as before, may be 'literally' correct, but since it's not supposed to support integers…)

Or OPcache, while enabled, doesn't optimise it this time (because it's not guaranteed to) and leaves it as the previous example, and *ding*, non-literal sighted.

Now, I imagine most people checking `is_literal()` would, on seeing an 'error' like that, simply go to their code, realise there is an integer in it, and change it without any further interest. But it might still cause them confusion if they wondered why they hadn't seen the error every time before.

Any thoughts?

Craig
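
For anyone who wants to poke at the inconsistency without a patched build, here is a rough userland model of the first example. It is only a sketch and not how the proposal is implemented: `TrackedString`, `concat()` and `foldedByCompiler()` are hypothetical names invented for illustration, the flag is kept on an object rather than on the engine's string value, and typed properties assume PHP 7.4 or later.

```php
<?php

// Hypothetical userland model: a string value paired with its "literal" flag.
final class TrackedString
{
    public string $value;
    public bool $literal;

    public function __construct(string $value, bool $literal)
    {
        $this->value = $value;
        $this->literal = $literal;
    }

    // Runtime concatenation: the flag survives only when both operands
    // are flagged strings.
    public function concat($other): TrackedString
    {
        if ($other instanceof TrackedString) {
            return new TrackedString(
                $this->value . $other->value,
                $this->literal && $other->literal
            );
        }

        // Integers carry no flag, so the result is unflagged.
        return new TrackedString($this->value . $other, false);
    }
}

// Stand-in for constant folding: only the finished string is ever seen,
// and it looks like any other literal written in the script.
function foldedByCompiler(string $result): TrackedString
{
    return new TrackedString($result, true);
}

$a = 1;
$b = (new TrackedString('Hello ', true))->concat($a); // runtime: "Hello " . $a
$c = foldedByCompiler('Hello ' . 1);                  // compile time: "Hello " . 1

var_dump($b->value === $c->value); // bool(true), identical contents
var_dump($b->literal);             // bool(false)
var_dump($c->literal);             // bool(true)
```

Both variables hold the same bytes, yet only $c carries the flag, which is the oddity a developer would run into. The OPcache case follows the same pattern, except that whether the folding step runs at all can vary from one request to the next.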