Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:117589 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 46403 invoked from network); 25 Apr 2022 07:58:54 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 25 Apr 2022 07:58:54 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 0283F1804C5 for ; Mon, 25 Apr 2022 02:33:51 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.2 X-Spam-ASN: AS15169 209.85.128.0/17 X-Spam-Virus: No X-Envelope-From: Received: from mail-wr1-f41.google.com (mail-wr1-f41.google.com [209.85.221.41]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Mon, 25 Apr 2022 02:33:51 -0700 (PDT) Received: by mail-wr1-f41.google.com with SMTP id w4so19885219wrg.12 for ; Mon, 25 Apr 2022 02:33:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=craigfrancis.co.uk; s=default; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=4JrrP8mf9gkTt3zgl2Z8p9+49ZcR94ecGdV2+Ea/Ivk=; b=UjBDgtmlALhftQeXyijXSqX2Xp6fAHxa6ZftatlwXrNeTAShTDq3tar9qGQBAs2QWK bBk/ofCzIxIaArKs6C6jyHcTfc3XCUSK8E6nZczk88cRsAXkZ/zsuIN0rIzgSi5Tj0ec Joo0wSKhD3Rs2y/52qB2Ig3ealVNvBJahg3+o= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=4JrrP8mf9gkTt3zgl2Z8p9+49ZcR94ecGdV2+Ea/Ivk=; b=r/Vfx3qNZ2xf7wNZsQKb3DTEKArfjfXoYb3E8oRMcT1k0e5rcEeKceKKtzRJ/pfVPs /l9osXYuo8z277OUEKliAtMM8qFJPiM5TbUngHFraB4Do94Rluc1XTmeJEks74O73QIO Zk3D/pFPGW2dqjjan7SS6wqEj+9Lyxarhdu27qSKVPMQg7FOBBjKjpxaKL30MfhLTUnU ddRLMHGY74Hn0t5QkxSDYfxm6TsXkVbxBSEgyBGijx7M0NSBEUoc9D9HXbdP3SRfQddZ ZYstt3CaHeF4FUZCQYimer+NYnvZTITbv/AaNU+tjkZSdo0+rn3+aIYEju6k+I3fgcu+ WMFA== X-Gm-Message-State: AOAM531UiSeUYXhSEIBWDr8RPavj9bwvv3pIeX+GC+sR5JRPpPie0qno ewE6PUUxXbybipLzhTrGChnlTQ== X-Google-Smtp-Source: ABdhPJzGNb42YdfEzmCOWSzssMKVnr8C5L1yn/9Mt7XCum76oPJGHy/FZOne7/SqIRtimqADu82Z0g== X-Received: by 2002:a05:6000:1815:b0:20a:deee:3cf0 with SMTP id m21-20020a056000181500b0020adeee3cf0mr1623086wrh.210.1650879230096; Mon, 25 Apr 2022 02:33:50 -0700 (PDT) Received: from smtpclient.apple ([94.173.138.98]) by smtp.gmail.com with ESMTPSA id v6-20020adfa1c6000000b0020add25571bsm1832920wrv.42.2022.04.25.02.33.49 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 25 Apr 2022 02:33:49 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.80.82.1.1\)) In-Reply-To: Date: Mon, 25 Apr 2022 10:33:46 +0100 Cc: PHP internals Content-Transfer-Encoding: quoted-printable Message-ID: <7DB0A01F-04FB-420D-9025-E027E5DE02F7@craigfrancis.co.uk> References: <42D0A480-F262-4F72-9C4D-887762A8D800@gmail.com> <0b061f28-a087-efd3-8602-424ee03458e0@gmail.com> To: Rowan Tommins X-Mailer: Apple Mail (2.3696.80.82.1.1) Subject: Re: [PHP-DEV] NULL Coercion Consistency From: craig@craigfrancis.co.uk (Craig Francis) On 21 Apr 2022, at 15:09, Rowan Tommins wrote: > On Wed, 20 Apr 2022 at 18:02, Craig Francis = wrote: >> I'm just trying to focus on how PHP has worked >=20 > You keep repeating this mantra, but user-defined functions with = declared parameter types have never accepted nulls, in any mode, unless = explicitly marked with "?" or "=3D null". Thanks Rowan, and you're right that user defined functions haven't = worked like that, but that's not really what I'm focused on - which is = how PHP has always worked in regards to internal functions (the bit that = has changed, and is causing problems), and how NULL coercion works in = other contexts. I mention user defined functions because I still want user and internal = functions to work in the same way (one form of consistency). > The fact that internal functions have parameter parsing behaviour that = is almost impossible to implement in userland, and often not even = consistent between functions, is a wart of engine internals, not a = design decision. Bit of a tangent, but do you have some examples? would be nice to clean = some of these up, or at least have them in mind as we discuss this RFC. > All of that, and the "consistency" in the title of your RFC, is a = complete distraction from the real questions: >=20 > 1) given a null input, and a non-nullable parameter, what should the = run-time do? > 2) what is the best way to help users update their code to be = compatible with that new behaviour? I'm happy to change the title, but consistency does apply in different = ways... as in, ensuring user and internal function parameters work in = the same way (even if that way might need to change slightly); how other = simple values like string/int/float/bool values can still be coerced for = parameters, but NULL won't be; and how NULL coercion does work in other = contexts (string contact, `=3D=3D` comparison, sprintf, print, array = keys). To answer your questions, 1. I think NULL inputs should be coerced for non-nullable parameters, = when not using `strict_types=3D1`, and where static analysis tools = provide the additional/stricter checks (like how they can reject a = string '5' for an integer parameter). With nullable parameters being = different by accepting NULL without coercion. 2. With this approach, developers won't need to update their code... in = contrast, the approach of NULL causing a Type Error for internal = functions, developers will need to change their code, and I cannot find = any tools that help with this, except very strict static analysis to = find but not update them (it's difficult tracing every variable from = source to sink). >> Agreed, the only thing I'd add... failing early with NULL might help = debug some problems (assuming there is a problem), but I believe static = analysis and unit tests are much better at doing this (e.g. static = analysis is able to see if a variable can be NULL, and apply those = stricter checks, as noted by George with Girgias RFC). >=20 >=20 > I think we should institute a "swear jar" - whenever someone says = "static analysis could do this better", they have to donate =E2=82=AC1 = to the PHP Foundation. :P Cool, I've got 23 credits left this month :-) So I'll spend 1 more... I think it's fair to say that developers using = `strict_types=3D1` are more likely to be using static analysis; and if = `strict_types=3D1` is going to eventually disappear, those developers = won't lose any functionality with the stricter checking being done by = static analysis, which can check all possible variable types (more = reliable than runtime), and (with the appropriate level of strictness) = static analysis can do things like rejecting the string '5' being passed = to an integer parameter and null being passed to a non-nullable = parameter. > More seriously, 1) your own RFC claims that fewer than 33% of = developers use static analysis; and 2) if PHP had a compile step with = mandatory static analysis, we would still have to decide whether passing = NULL to these functions should be rejected as an error by the static = analyser. Not sure how to respond to that... do you think PHP will have a compile = step, with mandatory static analysis? Considering how static analysis = tools today have several levels, and the ability to create a baseline = (to ignore problems with old code), are you suggesting that everyone = would have the strictest checks enabled, where no coercion happens at = all? >> In contrast, failing early at runtime, on something that is not = actually a problem, like the examples in my RFC, creates 2 problems = (primarily upgrading; but also adding unnecessary code to explicitly = cast those possible NULL values to the correct type, which isn't needed = for the other scalar types). >=20 >=20 > I've been trying not to nit-pick, because I think over-all you do have = a valid point, but several of your examples do not need any extra code = to handle nulls; and those that do need maybe a handful of characters. = For instance, $search =3D ($_GET['q'] ?? ''); is both shorter and = clearer than $search =3D ($_GET['q'] ?? NULL); Thank you; and you're right, if you write new code today, you could do = that, but that assumes you don't need to tell the difference between an = empty value vs a missing value (I find this check is rare, but they do = happen, and is why most frameworks use NULL for that distinction, = WordPress being the only exception I've found so far). But, updating existing code, while that would make automated updates = easier, it's likely to cause problems, because you're editing the value = source, with no idea about checks later on (like your example which = looks for NULL)... and that's why an automated update of existing code = would have more luck updating the sinks rather than the sources (e.g. it = knows which sinks are expecting a string, so it can add in a = `strval($var)`, or `(string) $var`, or `$var ?? ""`). >> Thank you... but I will add, while `htmlspecialchars()` rejecting = NULL would get you to look at the code again, I wouldn't say it's = directly picking up the problem, and it relies on there being a NULL = value in $fields for this to be noticed (if that doesen't happen, you're = now in a position where random errors will start happening in = production). >=20 >=20 > There are always going to be errors that depend on the value of input, = and unless you have god-like skill at writing tests, some of those will = only happen in production. Saying "this shouldn't happen" doesn't answer = the question of what to do when it does. Agreed, it's just that in the example given, runtime checks seemed = particularly fragile (as in, the NULL check is a feature to affect the = output, and while I don't expect all code to be tested, I would expect = explicit features to be, which would ensure a NULL value is provided and = checked, instead of relying on the developer remembering to test a NULL = value). >> Bit of a tangent, I'm uncomfortable that `$name` is not being HTML = encoded, which takes us to context aware templating engines, and how you = can identify these mistakes via the `is_literal` RFC or the = `literal-string` type. >=20 >=20 > This isn't real code, it's an illustrative example; but if it makes = you feel more comfortable, imagine $name has some deliberate HTML markup = in it, so needs to be echo'd raw. (/me shudders, with memories of a project that did that, and trying to = remember which variables contained HTML). >> That said, if you (or anyone) have any better ideas on how to address = this problem, please let me know. >=20 > I honestly would be more interested in having some string functions = return null for null input than changing existing behaviour to accept = null for non-nullable parameters (interestingly, until PHP 8.0, = htmlspecialchars() could return null, e.g. if given an array). = Unfortunately, that would be a different kind of compatibility break, so = I'm not sure it fully solves the problem. I've talked to a few developers who see NULL being passed to = htmlspecialchars() as something they want to be warned about, because = they see NULL as an invalid value (with my RFC, they would still keep = that check with `strict_types=3D1` and/or static analysis). Also, have a look at the quiz results: https://quiz.craigfrancis.co.uk/ It kinda confirms those discussions, while I don't think many people = carefully went though the list of parameters (a bit more on the = quantitative rather than qualitative side), if you focus on the people = who selected option 3 first (preferring a Fatal Error for NULL), it's = only Kamil and Tim who were ok with htmlspecialchars() accepting NULL, = but they were not open to many of the other 334 parameters. That said, your suggestion of "?|>" could work in addition to my RFC, so = the NULL value can be kept when explicitly asked to preserve (avoiding = that backwards compatibility issue). Craig