Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:117660 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 52139 invoked from network); 3 May 2022 10:00:22 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 3 May 2022 10:00:22 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id A7D7018037E for ; Tue, 3 May 2022 04:37:20 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,HTML_MESSAGE,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.2 X-Spam-ASN: AS15169 209.85.128.0/17 X-Spam-Virus: No X-Envelope-From: Received: from mail-wr1-f49.google.com (mail-wr1-f49.google.com [209.85.221.49]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Tue, 3 May 2022 04:37:20 -0700 (PDT) Received: by mail-wr1-f49.google.com with SMTP id i5so22942908wrc.13 for ; Tue, 03 May 2022 04:37:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=craigfrancis.co.uk; s=default; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=vIy0so8xUZIv3U6ET5PP1znVADRuyRiEeYgfXkkhGvM=; b=QgAJctlbULSQ74UxqhU9J2hB1YuYJtXXa2Zwj3VgeIaowuRUqAnd2zlSUVKumpseF+ qNs3my3bq1sz+lxkmIKuUWpyuClnBcNxkrwQCzex+rBxO2VJYhiFk51qVze2srL3c+j9 joySp6yTq5ESAOrWPWCDChx2vI/FD2UfkKTEw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=vIy0so8xUZIv3U6ET5PP1znVADRuyRiEeYgfXkkhGvM=; b=055TyiDi0Y0gN+lVsqfqjWNImMCXVb2v/wkhsDaIymWNTIOpp3BP9QyreO1zS6nZYS oF8p52Lvbc3xKeqaTMTl8ZqDIw8CKg2lmDxiqxmfk+xRwX7rKAPG3Dpac+2YbUCaVs6i 2Q3zRYi3c/OrHOl4pSiRceq2qpqVLnxufN2QDBM62VJcWkAvV3ov9wZU0/qUQwW4uLPR FupJBWBVx+JGkiz1SVdLjXRjzzQ9vuF8526Vrsn/qfc9f5DlF2dhTEcTkev62U+JeJXk 3gv6CLqHc2YRZwboRRWuFu5BPN4Rt6TKN6L82R3FCYGvmvytmoT+rzA44Qoxq+Num5cS oKTw== X-Gm-Message-State: AOAM5304VmtkH7NEH3G0foXZv3W0qDjtHI8AmxXcC1j6qbQ0AndnSs8w bI2wuX303nkWeKNuN7LZ77aZEwSelQzoig== X-Google-Smtp-Source: ABdhPJzDKsWQrAnW6/CmR0oyfLlCRNdHHMS4XvzhDgdm5frjr7vnUi9QKImeXFByEgeTbGdcpPdLOg== X-Received: by 2002:a5d:51ce:0:b0:20c:55a8:69e with SMTP id n14-20020a5d51ce000000b0020c55a8069emr11529988wrv.380.1651577838675; Tue, 03 May 2022 04:37:18 -0700 (PDT) Received: from smtpclient.apple ([94.173.138.98]) by smtp.gmail.com with ESMTPSA id i18-20020a05600c355200b003942a244f32sm1080376wmq.11.2022.05.03.04.37.17 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 03 May 2022 04:37:17 -0700 (PDT) Message-ID: <84E29842-B411-451A-94B7-B4707953DA94@craigfrancis.co.uk> Content-Type: multipart/alternative; boundary="Apple-Mail=_073F51D3-69D3-4053-88AD-1344BE31BB9F" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.80.82.1.1\)) Date: Tue, 3 May 2022 12:37:12 +0100 In-Reply-To: <0484c0c8-569f-7889-343d-829fb820c64d@gmail.com> Cc: PHP internals To: Rowan Tommins References: <42D0A480-F262-4F72-9C4D-887762A8D800@gmail.com> <0b061f28-a087-efd3-8602-424ee03458e0@gmail.com> <7DB0A01F-04FB-420D-9025-E027E5DE02F7@craigfrancis.co.uk> <9859B3B4-091A-4311-8F68-F6C35FBC32A1@craigfrancis.co.uk> <0484c0c8-569f-7889-343d-829fb820c64d@gmail.com> X-Mailer: Apple Mail (2.3696.80.82.1.1) Subject: Re: [PHP-DEV] NULL Coercion Consistency From: craig@craigfrancis.co.uk (Craig Francis) --Apple-Mail=_073F51D3-69D3-4053-88AD-1344BE31BB9F Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii > On 30 Apr 2022, at 18:05, Rowan Tommins = wrote: >=20 > On 27/04/2022 16:51, Craig Francis wrote: >> Forgive this primitive example, but this shows `$name` being used in = three different ways, where an automated tool cannot simply change line = 1 so it doesn't return a NULL, because it would break the Register link. >>=20 >> ``` >> $name =3D $request->get('name'); >>=20 >> if (trim($name) =3D=3D=3D '') { // Contains non-whitespace = characters; so not "", " ", NULL, etc >> $where_sql[] =3D 'name LIKE ?'; >> $where_val[] =3D $name; >> } >>=20 >> echo ' >>
>> >> >>
'; >>=20 >> if ($name !=3D=3D NULL) { >> $register_url =3D '/admin/accounts/add/?name=3D' . = urlencode($name); >> echo ' >>

Add = Account

'; >> } >> ``` >>=20 >> In this example, why does `trim()` and `htmlspecialchars()` justify a = fatal Type Error? >=20 >=20 > Honestly, I fail to see how the inconsistent null handling in this = example is anything other than a bug: My example was only trying to show how an automated tool cannot simply = update the `$name` source (line 1), because it will not be able to = easily tell what happens later (especially if the code is split into = multiple files)... whereas an automated tool will find it much easier to = do this at the sinks: ``` - if (trim($name) =3D=3D=3D '') { + if (trim((string) $name) =3D=3D=3D '') { ``` Which is how the default NULL from `getConfigData()` was handled with = PHPCompatibility: = https://github.com/PHPCompatibility/PHPCompatibility/commit/ab7b0b82d95c82= e62f9cffb0f1960f3ccbf0805e = Or maybe: ``` - [...] value=3D"' . htmlspecialchars($name) . '"> + [...] value=3D"' . htmlspecialchars($name ?? '') . '"> ``` Which is how "Vaimo Composer Patches" might "fix" the NULL from = `extractSingleValue()`: https://github.com/vaimo/composer-patches/pull/96/files = = https://github.com/vaimo/composer-patches/blob/cbceb44dfad9e37f18e319125b0= 36a10496ea094/src/Patch/SourceLoaders/PatchesSearch.php#L234 = This is how Laravel Blade handles NULL: = https://github.com/laravel/framework/blob/ab1506091b9f166b312b3990d07b2e21= d971f2e6/src/Illuminate/Support/helpers.php#L119 = But, on the basis that PHP considers it a problem if NULL is passed to = these functions, and should not be allowed (blocking will make better = code, somehow)... maybe someone should ask for this PR to be reverted? = It's only one library, so they should be able to easily fix all of the = code affected by this, right? :-) https://github.com/laravel/framework/pull/36262 = Interestingly Symphony Twig (which uses NULL for undefined `$context` = values), preserves NULL when passed to `twig_escape_filter()`... but = that's typically provided directly to `echo`, where NULL is accepted and = coerced to an empty string :-) ``` echo twig_escape_filter($this->env, ($context["v"] ?? null), "html", = null, true); ``` = https://github.com/twigphp/Twig/blob/b4d6723715da57667cca851051eba37867142= 90d/src/Extension/EscaperExtension.php#L195 = On the basis that PHP considers it a problem if NULL is passed to these = functions, maybe `echo(string ...$expressions)` and `print(string = $expression)` should also deprecate (and eventually fatal error) when = they receive NULL? That would be fun :-) None of these library changes (to support 8.1) are making their code = easier to read, nor are they improving the code quality. Anyway, I'm going on a pointless tangent... the important bit is that = many frameworks already (and will probably continue to) return NULL by = default, doing so has been fine (and can be useful in some cases)... and = now that NULL is going to be rejected, I can only see that creating an = upgrade problem. > * If strings containing only spaces are considered empty, it's = probably a mistake that they're not trimmed everywhere. But of course = adding $name =3D trim($name) would remove all distinction between null = and ''. Bit of a tangent, but someone can use `trim()` to check if a value is = worth saving (or using the SELECT query), but will still want to = preserve exactly what the user wrote (e.g. taking your name as an = example, searching for "Tom" vs " Tom" will return your record for both, = but not my step-brother Tom). Either way, why should `trim()` trigger a fatal Type Error if it = receives NULL? Does that actually represent problem, considering it's = fine receiving an integer? > * If null is equivalent to an empty string when deciding whether to = run the SQL, why is it not also equivalent to an empty string when = rendering the register link? The current logic means that clicking "Go" = causes a register link to appear, with an empty name on the query = string, which doesn't seem like useful functionality. While I wasn't intending it to be a real example... maybe imagine the = original code did not run the DB query when the field is left blank (or = only contains whitespace characters), probably for performance = reasons... and later, after someone complains about duplicate accounts, = they put in a small change to ensure the admin did at least one search, = but for anti-frustration reasons, they allowed the admin to just press = [enter]? > * On the other hand, if NULL is considered a different state, why is = there no behaviour distinguishing it around the SQL logic? It seems = likely that an else clause would be added there, perhaps "else { echo = 'You must enter a name.'; }" In the current code, that would appear for = both null and empty strings, which is probably a mistake. >=20 > I think that actually demonstrates quite nicely that most code would = benefit from treating "string" and "?string" as strictly different = types, and either defaulting to an empty string explicitly and early, or = considering null at every step. Simultaneously relying on null values = being preserved, and them being silently coerced, leads to fragile code = where it's not clear where nulls are deliberately handled as empty = strings, and where they've simply been forgotten about. It wasn't supposed to demonstrate anything other than the problem an = automated tool would have. But, on the basis that many developers don't seem to be "considering = null at every step" (going on the roughly 15% of scripts using = `strict_types=3D1`)... should frameworks default to returning an empty = string instead? If so, how much existing code will break if that was to = happen? Or should all of these developers specify an empty string as = their default every time they ask for a user-value? In either case, it's = a lot of work to fix the things that break, and I still cannot see the = value in doing so. >>> Telling users when they've passed null to a non-nullable parameter = is precisely about *preserving* that distinction: if you want null to = mean something specific, treating it as a string is a bug. >>=20 >> I don't think that represents a bug, we are talking about a system = that takes user input (so often nullable), supports coercion with other = simple types, and supports NULL coercion in other contexts (e.g. string = concat). >=20 >=20 > Read the sentence you replied to again: IF you want to treat null as = distinct from '', THEN failing to do so is a bug. If, on the other hand, = you just want to take nullable user input and handle it CONSISTENTLY as = a string, then it's fine to explicitly default to '' AT SOURCE. But we're talking about existing code, and exiting frameworks, who use = NULL by default; and it's been coerced perfectly fine (until 8.1, or for = user defined functions that have specified a non-nullable type). Maybe the frameworks should have taken WordPress as their example, and = used an empty string by default? And we're back to the question, what is actually wrong with passing NULL = to the mentioned functions? I can only see the argument of possibly = preserving NULL (with different backwards compatibility issues, and I = suspect better handled with your "?|>" suggestion); or those developers = who prefer a very strict environment, with no type coercion, where they = can treat NULL as an invalid value (but probably should be using static = analysis to reliably check their code never passes NULL to these = functions or performs any coercion). >>> Despite all of the above, I am honestly torn on this issue. It is a = disruptive change, and I'm not a fan of errors for errors' sake; but I = can see the value in the decision made back in 7.0 to exclude nulls by = default. >>=20 >> Thanks Rowan, I'll just add that I think the fatal Type Error for = NULL will be much more disruptive, and I would rather relax that = requirement for user defined functions, so NULL coercion works in all = contexts. >=20 >=20 > I think the main difference between our positions is that I believe = that if PHP's type system was designed from scratch today, null would = not be silently coerced in these situations. So while I agree that the = change will be disruptive, I disagree with your position that it brings = no benefit. But what is that benefit? I'm sorry, but I really don't see it. I'm going on the basis that you're ok with numbers in strings being = coerced to integers/floats (which I also see as being useful, because = you're right, most inputs are strings)... but you're not ok with NULL = being coerced (which is also common, because values aren't guaranteed to = be provided by the user, and NULL is typically the default). > On 27/04/2022 18:34, Craig Francis wrote: >> But I'm wondering, is it only one function? and assuming it's a = problem, could we use `Z_PARAM_LONG_OR_NULL()` and specifically throw an = exception when either parameter is NULL, like the `max < min` check? On = the basis that I'd rather have one extra check for this function, and = keep NULL coercion working everywhere else (i.e. where it's fine). >=20 >=20 > Well, since it's one of three examples in Guilliam's e-mail, the = answer to the first question seems rather trivially "no", unless I'm = missing something? Sorry, I can only see two examples... `get_class()`, which is more of an = example of UNKNOWN being used (and NULL has not been "allowed as of PHP = 7.2")... and `mt_rand()`, which I mentioned in the RFC, because you = provided a good example of NULL being a potential problem, not that I = can find anyone doing this publicly: https://grep.app/search?q=3Dmt_rand%28NULL&filter[lang][0]=3DPHP = = = https://grep.app/search?q=3Dmt_rand%5Cs%2A%5C%28%5Cs%2ANULL®exp=3Dtrue&= filter[lang][0]=3DPHP = https://www.google.com/search?q=3D%22mt_rand+NULL+NULL%22 = > As for the second question, certainly we could add specific = prohibitions to null on a case by case basis, but that's basically = equivalent to your previous suggestion of explicitly allowing null on a = case by case basis, and doesn't really answer the question of what the = default behaviour should be - especially bearing in mind that any = default should apply to both built-in and user-defined functions. I don't want oddities either, I was just exploring the possibility of a = developer accidentally writing `mt_rand(NULL, NULL)`, in the current = environment (where the NULL's are coerced to the integer 0); and if it's = a problem worth protecting against, how we could handle it. Craig --Apple-Mail=_073F51D3-69D3-4053-88AD-1344BE31BB9F--