Newsgroups: php.internals
From: craig@craigfrancis.co.uk (Craig Francis)
To: Rowan Tommins
Cc: PHP internals
Date: Thu, 5 May 2022 16:29:56 +0100
Subject: Re: [PHP-DEV] NULL Coercion Consistency

On 3 May 2022, at 14:55, Rowan Tommins wrote:
>
> On 03/05/2022 12:37, Craig Francis wrote:
>> But what is that benefit? I'm sorry, but I really don't see it.
>
>
> I started drafting a longer reply, but honestly I don't think we're getting anywhere. Every attempt to explain the benefit seems to end in one of two ways:
>
> - an endless back and forth nit-picking hypothetical situations where it might or might not be useful
> - an outright dismissal that "people who want it strict can go over there and use strict_types=1 and/or static analysis"
>
> To me, it's *always* about trade-offs: the *benefit* of strict checks exists for everyone, and the question is whether they want to pay the *cost* or not. As long as we can't agree on that fundamental point, there's no point continuing the discussion.

I hope I don't come across like that; I'm really trying to understand the benefits and costs (I tend to use small examples to help myself understand).

I'm not trying to be dismissive with the use of static analysis. I just think PHP should be tolerant of some things (e.g. string '5' to int 5, and null to empty string), but I also recognise some developers prefer a very strict environment that does not do any type coercion (that's where I think static analysis works really well, as it can enforce extra checks, including type checks for all variables from all sources to all sinks).

That said, I do see value in some Type Errors, like how I updated my RFC a couple of weeks ago with some examples "like `substr($string, “offset”)` and `htmlspecialchars(array())` as being clearly problematic" (thanks again George).

I'm also fine with `substr('abc', $offset)` rejecting an Empty String or NULL for `$offset` (I'll note that `$offset` was never added to my list of 335 parameters).

Under "Future Scope" I've given 4 example parameters that probably should reject an Empty String or NULL (because they do represent problems, similar to how `$separator` in `explode()` already has the “cannot be empty” fatal error).
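A minimal sketch of the behaviour described above, assuming PHP 8.1 or later in the default coercive mode (no `strict_types`). `describeCall()` is a hypothetical helper, not part of PHP; the messages in the comments are the ones PHP 8.1 produces for these calls:

```php
<?php
// Sketch: assumes PHP 8.1+ in coercive mode (strict_types not declared).

// Hypothetical helper: run a call and describe its result, or the error it throws.
function describeCall(callable $call): string
{
    try {
        return var_export($call(), true);
    } catch (TypeError | ValueError $e) {
        return get_class($e) . ': ' . $e->getMessage();
    }
}

// A numeric string is coerced: '1' is accepted as int 1.
echo describeCall(fn() => substr('abc', '1')), "\n";
// 'bc'

// Clearly problematic arguments are rejected outright.
echo describeCall(fn() => substr('abc', 'offset')), "\n";
// TypeError: substr(): Argument #2 ($offset) must be of type int, string given
echo describeCall(fn() => htmlspecialchars([])), "\n";
// TypeError: htmlspecialchars(): Argument #1 ($string) must be of type string, array given
echo describeCall(fn() => explode('', 'a,b,c')), "\n";
// ValueError: explode(): Argument #1 ($separator) cannot be empty

// NULL is still coerced (to int 0 here), but PHP 8.1 also raises an E_DEPRECATED
// notice: "substr(): Passing null to parameter #2 ($offset) of type int is deprecated"
echo describeCall(fn() => substr('abc', null)), "\n";
// 'abc'
```

The helper catches `TypeError | ValueError` only so every case can be shown in one run; in real code these errors would normally be left to propagate.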
And finally, I can see how `mt_rand(NULL, NULL)` could be a problem (someone assuming NULL represents a default value, but it's coerced to the integer 0), but as I noted in my previous email, I cannot find anyone doing this, and after re-checking my lists and having a re-look through the manual, I think it's the only one that benefits from the rejection of NULL coercion.

Taking that as my rough position on type coercion, I don't see a *benefit* from blocking NULL coercion (more below).

Whereas blocking NULL coercion does introduce an upgrade *cost* (not made easier by the lack of tooling), plus the continuing cost to some developers using the noted frameworks or `filter_input()` (e.g. always specifying an empty string default, or always manually casting NULL to a string)... the other cost is the weirdness in how NULL coercion still works for echo()/print(), string concatenation, == comparisons, arithmetic, sprintf, etc.

>> I'm going on the basis that you're ok with numbers in strings being coerced to integers/floats (which I also see as being useful, because you're right, most inputs are strings)... but you're not ok with NULL being coerced (which is also common, because values aren't guaranteed to be provided by the user, and NULL is typically the default).
>
>
> I will reply to this point, though, because I think it's a genuinely interesting thing to ponder.
>
> One significant difference is that not only is it often not *useful* to distinguish an input of 123 from '123', it's often not *possible*. There is literally no way for an HTTP URL or header to contain an integer, rather than a string representation of one, because it's not a binary protocol.
>
> On the other hand, you might well receive an empty string as input where you're expecting an integer.
Notably, that is *not* coerced = automatically to zero; the code has to explicitly decide if that should = trigger distinct behaviour (such as a validation error) or be treated as = a default value. Not receiving a field you expected feels very similar, = so similar behaviour feels reasonable. I'm someone who will try to justify some very strict coding styles - = like no inline JavaScript, use of Trusted Types, the use of = literal-string for SQL/HTML/etc, and in some cases the use of = application/xhtml+xml (these have easily provable benefits, but they can = also be tricky, so few developers use them). With your example, I'm probably fine with an Empty String or NULL not = being coerced to int 0 (as in, I could see how it might represent a = problem, although I wouldn't care if it did get coerced to 0). But a lot of existing PHP code simply takes user input (which can be = NULL), and passes it to these functions with no expectation of a fatal = error (not good if it's in mid-way though processing data). If we were talking about a desktop application, where the UI was defined = and displayed by that application, then a missing field would represent = a problem in that application, but we're typically talking about the web = with PHP, where the data often comes from an un-trusted browser... i.e. = the user/browser/extension/network can be doing something odd, all the = way down to how a standard HTML checkbox works (unchecked does not = provide a field). It's because of these oddities, and the way NULL has historically = worked, many developers simply don't see the difference between an empty = or missing field (like other sources of NULL). That's why I don't see any benefit to blocking NULL coercion in this = context, as an Empty String or NULL are often seen as the same - e.g. 
a = programmer is simply checking if a name was provided, checking an email = address contains the '@' character, checking if a message is too long, = trimming the whitespace from a value, getting a record with an = id/slug/name/ref, adding the value to a url, showing the search term, = etc. Craig