Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:117636 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 65065 invoked from network); 27 Apr 2022 14:15:50 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 27 Apr 2022 14:15:50 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 7E64B1804DB for ; Wed, 27 Apr 2022 08:51:21 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.2 X-Spam-ASN: AS15169 209.85.128.0/17 X-Spam-Virus: No X-Envelope-From: Received: from mail-wm1-f42.google.com (mail-wm1-f42.google.com [209.85.128.42]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Wed, 27 Apr 2022 08:51:21 -0700 (PDT) Received: by mail-wm1-f42.google.com with SMTP id l16-20020a05600c1d1000b00394011013e8so1129447wms.1 for ; Wed, 27 Apr 2022 08:51:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=craigfrancis.co.uk; s=default; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=yjofm+n045H77JL5xkxuxDMDUDOyig/y1OzSk58H4AM=; b=AcOFeTZh7OldRv9ZwwWcOV6bTDrsdmapjyny1ni5xOyPsSmEpNgmle7CyxeDh0slbn nksX28FoHNTYPO80wyD+4z2JSQKouqy9WmqCk6UDelqQAEln0HJZ4a8Rmts00hq78h0T AJMWz00xC3jS/4+J4mpIh1z+wW9KuiNy04dnM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=yjofm+n045H77JL5xkxuxDMDUDOyig/y1OzSk58H4AM=; b=d+c4AQ3hfnCM3ePxCBeKC0Jzb6xDYBJtxGKAN0qIhip5YH+bkNNLcmmxCiUk5UAHr5 nOg+EjP2Y5KTsoxL+kVN0p11zgViSSszq7YSTO7FPkGYytajM0k6qvPclGmF5729WvB7 sw1lhvUrz2TfnQuK5lbT80GWHwC2IUcxss2YHxi1kxa4ogZ/9RWebrNX+R/Qm7mtlGb/ TjTBG9fYtbpBk2a3Jz7wepi3p+1eLbKJ66Nn4JIqilj2mbCeAzs6onZqzqB43mtkjWrj YYZBq0jkD4ezGD36YgX2gCDa0/BVcAa/KoEiRiQ6dGSIy1ytrmnGh2Z1b8PTNqlNXhWn vW3A== X-Gm-Message-State: AOAM533K6Q3BYoqNDSO+059HZxRHKVkq4wYy2ow+0qb83PwLFjb1Kg+D Ydwtg4LJLUEhq6r+4ViENT5VGw== X-Google-Smtp-Source: ABdhPJzkeRHDOqZf7ndTmpOx4QHll8XSPFpNx2rUbCF8bP/cZnvx6ZjAo/h8aRN1GWHuRShx+hQvEw== X-Received: by 2002:a05:600c:3ac4:b0:392:8906:7e53 with SMTP id d4-20020a05600c3ac400b0039289067e53mr36148724wms.23.1651074679487; Wed, 27 Apr 2022 08:51:19 -0700 (PDT) Received: from smtpclient.apple ([94.173.138.98]) by smtp.gmail.com with ESMTPSA id z7-20020a7bc7c7000000b0038eaf85b0absm1811401wmk.20.2022.04.27.08.51.17 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Apr 2022 08:51:18 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.80.82.1.1\)) In-Reply-To: Date: Wed, 27 Apr 2022 16:51:17 +0100 Cc: PHP internals Content-Transfer-Encoding: quoted-printable Message-ID: <9859B3B4-091A-4311-8F68-F6C35FBC32A1@craigfrancis.co.uk> References: <42D0A480-F262-4F72-9C4D-887762A8D800@gmail.com> <0b061f28-a087-efd3-8602-424ee03458e0@gmail.com> <7DB0A01F-04FB-420D-9025-E027E5DE02F7@craigfrancis.co.uk> To: Rowan Tommins X-Mailer: Apple Mail (2.3696.80.82.1.1) Subject: Re: [PHP-DEV] NULL Coercion Consistency From: craig@craigfrancis.co.uk (Craig Francis) On 25 Apr 2022, at 22:07, Rowan Tommins wrote: > On 25/04/2022 10:33, Craig Francis wrote: >=20 >>> The fact that internal functions have parameter parsing behaviour = that is almost impossible to implement in userland, and often not even = consistent between functions, is a wart of engine internals, not a = design decision. >> Bit of a tangent, but do you have some examples? would be nice to = clean some of these up, or at least have them in mind as we discuss this = RFC. >=20 >=20 > Fundamentally, the internal parameter handling system (ZPP) is = completely separate from the way function signatures work in userland, = and evolved based on a different set of requirements. The emphasis of = ZPP is on unwrapping zval structs to values that can be manipulated = directly in C; so, for instance, it has always had support for integer = parameters. Since 7.0, userland signatures have evolved an essentially = parallel set of features with an emphasis on designing a consistent and = useful dynamic typing system. >=20 > Increasingly, ZPP is being aligned with the userland language, which = also allows reflection information to be generated based on PHP stubs. = For instance: >=20 > * Making rejected parameters throw TypeError rather than raise a = Warning and return null > * Giving optional parameters an explicit default in the signature = rather than inspecting the argument count > * Using union types, rather than ad hoc if/switch on zval type >=20 > The currently proposed change to how internal functions handle nulls = in 9.0 is just another part of that process - the userland behaviour is = well-established, and we're making the ZPP behaviour match. >=20 > Off the top of my head, I don't know what other inconsistencies = remain, but my point was that in every case so far, internal functions = have been adapted to match userland, not vice versa. Thank you Rowan, that's really helpful. >> So I'll spend 1 more... I think it's fair to say that developers = using `strict_types=3D1` are more likely to be using static analysis; = and if `strict_types=3D1` is going to eventually disappear, those = developers won't lose any functionality with the stricter checking being = done by static analysis, which can check all possible variable types = (more reliable than runtime), and (with the appropriate level of = strictness) static analysis can do things like rejecting the string '5' = being passed to an integer parameter and null being passed to a = non-nullable parameter. >=20 >=20 > There's an unhelpful implication here, and in your discussion of = testing, that PHP users can be divided into two camps: those who check = program correctness with static analysis tools, unit tests, etc; and = those who don't care about program correctness. >=20 > Instead, how about we think about those who are writing new code and = want PHP to tell them early when they do something silly; and those who = are maintaining large code bases and have to deal with compatibility = problems. Neither of these groups is helped enough by static analysers - = as you've rightly pointed out elsewhere, static checks are *not* = reliable in a dynamic language, and are not likely to be built-in any = time soon. >=20 > I'm by no means the strongest advocate of strictness in PHP - I think = there is a risk of throwing out good features with the bad. But I would = love to see strict_types=3D1 become unnecessary - not because = "everyone's running static analysers anyway, so who cares" but because = the default behaviour provides a good balance of safety and usability. >=20 > That makes me very hesitant to use the strict_types modes as a crutch = for this compatibility break - it only puts off the question of what we = think the sensible behaviour actually is. Yep, I agree, and now I know what George is planning, with the gradual = removal of `strict_types=3D1`, you're right that should not be how the = two groups are defined. I would prefer PHP itself to be strict enough to identify and complain = about actual mistakes/problems, but be tolerant and allow reasonable = forms of type coercion (e.g. your example with a string holding an = integer being coerced to an actual integer as needed). That's where I = see static analysis providing strict checking for those who want it = (e.g. they can choose to not allow any coercion). The only issue I have is when NULL is passed to functions like = urlencode(), I do not see that as a problem, and I think NULL coercion = should continue to also work in that context (I really don't see why it = justifies a fatal Type Error for everyone, as not everyone treats NULL = as an invalid value). I've made some tweaks to my RFC in an attempt to better reflect that. >> Thank you; and you're right, if you write new code today, you could = do that, but that assumes you don't need to tell the difference between = an empty value vs a missing value >=20 >=20 > As I've said multiple times now, as soon as you pass it to a function = that doesn't have specific handling for nulls, you lose that distinction = anyway. There is literally zero difference in behaviour between "$foo =3D = htmlspecialchars($_GET['foo'] ?? null)" and "$foo =3D = htmlspecialchars($_GET['foo'] ?? '')". You're right that changing the NULL coalesce at that point would be = fine, because the value is only going to htmlspecialchars() - it's why I = think automated tools will have much more luck taking this approach = (even if it seems redundant) But we're usually looking at variables for user input (often nullable), = that are set earlier in the script, then that nullable variable is used = multiple times later. Forgive this primitive example, but this shows `$name` being used in = three different ways, where an automated tool cannot simply change line = 1 so it doesn't return a NULL, because it would break the Register link. ``` $name =3D $request->get('name'); if (trim($name) =3D=3D=3D '') { // Contains non-whitespace characters; = so not "", " ", NULL, etc $where_sql[] =3D 'name LIKE ?'; $where_val[] =3D $name; } echo '
'; if ($name !=3D=3D NULL) { $register_url =3D '/admin/accounts/add/?name=3D' . urlencode($name); echo '

Add = Account

'; } ``` In this example, why does `trim()` and `htmlspecialchars()` justify a = fatal Type Error? > Telling users when they've passed null to a non-nullable parameter is = precisely about *preserving* that distinction: if you want null to mean = something specific, treating it as a string is a bug. I don't think that represents a bug, are we are talking about a system = that takes user input (so often nullable), supports coercion with other = simple types, and supports NULL coercion in other contexts (e.g. string = concat). >> But, updating existing code, while that would make automated updates = easier, it's likely to cause problems, because you're editing the value = source, with no idea about checks later on (like your example which = looks for NULL)... and that's why an automated update of existing code = would have more luck updating the sinks rather than the sources (e.g. it = knows which sinks are expecting a string, so it can add in a = `strval($var)`, or `(string) $var`, or `$var ?? ""`). >=20 >=20 > That's a fair point, although "sinks" are often themselves the next = "source", which is what makes static analysis possible as often as it = is. Yep, it can be complicated... and that's why I like your "$foo ?|> = htmlspecialchars(...)" suggestion, because existing code (like the = example above) expects functions like trim() to always return a string = (rather than returning a NULL when provided with a NULL), and that will = continue to work as expected/documented, but you get to introduce a new = syntax to allow NULL to propagate though expressions. > Despite all of the above, I am honestly torn on this issue. It is a = disruptive change, and I'm not a fan of errors for errors' sake; but I = can see the value in the decision made back in 7.0 to exclude nulls by = default. Thanks Rowan, I'll just add that I think the fatal Type Error for NULL = will be much more disruptive, and I would rather relax that requirement = for user defined functions, so NULL coercion works in all contexts. Craig