Newsgroups: php.internals
From: rowan.collins@gmail.com (Rowan Tommins)
To: PHP internals
Date: Mon, 25 Apr 2022 22:07:11 +0100
Subject: Re: [PHP-DEV] NULL Coercion Consistency

On 25/04/2022 10:33, Craig Francis wrote:
>> The fact that internal functions have parameter parsing behaviour that is almost impossible to implement in userland, and often not even consistent between functions, is a wart of engine internals, not a design decision.
>
> Bit of a tangent, but do you have some examples? would be nice to clean some of these up, or at least have them in mind as we discuss this RFC.

Fundamentally, the internal parameter handling system (ZPP) is completely separate from the way function signatures work in userland, and evolved based on a different set of requirements. The emphasis of ZPP is on unwrapping zval structs to values that can be manipulated directly in C; so, for instance, it has always had support for integer parameters. Since 7.0, userland signatures have evolved an essentially parallel set of features with an emphasis on designing a consistent and useful dynamic typing system.

Increasingly, ZPP is being aligned with the userland language, which also allows reflection information to be generated based on PHP stubs.
For instance:
* Making rejected parameters throw TypeError rather than raise a Warning and return null
* Giving optional parameters an explicit default in the signature rather than inspecting the argument count
* Using union types, rather than ad hoc if/switch on zval type

The currently proposed change to how internal functions handle nulls in 9.0 is just another part of that process - the userland behaviour is well-established, and we're making the ZPP behaviour match.

Off the top of my head, I don't know what other inconsistencies remain, but my point was that in every case so far, internal functions have been adapted to match userland, not vice versa.

> So I'll spend 1 more... I think it's fair to say that developers using `strict_types=1` are more likely to be using static analysis; and if `strict_types=1` is going to eventually disappear, those developers won't lose any functionality with the stricter checking being done by static analysis, which can check all possible variable types (more reliable than runtime), and (with the appropriate level of strictness) static analysis can do things like rejecting the string '5' being passed to an integer parameter and null being passed to a non-nullable parameter.

There's an unhelpful implication here, and in your discussion of testing, that PHP users can be divided into two camps: those who check program correctness with static analysis tools, unit tests, etc; and those who don't care about program correctness.

Instead, how about we think about those who are writing new code and want PHP to tell them early when they do something silly; and those who are maintaining large code bases and have to deal with compatibility problems. Neither of these groups is helped enough by static analysers - as you've rightly pointed out elsewhere, static checks are *not* reliable in a dynamic language, and are not likely to be built-in any time soon.

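To make the userland/internal comparison above concrete, here is a minimal sketch of how the default (coercive) mode treats the two cases mentioned in the quote: the string '5' passed to an integer parameter, and null passed to a non-nullable parameter. It assumes a file without declare(strict_types=1); addOne() and describe() are hypothetical helpers invented for the illustration, not anything from the RFC or php-src.

```php
<?php
// Illustration only: addOne() and describe() are hypothetical helpers,
// not code from the RFC or from php-src.

// A userland function with a non-nullable scalar parameter.
function addOne(int $n): int
{
    return $n + 1;
}

// Run a callable and report either its exported result or 'TypeError'.
function describe(callable $call): string
{
    try {
        return var_export($call(), true);
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

// Default (coercive) mode: no declare(strict_types=1) in this file.
$numericString = describe(fn() => addOne('5'));   // numeric string is coerced to int
$userlandNull  = describe(fn() => addOne(null));  // null is rejected by userland
// Internal function: @ hides the deprecation notice that PHP 8.1+ emits here.
$internalNull  = describe(fn() => @strlen(null));

echo "addOne('5'):  $numericString\n";
echo "addOne(null): $userlandNull\n";
echo "strlen(null): $internalNull\n";
```

On the 8.x releases, the userland call with null should report TypeError while the internal call still returns 0 (with a deprecation notice from 8.1 onwards); under the change proposed for 9.0, the last line would be expected to report TypeError as well.
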
I'm by no means the strongest advocate of strictness in PHP - I think there is a risk of throwing out good features with the bad. But I would love to see strict_types=1 become unnecessary - not because "everyone's running static analysers anyway, so who cares" but because the default behaviour provides a good balance of safety and usability.

That makes me very hesitant to use the strict_types modes as a crutch for this compatibility break - it only puts off the question of what we think the sensible behaviour actually is.

> Thank you; and you're right, if you write new code today, you could do that, but that assumes you don't need to tell the difference between an empty value vs a missing value

As I've said multiple times now, as soon as you pass it to a function that doesn't have specific handling for nulls, you lose that distinction anyway. There is literally zero difference in behaviour between "$foo = htmlspecialchars($_GET['foo'] ?? null)" and "$foo = htmlspecialchars($_GET['foo'] ?? '')".

Telling users when they've passed null to a non-nullable parameter is precisely about *preserving* that distinction: if you want null to mean something specific, treating it as a string is a bug.

> But, updating existing code, while that would make automated updates easier, it's likely to cause problems, because you're editing the value source, with no idea about checks later on (like your example which looks for NULL)... and that's why an automated update of existing code would have more luck updating the sinks rather than the sources (e.g. it knows which sinks are expecting a string, so it can add in a `strval($var)`, or `(string) $var`, or `$var ?? ""`).

That's a fair point, although "sinks" are often themselves the next "source", which is what makes static analysis possible as often as it is.

Despite all of the above, I am honestly torn on this issue. It is a disruptive change, and I'm not a fan of errors for errors' sake; but I can see the value in the decision made back in 7.0 to exclude nulls by default.

Regards,

-- 
Rowan Tommins
[IMSoP]