Newsgroups: php.internals
From: andreas@dqxtech.net (Andreas Hennings)
Date: Sun, 2 Jan 2022 08:17:12 +0100
To: Michael Morris
Cc: Kirill Nesmeyanov, internals
Subject: Re: [PHP-DEV] RFC: Stop to automatically cast numeric-string to int when using them as array-key

On Sun, 2 Jan 2022 at 06:20, Michael Morris wrote:
>
> On Sat, Jan 1, 2022 at 10:47 PM Kirill Nesmeyanov wrote:
>
> >
> > >Saturday, 1 January 2022, 17:41 +03:00 from Rowan Tommins <rowan.collins@gmail.com>:
> > >
> > >On 31/12/2021 00:21, Kirill Nesmeyanov wrote:
> > >> I support this behavior fix because in its current form, due to a
> > similar problem (almost?), all PSR-7 implementations contain bugs that
> > violate RFC7230 (section 3.2:
> > https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ).
> > Thus, physically, by the standard, all headers can have the name "0" (like «0:
> > value»), but when stored inside implementations, it is converted to a
> > string and a problem arises ($message->getHeaders() //
> > returns array instead of array).

The solution is to cast the keys back to string when reading from the
array, IF the type matters.

foreach ($headers as $k => $values) {
  $name = (string) $k;
}

We could introduce an alternative to array_keys() that would do this
automatically, e.g. "array_keys_str()".

> > >
> > >You appear to be technically correct - the RFC defines a header name
> > >only as "token", which implies the following would all be valid HTTP
> > >headers:
> > >
> > >42: The Answer
> > >!: Bang
> > >^_^: Surprised
> > >
> > >In practice, it would be a bad idea to use any of these.
> > >
> > >Every single one of the field names registered with IANA [1] starts with
> > >a letter, and proceeds with only letters, digits, and hyphen ('-'). [The
> > >exception is "*", listed there as "reserved" to specifically prevent its
> > >use conflicting with the wild-card value in "Vary" lists.]
> > >
> > >I'm actually surprised this definition hasn't been updated with
> > >interoperability advice in recent revisions of the standard. I did find
> > >this general advice for internet message headers in RFC 3864 [2]:
> > >
> > > > Thus, for maximum flexibility, header field names SHOULD further be
> > > > restricted to just letters, digits, hyphen ('-') and underscore ('_')
> > > > characters, with the first character being a letter or underscore.
> > >
> > >The additional restriction on underscore ('_') in HTTP arises from CGI,
> > >which maps headers to environment variables. For instance, Apache httpd
> > >silently drops headers with anything other than letters, digits, and
> > >hyphen [3] to avoid security issues caused by environment manipulation.
> > >
> > >If I was developing a PSR-7 or similar library, I would be inclined to
> > >drop any header composed only of digits, and issue a diagnostic warning,
> > >so that it wouldn't escalate to a type error later. It certainly doesn't
> > >seem reasonable to change the entire language to work around that
> > >inconvenience.
> > >
> > >[1] https://www.iana.org/assignments/http-fields/http-fields.xhtml
> > >[2] https://datatracker.ietf.org/doc/html/rfc3864#section-4.1
> > >[3] https://httpd.apache.org/docs/trunk/env.html#setting
> > >
> > >Regards,
> > >
> > >--
> > >Rowan Tommins
> > >[IMSoP]
> > >
> > >--
> > >PHP Internals - PHP Runtime Development Mailing List
> > >To unsubscribe, visit: https://www.php.net/unsub.php
> >
> > I just gave an example of what at the moment can cause an exception in any
> > application that is based on the PSR. It is enough to send the header "0:
> > Farewell to the server". In some cases (for example, as is the case with
> > RoadRunner) - this can cause a physical stop and restart of the server.
> >
> > Just in case, I will repeat my thesis: I cannot imagine that anyone is
> > using this functionality consciously and that it is part of the real logic
> > of the application.

It is not really relevant whether this is used _consciously_.

> >
> You don't have a lot of experience with legacy code then. PHP, particularly
> old PHP (like 4, 5.1 era) was used by a lot of idiots.
>
> I was one of those idiots (Perhaps I still am an idiot - jury is
> deliberating on that but I digress).

We don't need to assume incompetence.
Any code that deals with arrays _must_ consider this behavior, unless
the array keys are known to be only integers, or only non-integer-like
strings.

One obvious BC break: What would be the value of $a in the following
snippet?

$a = [];
$a['5'] = 's';
$a[5] = 'n';
$a[7] = 'n';
$a['7'] = 's';

Currently it would be [5 => 'n', 7 => 's'], with both keys stored as
integers.
With the "new" behavior, we'd have to decide what happens.
Can keys '5' and 5 coexist? ['5' => 's', 5 => 'n', 7 => 'n', '7' => 's']?
Or would assignment change the key type? [5 => 'n', '7' => 's']?
Or does the initial key type remain, and only the value changes?
['5' => 'n', 7 => 's']?

I would argue that the current behavior might still be the best we can
get for a general-purpose structure that can act as a vector or a map
or a mix of both.
The perceived awkwardness is just a result of trying to do everything
at once.

Possible solutions:
- Dedicated array-reading methods that cast all keys to string on read.
- New structures, alternative to array, that either allow separate
  entries for 5 and '5', or that are restricted to one key type.

--- Andreas

>
> Snark aside though, PHP has more than its fair share of self taught
> programmers (again, not trying to be insulting as I am one myself), and
> they do things with the code that veterans and formally trained programmers
> would never think to try, let alone implement.
>
> I guarantee fixing how key handling is done will break something - either
> in the form of code exploiting the weird behavior, or code that is guarding
> against the weird behavior; not to mention any tests that might be written
> - though amateurs rarely write test code (again, speaking from past
> experience I've grown beyond).
>
>
> > And fixing this behavior, I believe, will automatically fix many libraries
> > (not necessarily PSR) that do not take this behavior into account.
> >
> >
>
> And blow up who knows how many old code bases - many of which don't have
> unit test suites to discover if there is a break ahead of time. This is
> the sort of BC break that would cause a cliff of users unable to migrate to
> the major version that implements it. A Python 2 vs. 3 style of break.
>
> Even with that all said it may indeed be worth fixing - but this will
> require the same sort of kid gloves approach removing register globals had
> (for the newer folks, there was a time when $_REQUEST["var"] would auto
> populate $var with lovely security snarls). IIRC PHP 3 had register
> globals always on, 4 created a config toggle to turn them off, and PHP 5.0
> turned that toggle off by default, finally PHP 5.3 (6 without unicode more
> or less) removed support for register globals entirely (My memory could be
> off - it's in the changelogs for the curious).
>
> I leave the decision making to the maintainers and contribs who do the
> actual work. Hell, I personally don't even use PHP that much these days
> having gotten a job where I focus on writing Cucumber tests in JavaScript
> that run on node.js. I keep up with PHP and this list though cause one
> never knows what the next job will entail. I just dropped out of lurk mode
> to underscore along with others up thread the massive ramifications of what
> is being proposed. As someone who wrote stupid code I can see this
> breaking, tread lightly. And hell, I don't even know how much of that code
> is still in use since I've changed employers many times since it was
> written. This situation is not unique and can create huge headaches for
> companies running projects on legacy code bases.
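
For what it's worth, the cast-on-read idea can already be sketched in
userland. This is only a rough illustration under stated assumptions, not
a proposal for an actual implementation: array_keys_str() is the
hypothetical name floated above (no such function exists in PHP), and the
$headers data is made up.

```php
<?php

// Current behavior: integer-like string keys are stored as int keys,
// so '5' and 5 address the same entry.
$a = [];
$a['5'] = 's';
$a[5] = 'n';
$a[7] = 'n';
$a['7'] = 's';
// $a is now [5 => 'n', 7 => 's'], and both keys are ints.

/**
 * Hypothetical userland stand-in for the "array_keys_str()" idea:
 * returns the keys of $array, each cast to string.
 */
function array_keys_str(array $array): array
{
    return array_map(
        static function ($key): string {
            return (string) $key;
        },
        array_keys($array)
    );
}

// Made-up header data; the name "0" ends up stored as int 0.
$headers = ['0' => ['Farewell'], 'Host' => ['example.org']];

// Reading through the helper gives string names again.
$names = array_keys_str($headers);
// $names is ['0', 'Host']
```

The stored keys stay whatever PHP made of them. Only the reading side
normalizes to string, so existing code that relies on int keys keeps
working.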