Newsgroups: php.internals
Date: Thu, 2 Jul 2020 12:18:09 +0200
Subject: Re: [PHP-DEV] [RFC] Saner string to number comparisons
From: nikita.ppv@gmail.com (Nikita Popov)
Cc: PHP internals

On Thu, Jul 2, 2020 at 10:09 AM Nikita Popov wrote:

> On Mon, Mar 4, 2019 at 6:00 PM Nikita Popov wrote:
>
>> On Wed, Feb 27, 2019 at 10:23 AM Zeev Suraski wrote:
>>
>>> On Tue, Feb 26, 2019 at 2:27 PM Nikita Popov wrote:
>>>
>>>> Hi internals,
>>>>
>>>> I think it is well known that == in PHP is a pretty big footgun. It
>>>> doesn't have to be. I think that type juggling comparisons in a language
>>>> like PHP have some merit, it's just that the particular semantics of ==
>>>> in PHP make it so dangerous. The biggest WTF factor is probably that
>>>> 0 == "foobar" returns true.
>>>>
>>>> I'd like to bring forward an RFC for PHP 8 to change the semantics of ==
>>>> and other non-strict comparisons, when used between a number and a
>>>> string:
>>>>
>>>> https://wiki.php.net/rfc/string_to_number_comparison
>>>>
>>>> The tl;dr is that if you compare a number and a numeric string, they'll
>>>> be compared as numbers. Otherwise, the number is converted into a string
>>>> and they'll be compared as strings.
>>>>
>>>> This is a very significant change -- not so much because the actual BC
>>>> breakage is expected to be particularly large, but because it is a
>>>> silent change in core language semantics, which makes it hard to
>>>> determine whether or not code is affected by the change. There are
>>>> things we can do about this, for example the RFC suggests that we might
>>>> want to have a transition mode where we perform the comparison using
>>>> both the old and the new semantics and warn if the result differs.
>>>>
>>>> I think we should give serious consideration to making such a change.
>>>> I'd be interested to hear whether other people think this is worthwhile,
>>>> and how we could go about doing it, while minimizing breakage.
>>>>
>>>
>>> I generally like the direction and think we should seriously consider it.
>>>
>>> I think that before we make any decisions on this, or even dive too deep
>>> into the discussion - we actually need to implement this behavior,
>>> including the proposed INI setting you mentioned we might add in 7.4 - and
>>> see what happens in some real world apps, at least in terms of potential
>>> danger (as you say, figuring out whether there's actual breakage would
>>> require a full audit of every potentially problematic sample).
>>> Ultimately, I think there's no question that if we were to start from
>>> scratch, we'd be going for something along these lines. But since we're
>>> not starting from scratch - scoping the level of breakage is key here.
>>>
>>> Zeev
>>>
>>
>> Totally agree that assessing the amount of breakage in real code is key
>> here. I have now implemented a warning for PHP 7.4 (for now unconditional,
>> no ini setting) that is thrown whenever the result of a comparison is going
>> to change under the currently proposed rules:
>> https://github.com/php/php-src/pull/3917
>>
>> I've done a few initial tests by running this against Laravel, Symfony
>> and pear-core. The warning was thrown 2 times for Laravel, 1 time for
>> Symfony and 2 times for pear-core. (See PR for the results.)
>>
>> Both of the warnings in pear-core pointed to implementation bugs. The
>> Symfony warning was due to trailing whitespace not being allowed in numeric
>> strings (something we should definitely change). One of the Laravel
>> warnings is ultimately a false positive (does not change behavior), though
>> the code could be improved to avoid it. I wasn't able to tell whether the
>> other one is problematic, as it affects sorting order.
>>
>> I have to say that this is a lot fewer warnings than I expected. Makes me
>> wonder if I didn't make an implementation mistake ^^
>>
>> Regards,
>> Nikita
>>
>
> As we're moving closer to PHP 8 feature freeze, I want to give this RFC a
> bump. I've updated the text to account for some changes that have happened
> in the meantime, such as the removal of locale-sensitivity for float to
> string conversions.
>
> It's been quite a while since we discussed this last, and back then the
> discussion was fairly positive. Some experiments with a warning mode also
> showed that the impact, at least in framework/library code, appears to be
> fairly low in practice, contrary to what one might intuitively expect.
>
> Now would be the time to decide whether or not we want to pursue this
> change for PHP 8.
>

Thinking about the detailed behavior of the RFC, I think it might be better
to change the semantics to:

> If one of the operands is NaN, return uncomparable (1).
> Otherwise, convert the int/float operand to string with precision=-1 and
> compare both operands as strings (with the usual "smart" logic).

This differs from what the RFC currently specifies only in the behavior of
comparisons between a float and a non-numeric string (because the string
representation of the float currently depends on precision), but I think
specifying it this way simplifies the mental model behind it, and makes the
intended semantics more obvious. The actual implementation would of course be
closer to what is specified in the RFC right now.

Regards,
Nikita
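
For readers who want to experiment with the two-step rule quoted above, here
is a rough model of it in Python. It is only a sketch and not the php-src
implementation. The helper names (is_numeric, to_number, number_to_string,
smart_compare, compare) are made up for illustration, the numeric-string
check is a simplified regular expression, and the float-to-string step only
approximates what precision=-1 produces in PHP.

```python
import math
import re

# Assumption: simplified stand-in for PHP's numeric-string check. It accepts
# surrounding whitespace, an optional sign, decimals and an exponent.
_NUMERIC = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?\s*")


def is_numeric(s: str) -> bool:
    return _NUMERIC.fullmatch(s) is not None


def to_number(s: str):
    # Only called on strings that passed is_numeric().
    try:
        return int(s)
    except ValueError:
        return float(s)


def number_to_string(n) -> str:
    # Approximation of a shortest round-trip conversion (precision=-1).
    if isinstance(n, int):
        return str(n)
    if math.isinf(n):
        return "INF" if n > 0 else "-INF"
    if n.is_integer() and abs(n) < 1e15:
        return str(int(n))
    return repr(n)


def smart_compare(a: str, b: str) -> int:
    # "Smart" string comparison: numeric if both strings are numeric,
    # plain string ordering otherwise.
    if is_numeric(a) and is_numeric(b):
        x, y = to_number(a), to_number(b)
    else:
        x, y = a, b
    return (x > y) - (x < y)


def compare(num, s: str) -> int:
    # Step 1: NaN is uncomparable, reported as 1.
    if isinstance(num, float) and math.isnan(num):
        return 1
    # Step 2: stringify the number, then compare both operands as strings.
    return smart_compare(number_to_string(num), s)


print(compare(0, "foobar"))  # -1, so 0 == "foobar" would be false
print(compare(42, "42.0"))   # 0, numeric string is compared as a number
```

Under this model, a number and a numeric string still compare numerically
because both strings pass the numeric check, while a number and a non-numeric
string fall through to plain string ordering.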