Newsgroups: php.internals
To: internals@lists.php.net
References: <424A5E98-2110-4AFE-9C53-8636A6140313@benramsey.com>
Date: Thu, 4 Mar 2021 12:03:02 +0000
From: rowan.collins@gmail.com (Rowan Tommins)
Subject: Re: [PHP-DEV] Don't compare zero exponentials in strings as equal

On 04/03/2021 10:54, Nikita Popov wrote:
> The main one that comes to mind is something like '0' == '0.0'. However,
> the real problem is something else: Comparison behavior doesn't affect just
> == and !=, but also < and >.
> And I can see how people would want '2' < '10'
> to be true (numeric comparison) rather than false (lexicographical
> comparison).

That's a very good point, and I think the existence of the <=> operator makes this even more complicated. Considering your two options:

> 1. Decouple equality comparison from relational comparison. Don't handle
> numeric strings for == and !=, but do handle them for <, >, etc.

What would then be the result of '0' <=> '0.0'? Would the operator need to special-case the fact that they are numerically equal but lexicographically unequal?

> 2. Don't allow relational comparison on strings. If you want to compare
> them lexicographically, use strcmp(), otherwise cast to number first.

This is easy to *implement* for the <=> operator, but makes it much less useful. Part of the appeal of the operator is that you can write code like

    $sortCallback = fn($a, $b) => $a[$sortField] <=> $b[$sortField];

without needing different cases for different data types. Granted, that's not going to use an appropriate sorting collation for many languages, but neither is strcmp().

I think further narrowing the definition of "numeric string" is a more useful course. If we were designing from scratch, the straightforward definition would be:

- all digits: /^\d+$/
- or, all digits with leading hyphen-minus: /^-\d+$/
- or, at least one digit, a dot, and at least one more digit: /^\d+\.\d+$/
- or, as above, but with leading hyphen-minus: /^-\d+\.\d+$/

I think anything beyond that list needs to be carefully justified.

- Leading and trailing spaces are probably OK. Other whitespace (newlines, tabs, etc.) probably not.
- Alternative notations like hexadecimal and exponentials easily produce false positive matches, and how common are they in practice?
- Leading and trailing dots (".5", "1.") might be used sometimes, but I'd probably lean against them.

So, ignoring BC concerns, I would be happy with "numeric string" defined as "maybe space, maybe hyphen, some digits, maybe a dot and more digits, maybe space", which I think in regex form looks like

    /^ *-?\d+(\.\d+)? *$/

Regards,

-- 
Rowan Tommins
[IMSoP]
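A minimal sketch of the narrower definition, assuming PHP 8. The names is_strict_numeric_string() and strict_string_compare(), and the sample $rows data, are hypothetical illustrations, not built-ins or part of any RFC. The pattern uses \A and \z rather than ^ and $ because in PCRE a bare $ also matches just before a trailing newline, so the regex above would accept "1\n" even though newlines are meant to be rejected. The comparison falls back to strcmp() whenever either operand is outside the definition, which keeps a generic <=>-style sort callback usable.

```php
<?php

// Hypothetical helper, not a PHP built-in: accepts "maybe spaces, maybe a
// hyphen, some digits, maybe a dot and more digits, maybe spaces".
// \A and \z anchor the whole subject; a bare $ would also match just
// before a trailing "\n".
function is_strict_numeric_string(string $s): bool
{
    return preg_match('/\A *-?\d+(?:\.\d+)? *\z/', $s) === 1;
}

// Hypothetical comparison returning -1, 0 or 1 like <=>: numeric when
// both operands pass the strict check, otherwise byte-wise via strcmp().
function strict_string_compare(string $a, string $b): int
{
    if (is_strict_numeric_string($a) && is_strict_numeric_string($b)) {
        // The float cast skips leading spaces and ignores trailing ones;
        // integers beyond 2**53 can lose precision in this sketch.
        return (float) $a <=> (float) $b;
    }
    // Normalise strcmp()'s result to -1, 0 or 1.
    return strcmp($a, $b) <=> 0;
}

// Usage: a generic sort callback still works across mixed values.
$sortField = 'v';
$rows = [['v' => '10'], ['v' => '2'], ['v' => '0e2'], ['v' => '0e1']];
usort($rows, fn($a, $b) => strict_string_compare($a[$sortField], $b[$sortField]));
echo implode(', ', array_column($rows, 'v')), PHP_EOL; // 0e1, 0e2, 2, 10
```

One caveat of this design: mixing numeric and byte-wise ordering is not transitive in general ('9' < '10' numerically, but '10' < '1a' and '1a' < '9' byte-wise), and current PHP comparison shows the same cycle for these inputs.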