Newsgroups: php.internals
From: nikita.ppv@gmail.com (Nikita Popov)
To: Rowan Tommins
Cc: PHP internals
Date: Thu, 4 Mar 2021 14:17:26 +0100
Subject: Re: [PHP-DEV] Don't compare zero exponentials in strings as equal

On Thu, Mar 4, 2021 at 1:03 PM Rowan Tommins wrote:

> On 04/03/2021 10:54, Nikita Popov wrote:
> > The main one that comes to mind is something like '0' == '0.0'.
> > However, the real problem is something else: Comparison behavior
> > doesn't affect just == and !=, but also < and >. And I can see how
> > people would want '2' < '10' to be true (numeric comparison) rather
> > than false (lexicographical comparison).
>
> That's a very good point, and I think the existence of the <=> operator
> makes this even more complicated.
>
> Considering your two options:
>
> > 1. Decouple equality comparison from relational comparison.
> > Don't handle numeric strings for == and !=, but do handle them for
> > <, >, etc.
>
> What would then be the result of '0' <=> '0.0'? Would the operator need
> to special-case the fact that they are numerically equal but
> lexicographically unequal?
>

Both '0' <=> '0.0' and '0.0' <=> '0' would return 1 in that case, which is
PHP's indication that the values are not comparable. It's definitely not a
good option.

> > 2. Don't allow relational comparison on strings. If you want to compare
> > them lexicographically, use strcmp(), otherwise cast to number first.
>
> This is easy to *implement* for the <=> operator, but makes it much less
> useful. Part of the appeal of the operator is that you can write code
> like
>
>     $sortCallback = fn($a,$b) => $a[$sortField] <=> $b[$sortField];
>
> without needing different cases for different data types.
>
> Granted, that's not going to use an appropriate sorting collation for
> many languages, but nor is strcmp().
>
> I think further narrowing the definition of "numeric string" is a more
> useful course. If we were designing from scratch, the straightforward
> definition would be:
>
> - all digits: /^\d+$/
> - or, all digits with leading hyphen-minus: /^-\d+$/
> - or, at least one digit, a dot, and at least one more digit: /^\d+\.\d+$/
> - or, as above, but with leading hyphen-minus: /^-\d+\.\d+$/
>
> I think anything beyond that list needs to be carefully justified.
>
> - Leading and trailing spaces are probably OK. Other whitespace
>   (newlines, tabs, etc.) probably not.
> - Alternative notations like hexadecimal and exponentials are prone to
>   false-positive matches, and how common are they in practice?
> - Leading and trailing dots (".5", "1.") might be used sometimes, but
>   I'd probably lean against them.
>
> So, ignoring BC concerns, I would be happy with "numeric string" defined
> as "maybe space, maybe hyphen, some digits, maybe a dot and more digits,
> maybe space", which I think in regex form looks like /^ *-?\d+(\.\d+)? *$/

A disadvantage of narrowing the definition in this way is that it
introduces a discrepancy with (float) casts. I believe these currently
recognize the same values, except that (float) discards trailing garbage.

Another disadvantage is that various data sources commonly return large
numbers in exponential notation. For example, if you stored a large float
in a database, I'd expect you to get it back in exponential notation (if
you get it back as a string). This means that your code could suddenly
break because a value crosses some heuristic threshold that determines how
it gets printed.

Regards,
Nikita
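For illustration only, here is a minimal sketch (assuming PHP 8) that checks the narrow "numeric string" regex proposed in the thread against the current is_numeric() and (float) cast. The helper is_narrow_numeric() is hypothetical and not part of PHP. The /D modifier is an addition in this sketch: without it, $ would also match just before a trailing newline, and the proposal excludes newlines. Exponential strings such as '1.0E+25', the form a data source might return for a large float, fail the narrow check, while is_numeric() and (float) still accept them.

```php
<?php
// Illustrative sketch only. is_narrow_numeric() is a hypothetical helper,
// not a PHP function. It applies the narrow regex proposed in this thread.
// The /D modifier (an addition here) stops $ from matching before a
// trailing "\n", so "5\n" is rejected.
function is_narrow_numeric(string $s): bool
{
    return preg_match('/^ *-?\d+(\.\d+)? *$/D', $s) === 1;
}

// Compare the narrow rule with the current is_numeric() and (float) cast.
$samples = ['42', '-3.14', ' 7 ', '.5', '1.', '1e3', '1.0E+25', "5\n"];

foreach ($samples as $s) {
    printf(
        "%-11s narrow=%-3s is_numeric=%-3s (float)=%s\n",
        json_encode($s),
        is_narrow_numeric($s) ? 'yes' : 'no',
        is_numeric($s) ? 'yes' : 'no',
        var_export((float) $s, true)
    );
}

// Under the current rules both operands are numeric strings, so this is a
// numeric comparison (0.0 == 0.0). The thread's subject line refers to this
// behavior.
var_dump('0e1' == '0e2');
```

Writing \z in place of $ would give the same effect as the /D modifier.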