Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:108861 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 39142 invoked from network); 5 Mar 2020 12:06:51 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 5 Mar 2020 12:06:51 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 3E1111804E4 for ; Thu, 5 Mar 2020 02:26:30 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,HTML_MESSAGE, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.2 X-Spam-ASN: AS15169 209.85.128.0/17 X-Spam-Virus: No X-Envelope-From: Received: from mail-lf1-f48.google.com (mail-lf1-f48.google.com [209.85.167.48]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Thu, 5 Mar 2020 02:26:29 -0800 (PST) Received: by mail-lf1-f48.google.com with SMTP id v6so4125964lfo.13 for ; Thu, 05 Mar 2020 02:26:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=A6Rq5P1Fw+65kNC36NineqieLnn++Rqd0j6vAa70s/0=; b=PYX3e4pp4xNGoHrAupz8V06PPCK7z5Q+e3sETxa1swOq2paeTrWGd+Vu0RxnTp85Iz im/HDX8PPViV3zLP+dla3n1psrovgpm0rWPm4dOOvHi/wOBC9XvgOfC6ky2zp1ZeiGi5 P3gwpKQSK3oKQnfDjJCg+2IS9IDntnshdNCHjtBI8KlwflVE2TgTDHQ4+0Je+yF4BxbB +UTDrpIZTUaOH7cmD+DVdlzZc7U8Xwd47+acaEZiQosRqxYTXZ7rFZk0kFaWjMvgrjCU DDq+aPv0U+UjlYBHDNJAgXR6MyDj/HERvGo7aKBcp3yKr9PpAfKtAb3h+csVil0NXXiC s1Ew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=A6Rq5P1Fw+65kNC36NineqieLnn++Rqd0j6vAa70s/0=; b=FrB9ApGnc9PsOL8CQxBPv6mr2s9gF2AAXEJjzfUH2enlNMpUBgJRUn5vyPcNWiF4Jl w8rzlBzmDZKRbwTfIMh7vXoVk8jShxcZ36gGhUOWk9hEPFN18Pktg4vbIobtvvBjiQnu i7SQQjYOEQvGrDK18oy0H9fwrHQpWE+sEHcyc/CJqK9P5nrlpDdPrV9k5YUimj2H9iOg cCl2RjGhZrkMSDxjHocvRH+cOJpUgK/dZuR3YalUn9nvdUHxWZQxOQpHq2RWVPVCvaTm 4RvINltyQzb0yr8cNE0VStDqSg5hVgsUyvJjJ1T0KdSWT8lkSPFKVD9z2RWxmRjsE1uE VcvQ== X-Gm-Message-State: ANhLgQ2ib5DHQgkkjdsJCdyiF2XEhPdDOfPLispnUbRq6iXdcCIGO05j PVqOqea+a4YK6xCa5OhZZXu+xO/LCnPnRpIYbwM= X-Google-Smtp-Source: ADFU+vu5cS9CnOzawvgM3h5w9dkPJUtO6oV6/bMRNgaRiMHtQQJj/96rmXNoutFLL+jJrzbsoIJ1bg0WzI1FkmKydfU= X-Received: by 2002:ac2:5c5d:: with SMTP id s29mr5124200lfp.191.1583403987387; Thu, 05 Mar 2020 02:26:27 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: Date: Thu, 5 Mar 2020 11:26:11 +0100 Message-ID: To: Dmitry Stogov Cc: PHP internals Content-Type: multipart/alternative; boundary="00000000000034499905a018f87a" Subject: Re: [PHP-DEV] Make sorting stable From: nikita.ppv@gmail.com (Nikita Popov) --00000000000034499905a018f87a Content-Type: text/plain; charset="UTF-8" On Wed, Mar 4, 2020 at 9:39 PM Dmitry Stogov wrote: > Hi Nikita, > > I suspect, this way of stable sorting won't come for free. > Most probably, It'll require more comparisons and swaps. > > I would prefer, extend sort() functions with optional $stable argument, > that would lead to use stable sorting algorithm in first place. > I've updated the implementation to fix some performance issues (turns out, comparison is extremely sensitive to code layout details), and did some simple micro benchmarks using the script at https://gist.github.com/nikic/5d44cb5d0d7c1f414f455090a0193567. It's sorting an array with varying fraction of duplicate elements. UNSTABLE 0.00: float(1.5692839622497559) 0.01: float(1.572052001953125) 0.05: float(1.5424528121948242) 0.10: float(1.5431289672851562) 0.20: float(1.4708871841430664) 0.50: float(1.2815279960632324) 1.00: float(0.9397969245910645) STABLE 0.00: float(1.5857720375061035) 0.01: float(1.577707052230835) 0.05: float(1.581636905670166) 0.10: float(1.588440179824829) 0.20: float(1.600801944732666) 0.50: float(1.5892488956451416) 1.00: float(1.0026319026947021) This shows that for arrays without duplicate elements (so unstable / stable sort makes no difference on the result), the performance impact is ~1%, so negligible. At 10% duplicate elements, it's 3%. At 50% duplicate elements, it's 25%. (Generally the effect is that the stable sort performance stays mostly stable regardless of number of duplicates, while unstable sort improves with number of duplicates. As one would expect.) For real-world use cases, I expect that it's unusual to have a very large fraction of duplicate elements, so the impact should be towards the 1-3% range. I think as long as the impact is not too large, it is better to have a single behavior, rather than adding more flags to our (many) sort functions. Regards, Nikita > ------------------------------ > *From:* Nikita Popov > *Sent:* Wednesday, March 4, 2020 19:42 > *To:* PHP internals > *Subject:* [PHP-DEV] Make sorting stable > > Hi internals, > > Sorting function in PHP currently do not guarantee stability, which means > that the result order of elements that compare equal is not defined. > > To achieve a stable sort, you need to do something like this (untested): > > $arrayAndPos = []; > $pos = 0; > foreach ($array as $value) { > $arrayAndPos[] = [$value, $pos++]; > } > usort($arrayAndPos, function($a, $b) use($compare) { > return $compare($a[0], $b[0]) ?: $a[1] <=> $b[1]; > }); > $array = []; > foreach ($arrayAndPos as $elem) { > $array[] = $elem[0]; > } > > This is both cumbersome and very inefficient. Those temporary arrays > significantly increase memory usage, and hurt performance of the sort. > > I believe that it is important for us to provide a stable sorting option in > the standard library. We could introduce a separate stable function(s) for > this purpose, but I believe we can just as well make all our existing sorts > stable. > > This does not require actually switching sorting algorithms to something > like Timsort. We can essentially do the same as the above PHP code, just > much more efficiently. Due to certain implementation details, we can do > this without memory usage increase, and with minimal performance impact. > I've put https://github.com/php/php-src/pull/5236 up as a prototype. > > The only issue I ran into is that this change has a negative impact on code > using illegal comparison callbacks like this: > > usort($array, function($a, $b) { > return $a > $b; > }); > > The return value of the sorting callback should be $a <=> $b, but using $a > > $b also works right now, due to implementation details of the sorting > implementation (only the outcome of $compare($a, $b) > 0 ends up being used > by the sort). > > This kind of incorrect code will break under the proposed implementation, > because we will now compare by original position if the comparison function > reports equality. Because the comparator reports equality inconsistently > (it says that $a == $b, but $b != $a), the sort results are also > inconsistent. > > What do people think about this? Is there interest in making sorting > stable? Is it okay to break code using illegal comparison callbacks? > > Regards, > Nikita > > > CAUTION: This email originated from outside of the organization. Do not > click on links or open attachments unless you recognize the sender and know > the content is safe. > > This e-mail may contain information that is privileged or confidential. If > you are not the intended recipient, please delete the e-mail and any > attachments and notify us immediately. > > --00000000000034499905a018f87a--