Newsgroups: php.internals
Date: Thu, 3 Nov 2022 17:04:33 +0100
To: PHP internals
Subject: ARRAY_UNIQUE_IDENTICAL option
From: tovilo.ilija@gmail.com (Ilija Tovilo)

Hi internals

There's an open bug report that array_unique doesn't work for enums:
https://github.com/php/php-src/issues/9775

This comes down to the fact that array_unique internally sorts the array (in every mode except SORT_STRING) before iterating over it to remove duplicates, and that enums are intentionally incomparable:

    Foo::Bar < Foo::Baz // false
    Foo::Baz < Foo::Bar // false

Unfortunately, this means that array_unique may happen to work if the array is already sorted, or is sorted correctly by chance, and breaks otherwise.

To solve this, I propose adding an ARRAY_UNIQUE_IDENTICAL option that can be passed to array_unique's $flags and that uses identical operator (===) semantics. Internally, it uses a new hashmap that accepts arbitrary PHP values as keys, so duplicates can be removed efficiently.
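For illustration only, here is a rough userland approximation of what identical (===) deduplication means, assuming PHP 8.1+. It is not the code from the pull request, and array_unique_identical_sketch() and strict_key() are hypothetical names. Instead of a real hashmap keyed by arbitrary values, it builds type-tagged string keys (objects, including enum cases, by spl_object_id()) and falls back to a linear in_array(..., true) scan for floats and arrays.

```php
<?php
// Hypothetical sketch (PHP 8.1+), not the implementation from the pull request.
// Two values get the same key only if they are identical (===).
function strict_key(mixed $value): ?string
{
    return match (true) {
        is_int($value)    => 'i:' . $value,
        is_string($value) => 's:' . $value,
        is_bool($value)   => $value ? 'b:1' : 'b:0',
        $value === null   => 'n',
        // Enum cases are singleton objects, so object identity is enough.
        // spl_object_id() is unique among live objects, and every element of
        // the input array stays alive for the duration of the call.
        is_object($value) => 'o:' . spl_object_id($value),
        // Floats (0.0 === -0.0, NAN !== NAN) and arrays get no key.
        default           => null,
    };
}

// Keeps the first occurrence of each value and preserves keys, like array_unique().
function array_unique_identical_sketch(array $input): array
{
    $seen = [];     // strict_key() => true, expected O(1) lookups
    $unkeyed = [];  // floats/arrays, compared with a linear strict scan
    $result = [];
    foreach ($input as $key => $value) {
        $k = strict_key($value);
        if ($k !== null) {
            if (isset($seen[$k])) {
                continue; // identical value already kept
            }
            $seen[$k] = true;
        } elseif (in_array($value, $unkeyed, true)) {
            continue; // identical float/array already kept
        } else {
            $unkeyed[] = $value;
        }
        $result[$key] = $value;
    }
    return $result;
}

enum Foo
{
    case Bar;
    case Baz;
}

$unique = array_unique_identical_sketch([Foo::Baz, Foo::Bar, Foo::Baz, Foo::Bar]);
var_dump($unique === [0 => Foo::Baz, 1 => Foo::Bar]); // bool(true)
```

Because no comparison order is involved, the result does not depend on how the input happens to be ordered. Unlike array_unique()'s default SORT_STRING mode, which casts values to strings, this also keeps 1 and "1" as separate entries.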
The new hashmap is slightly over-engineered for this use case. However, this data structure will be required for implementing ADTs, to deduplicate instances with the same values. It is a heavily minimized version of the teds extension's StrictHashMap [1].

The time complexity of this function is O(n). With the exception of SORT_STRING (which uses PHP's existing hashmap in a very similar fashion and is also O(n)), it should scale better than the other sort options, which are O(n log n).

Here's a link to the implementation:
https://github.com/php/php-src/pull/9882/files

If there are no concerns or complaints, I'd like to merge this into PHP 8.3. Otherwise, I will create an RFC.

Looking forward to your feedback.

Ilija

[1] https://github.com/TysonAndre/pecl-teds/blob/main/teds_stricthashmap.c