Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:128958 X-Original-To: internals@lists.php.net Delivered-To: internals@lists.php.net Received: from php-smtp4.php.net (php-smtp4.php.net [45.112.84.5]) by lists.php.net (Postfix) with ESMTPS id A9EA01A00BC for ; Sat, 25 Oct 2025 00:52:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=php.net; s=mail; t=1761353556; bh=vk/c3Fy/8/s+bNLw2ZIfG5DP3HjVPgVmYnEVpUYE7ro=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=GA+2UxuyB7C9lip0Oc6HWPhcD4Arb6zDx029P7vFXTAwMc0YAPSkq92uF/8drsitJ Ut9MFxb2U7xU90EJ/l8RLYb3fhWYWkUN40Lu6ewDzG17WR0c/TpfVSKQn+SvQ42pS5 ouFV7WU6AY+Mjs1sg+2YRb7HNevTe6dPtIhlPUDr34J1nBDN4xNZXC8lvBVZufv7qZ ieJXrzQsyXzN3p2DVCxzfvk0sM6pBWsQG4f88eisMk4P4422w9x8lj/2ethCi4Ti+r YZPZ1L9dKdLV7CQOME7++OdtE5jLlR5GmgGCeJa5GyzoL2CzPbJgQNwnU5lYQ3YGOo NYeoDZYGqfXkw== Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 7135118006E for ; Sat, 25 Oct 2025 00:52:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 4.0.1 (2024-03-25) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-0.2 required=5.0 tests=BAYES_40,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,DMARC_PASS,HTML_MESSAGE, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=4.0.1 X-Spam-Virus: No X-Envelope-From: Received: from mail-yw1-f174.google.com (mail-yw1-f174.google.com [209.85.128.174]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Sat, 25 Oct 2025 00:52:35 +0000 (UTC) Received: by mail-yw1-f174.google.com with SMTP id 00721157ae682-71d6014810fso26289057b3.0 for ; Fri, 24 Oct 2025 17:52:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intuitivetechnology.com; s=google; t=1761353549; x=1761958349; darn=lists.php.net; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=3oLt49VnkEywiTE4qz+THE9BshuaFOjV+xpFJ5eYQwc=; b=dJ8zvuclVpLWqzrfLPsRpDUs//+oQlDxuZxFapVTl4YYXoABZFnhXPv/Y9jQZRXUoS 9DJ9wBOQovVRZgxaA3d8sjqkXGDzx9wf10kgBameWTJd53q5m7cKv8w9damIECa1U3xB /yLIR+WAcYVMWoEx3GkyaEeVOQ85G9tmRjuJRtEEI4LKVlvKhcJCz3gm0+uZDiieidou MCSCPHkcc/QMBOS5oMSMU3dzIcQ/vvF/Ovw4Li7SbyuRYYaDc/CikY4J+xk2+aVDnawK VjB8H3iYquFlyr3TSesG4m3VrHXDL3l3OuTERj+Sx+LrzPo2PKydfvJWEp3M9BV9Xd54 QuWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761353549; x=1761958349; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=3oLt49VnkEywiTE4qz+THE9BshuaFOjV+xpFJ5eYQwc=; b=XTy1kKcwCvFNudBHMeWC6x6mEaeX36CP0vR/La1jiNYkJAzKGGdZtThj3lEaD0eih1 VZmVq39ylruj39YxBbfT/E4/l+GyF1PsGPMK/cO+1io7StD1ob1YGWODfkVzxUNAWj2h VC1thJzFRyXD9gHosE88EEAURlPT74n8EmhoX7jG7PrkuYwrE8MVpfelGAD/L1vNj/rB lTJXTJapDlq/hO1ZaTNKSk+uUkVqFXdZLKrQxq+21FNmjHzs0fLqxFFjUWG68UsPokdd TLCPJHNih33HL0aeXw274HsH4RvHmOZx67yrScFc1lKZgPsqYAWGFhvn2ByK80nbgY39 0AKg== X-Gm-Message-State: AOJu0YxMpkMqxExf23x0Y8/ZXyFZlGP257aj3ThaVOhfIEEiKsO9AkFV 7f0V5T4yJECWouh2qskiE3nBZQKaZf7IzAursAi2lZr0jsFOGRriRvQkEmW2A40Z1aFMykeahxv zTosPH5fxnSys/dL1qgfk86LHcpLA/Xsp/s1u+Vk4tmjjbCiazRpqV0E= X-Gm-Gg: ASbGncux//WYuZx/+b/DyZ/+NpgULZgUZcxzBFMK59gmtn+R2TPoJ8gLunyD1Zgu01c N7hxnoPp1iNvFj1sP/thfQtuyjpZCpLbfpcVrYpvth6yWlDgzw3yw5EzvbfunNV1Coz+bORba5D ez9AteVm0QNs7iBvszRCE0PxaIjDLhQm+4xe+RSBOpyWKH0RFVXZdE7fOY3kdmWet/2lmm0349+ 0/+ew4LAnVYrLw2D1Fy6n/LjJLSdMyuExmsMy1sKhURMd2mOUxBXrOXyU+yUw== X-Google-Smtp-Source: AGHT+IEmgHV47vaAmuJi+kGjICswtpCTalmFeUJCHHCrwiqaOOgXrT3Myi841wFiJuuDoxeBtSxvkIzlkgXBHTK11cw= X-Received: by 2002:a53:a008:0:b0:63c:f5a6:f2e9 with SMTP id 956f58d0204a3-63f378e1272mr4136373d50.59.1761353549387; Fri, 24 Oct 2025 17:52:29 -0700 (PDT) Precedence: list list-help: list-unsubscribe: list-post: List-Id: x-ms-reactions: disallow MIME-Version: 1.0 References: <4b605609-47c2-4df5-be71-8b962ad9ff1e@varteg.nz> In-Reply-To: Date: Fri, 24 Oct 2025 17:52:18 -0700 X-Gm-Features: AS18NWC06nXglsniW6suehKZDop_YxaFbDhZRB0ZE_W2DAvjTS_00YRCAPxWBr4 Message-ID: Subject: Re: [PHP-DEV] RFC proposal for adding SORT_STRICT flag to array_unique() To: Weedpacket@varteg.nz Cc: internals@lists.php.net Content-Type: multipart/alternative; boundary="000000000000a2a7d60641f114fa" From: jmarble@intuitivetechnology.com (Jason Marble) --000000000000a2a7d60641f114fa Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Quick POC: https://github.com/jmarble/php-src/tree/feature/array-unique-sort-strict ~1.4x faster than this simple userland implementation on my local machine. I purposefully avoided implementing a hash-bucket because I had already tried that and encountered too many edge cases LOL: https://gist.github.com/jmarble/1e08eb15274cd434e867baf96ffa301d On Fri, Oct 24, 2025 at 4:51=E2=80=AFPM Jason Marble < jmarble@intuitivetechnology.com> wrote: > Correct! Basically: > - SORT_STRINGS: reliable and predictable when you understand the value > will be converted to a string > - SORT_NUMERIC: same but _risky_, you should be certain you're working > with numbers > - SORT_REGULAR: the sort is unstable and will inevitably cause a bug that > no one will understand LOL > > With the proposed SORT_STRICT, we will get super fast, reliable and > predictable deduplication. > > > On Fri, Oct 24, 2025 at 3:16=E2=80=AFPM Morgan wro= te: > >> On 2025-10-25 08:34, Jason Marble wrote: >> > Hello everybody! >> > >> > >> > The potential for a `SORT_NATURAL` flag also came to mind as another >> > useful addition, but I believe `SORT_STRICT` is the more critical >> > feature to discuss first. >> > >> >> I know I find array_unique generally useless due to its insistence on >> stringifying everything for comparison. >> >> ``` >> $uniques =3D []; >> foreach($source_array as $a) { >> if(!in_array($a, $uniques, true)) { >> $uniques[] =3D $a; >> } >> } >> ``` >> >> I seem to recall part of the issue is that array_unique works by sorting >> its elements so that "equal" values are adjacent. I know this would be >> done on O(n log(n)) vs. O(n^2) grounds, but that could be addressed at >> least in part by a smarter sort criterion that sorts by type/class (in >> some arbitrary order) before sorting by value. For uncomparable types >> (i.e., instances of most classes) this would be by object ID, because we >> don't _actually_ care about ordering. >> > --000000000000a2a7d60641f114fa Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Quick POC:
https://github.com/jmarble/php-src= /tree/feature/array-unique-sort-strict

~1.4x faster than=C2=A0th= is simple userland implementation on my local machine. I purposefully avoid= ed implementing a hash-bucket because I had already tried that and encounte= red too many edge cases LOL:
https://gist.github.com/jmarble/1e08eb152= 74cd434e867baf96ffa301d

On Fri, Oct 24, 2025 at 4:= 51=E2=80=AFPM Jason Marble <jmarble@intuitivetechnology.com> wrote:
Correct! Basica= lly:
- SORT_STRINGS: reliable and predictable when you understand the va= lue will be converted to a string
- SORT_NUMERIC: same but _risky_, you = should be certain you're working with numbers
- SORT_REGULAR: the so= rt is unstable and will inevitably cause a bug that no one will understand = LOL

With the proposed SORT_STRICT, we will get super fast, re= liable and predictable deduplication.


On Fri, Oct 24, 2025 at 3:16= =E2=80=AFPM Morgan <Weedpacket@varteg.nz> wrote:
On 2025-10-25 08:34, Jason Marble wrote:
> Hello everybody!
>
>
> The potential for a `SORT_NATURAL` flag also came to mind as another <= br> > useful addition, but I believe `SORT_STRICT` is the more critical
> feature to discuss first.
>

I know I find array_unique generally useless due to its insistence on
stringifying everything for comparison.

```
$uniques =3D [];
foreach($source_array as $a) {
=C2=A0 =C2=A0 =C2=A0if(!in_array($a, $uniques, true)) {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0$uniques[] =3D $a;
=C2=A0 =C2=A0 =C2=A0}
}
```

I seem to recall part of the issue is that array_unique works by sorting its elements so that "equal" values are adjacent. I know this wou= ld be
done on O(n log(n)) vs. O(n^2) grounds, but that could be addressed at
least in part by a smarter sort criterion that sorts by type/class (in
some arbitrary order) before sorting by value. For uncomparable types
(i.e., instances of most classes) this would be by object ID, because we don't _actually_ care about ordering.
--000000000000a2a7d60641f114fa--