Newsgroups: php.internals
Date: Sat, 25 Oct 2025 10:40:49 +0200
From: rob@bottled.codes ("Rob Landers")
To: internals@lists.php.net
Subject: Re: [PHP-DEV] RFC proposal for adding SORT_STRICT flag to array_unique()
In-Reply-To: <48b2d4f5-2baa-4dac-9f13-163ac68a4adf@app.fastmail.com>
References: <48b2d4f5-2baa-4dac-9f13-163ac68a4adf@app.fastmail.com>


On Sat, Oct 25, 2025, at 10:23, Rob Landers wrote:
On Fri, Oct 24, 2025, at 21:34, Jason Marble wrote:
Hello everybody!

I'd like to open a discussion regarding the behavior of `array_unique()` with the `SORT_REGULAR` flag when used on arrays containing mixed types.

Currently, `SORT_REGULAR` uses non-strict comparisons, which can lead to unintentional data loss when values like `100` and `"100"` are treated as duplicates. This forces developers to implement user-land workarounds.

Here is a common scenario where this behavior is problematic:

```php
$events = [
    ['id' => 100, 'type' => 'user.login'],          // User event (int)
    ['id' => "100", 'type' => 'system.migration'],  // System event (string)
    ['id' => 100, 'type' => 'user.login'],          // Duplicate user event
];

$event_ids = array_column($events, 'id'); // [100, "100", 100]

// Current behavior with SORT_REGULAR
$unique_ids = array_unique($event_ids, SORT_REGULAR); // Result: [100]
// The string "100" is lost due to type coercion.
```

To address this, I propose adding a new flag, `SORT_STRICT`, which would use strict (`===`) comparisons to differentiate between values of different types.

With the new flag, the result would be:

```php
// Proposed behavior with SORT_STRICT
$unique_ids = array_unique($event_ids, SORT_STRICT); // Result: [100, "100"]
// Both integer and string values are preserved.
```
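For comparison, the proposed result can already be approximated in user land, which is the kind of workaround the proposal refers to. A minimal sketch, assuming a helper named `array_unique_strict()` (hypothetical; it is not part of PHP or of the linked PR) that keeps the first occurrence of each value under `===`:

```php
<?php
/**
 * Hypothetical helper (not part of PHP): de-duplicates with === and keeps
 * the first occurrence and the original key of each value.
 */
function array_unique_strict(array $values): array
{
    $unique = [];
    foreach ($values as $key => $value) {
        // in_array() with its third argument set to true compares with ===
        if (!in_array($value, $unique, true)) {
            $unique[$key] = $value;
        }
    }
    return $unique;
}

$event_ids = [100, "100", 100];

var_dump(array_unique($event_ids, SORT_REGULAR)); // loose: only key 0 survives
var_dump(array_unique_strict($event_ids));        // strict: keys 0 and 1 survive
```

Each value is checked against every value kept so far, so this is quadratic in the worst case; it is meant to illustrate the semantics rather than to serve as a drop-in replacement.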

I've already submitted a PR to correct the bug I just highlighted:
PR: https://github.com/php/php-src/pull/20273
The potential for a `SORT_NATURAL` flag also came to mind as another useful addition, but I believe `SORT_STRICT` is the more critical feature to discuss first.

I look forward to your feedback.

Thanks,
- Jason

Hi Jason,

Other than the bytes in memory and how they’re laid out, I fail to see how 100 is different from 100. They’re conceptually identical, and array_* functions generally behave by value, not by identity. I think it’s probably wise to take a step back here and evaluate the knock-on effects of something like this:

SORT_REGULAR has some warts, it isn’t perfect. Having a SORT_STRICT sounds kinda nice until you start thinking about it a bit. This parameter has traditionally been used to indicate a "comparison mode" that describes how to compare values. Strict identity is on a completely different axis (they can’t be less/greater than; objects aren’t *strictly* comparable, but they’re loosely comparable, 1.0 is strictly comparable to 1 or "1"). Further, it begs the question: "can I get a SORT_STRICT_NUMERIC" or "can I get a SORT_STRICT_STRING", which further indicates this is a completely different axis altogether than "just" a different comparison mode.

As to your example, it conflates two namespaces of IDs (user IDs and system IDs) into a single untyped bag, then asks array_unique() to preserve that boundary. This is a domain distinction, not a language problem. Simply removing your array_column() step in your example arrives at your desired solution.
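The suggestion to drop the `array_column()` step can be tried against the data from the original example. A minimal sketch, using only the existing `array_unique()` and `SORT_REGULAR`; the variable name `$unique_events` is introduced here for illustration:

```php
<?php
$events = [
    ['id' => 100,   'type' => 'user.login'],
    ['id' => "100", 'type' => 'system.migration'],
    ['id' => 100,   'type' => 'user.login'],
];

// Compare whole rows instead of bare ids: rows 0 and 2 are loosely equal,
// while row 1 differs from them in its 'type' value.
$unique_events = array_unique($events, SORT_REGULAR);

var_dump(array_keys($unique_events)); // keys of the rows that survive: 0 and 1
```

Whether this fits depends on the data: two rows that differ only in the type of `id` (for example `100` and `"100"` with the same `type`) would still be merged under loose comparison.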

-- Rob

I mis-typed this:

> they can’t be less/greater than; objects aren’t *strictly* comparable, but they’re loosely comparable, 1.0 is strictly comparable to 1 or "1"

It should have read:

> they can’t be less/greater than; objects aren’t *strictly* comparable, but they’re loosely comparable, 1.0 is *not* strictly comparable to 1 or "1"
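The distinction drawn in the corrected sentence can be checked directly. The snippet below only illustrates PHP's existing `==`, `===`, and `<=>` operators, not any proposed flag:

```php
<?php
// Loose equality treats these values as the same...
var_dump(1.0 == 1);    // bool(true)
var_dump(1.0 == "1");  // bool(true)

// ...while strict identity also compares types.
var_dump(1.0 === 1);   // bool(false)
var_dump(1.0 === "1"); // bool(false)

// Ordering is available through <=>, which compares loosely.
var_dump(1 <=> "1");   // int(0)
var_dump(1 <=> "2");   // int(-1)

// Two distinct objects with the same properties: equal, but not identical.
$a = new stdClass();
$b = new stdClass();
var_dump($a == $b);    // bool(true)
var_dump($a === $b);   // bool(false)
```

Because `===` only answers "identical or not", it does not by itself supply the less-than/greater-than ordering that the other comparison modes imply.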
PS. Speaking of "bytes in memory", it might be better to propose a SORT_BINARY. It has the same effect you’re looking for, but arrays of bytes have a lexicographical ordering.
-- Rob
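`SORT_BINARY` does not exist in PHP. As a rough illustration of the idea in the PS, the sketch below uses `serialize()` as a stand-in for a value's byte representation; that choice, and the helper name `array_unique_by_bytes()`, are assumptions made here and are not specified anywhere in the thread:

```php
<?php
/**
 * Hypothetical helper: de-duplicates by the serialized form of each value.
 * serialize() stands in for "bytes in memory"; this is an assumption for
 * illustration, not how a real SORT_BINARY flag would necessarily work.
 */
function array_unique_by_bytes(array $values): array
{
    $seen = [];
    $unique = [];
    foreach ($values as $key => $value) {
        $bytes = serialize($value); // 100 => 'i:100;', "100" => 's:3:"100";'
        if (!isset($seen[$bytes])) {
            $seen[$bytes] = true;
            $unique[$key] = $value;
        }
    }
    return $unique;
}

$ids = [100, "100", 100];
var_dump(array_unique_by_bytes($ids)); // keys 0 and 1 survive

// Byte strings also have a total (lexicographic) order via strcmp().
var_dump(strcmp(serialize(100), serialize("100")) < 0); // bool(true)
```

Unlike strict identity, byte strings can be both tested for equality and ordered, which is the property the PS points at. Values that cannot be serialized (closures, for instance) would need separate handling.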