Newsgroups: php.internals
Date: Sat, 25 Oct 2025 10:23:06 +0200
From: rob@bottled.codes ("Rob Landers")
To: internals@lists.php.net
Subject: Re: [PHP-DEV] RFC proposal for adding SORT_STRICT flag to array_unique()
Message-ID: <48b2d4f5-2baa-4dac-9f13-163ac68a4adf@app.fastmail.com>

On Fri, Oct 24, 2025, at 21:34, Jason Marble wrote:
> Hello everybody!
>
> I'd like to open a discussion regarding the behavior of `array_unique()` with the `SORT_REGULAR` flag when used on arrays containing mixed types.
>
> Currently, `SORT_REGULAR` uses non-strict comparisons, which can lead to unintentional data loss when values like `100` and `"100"` are treated as duplicates. This forces developers to implement user-land workarounds.
>
> Here is a common scenario where this behavior is problematic:
>
> ```php
> $events = [
>     ['id' => 100, 'type' => 'user.login'],          // User event (int)
>     ['id' => "100", 'type' => 'system.migration'],  // System event (string)
>     ['id' => 100, 'type' => 'user.login'],          // Duplicate user event
> ];
>
> $event_ids = array_column($events, 'id'); // [100, "100", 100]
>
> // Current behavior with SORT_REGULAR
> $unique_ids = array_unique($event_ids, SORT_REGULAR); // Result: [100]
> // The string "100" is lost due to type coercion.
> ```
>
> To address this, I propose adding a new flag, `SORT_STRICT`, which would use strict (`===`) comparisons to differentiate between values of different types.
>
> With the new flag, the result would be:
>
> ```php
> // Proposed behavior with SORT_STRICT
> $unique_ids = array_unique($event_ids, SORT_STRICT); // Result: [100, "100"]
> // Both integer and string values are preserved.
> ```
>
> I've already submitted a PR to correct the bug I just highlighted:
> PR: https://github.com/php/php-src/pull/20273
> The potential for a `SORT_NATURAL` flag also came to mind as another useful addition, but I believe `SORT_STRICT` is the more critical feature to discuss first.
>
> I look forward to your feedback.
>
> Thanks,
> - Jason

Hi Jason,

Other than the bytes in memory and how they’re laid out, I fail to see how 100 is different from "100". They’re conceptually identical, and array_* functions generally behave by value, not by identity. I think it’s probably wise to take a step back here and evaluate the knock-on effects of something like this:

SORT_REGULAR has some warts; it isn’t perfect. Having a SORT_STRICT sounds kinda nice until you start thinking about it a bit. This parameter has traditionally been used to indicate a "comparison mode" that describes how to compare values. Strict identity is on a completely different axis (there is no strict less-than or greater-than; objects aren’t *strictly* comparable, but they’re loosely comparable; 1.0 is loosely equal to 1 and "1" but strictly identical to neither). Further, it raises the question: "can I get a SORT_STRICT_NUMERIC?" or "can I get a SORT_STRICT_STRING?", which further indicates that this is a completely different axis altogether, not "just" a different comparison mode.
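To make the "different axis" point concrete, here is a minimal sketch using only built-in PHP operators. The variable names are illustrative and do not come from the thread; the sketch assumes PHP 7 or later for the `<=>` operator.

```php
<?php
// Loose comparison defines an ordering, which is what a "comparison mode" needs.
$order = 100 <=> "100";                    // 0: the numeric string compares equal to the int
$looseFloat = (1.0 == 1) && (1.0 == "1");  // true: all three are loosely equal

// Strict identity only answers "same type and value, or not".
// There is no strict counterpart to < or >.
$strictIntString = (100 === "100");  // false: int vs string
$strictFloatInt  = (1.0 === 1);      // false: float vs int

// Two distinct objects with equal properties are loosely equal but never identical.
$a = new stdClass();
$a->id = 1;
$b = new stdClass();
$b->id = 1;
$looseObjects  = ($a == $b);   // true: same class, equal properties
$strictObjects = ($a === $b);  // false: different instances

var_dump($order, $looseFloat, $strictIntString, $strictFloatInt, $looseObjects, $strictObjects);
```

The loose operators can drive a sort because `<=>` returns an ordering; `===` only returns a boolean, so it could serve as an equality test but not as a sort comparison.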
As to your example, it conflates two namespaces of IDs (user IDs and system IDs) into a single untyped bag, then asks array_unique() to preserve that boundary. This is a domain distinction, not a language problem. Simply removing the array_column() step from your example, so that array_unique() compares the full event rows, arrives at your desired result.

- Rob
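The suggestion to drop the array_column() step can be sketched as follows. This reuses the `$events` array from the quoted example; `$unique_events` is a hypothetical name introduced here, and the sketch assumes the existing `SORT_REGULAR` behavior for comparing nested arrays.

```php
<?php
$events = [
    ['id' => 100,   'type' => 'user.login'],        // User event (int)
    ['id' => "100", 'type' => 'system.migration'],  // System event (string)
    ['id' => 100,   'type' => 'user.login'],        // Duplicate user event
];

// Deduplicate whole rows instead of bare ids. The 'type' field differs
// between the user row and the system row, so loose comparison keeps both
// and drops only the repeated user row.
$unique_events = array_unique($events, SORT_REGULAR);

// The ids can still be extracted afterwards, with their original types intact.
$unique_ids = array_column($unique_events, 'id');

var_dump($unique_events, $unique_ids);
```

Deduplicating before projecting keeps the domain information (`type`) in play during comparison, which is why no strict flag is needed for this particular data set.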