Date: Sun, 12 Nov 2023 21:20:20 +0100
To: David Gebler
Cc: PHP internals
Subject: Re: [PHP-DEV] Array functions with strict comparison
From: andreas@dqxtech.net (Andreas Hennings)

On Sat, 11 Nov 2023 at 20:43, Andreas Hennings wrote:
>
> Hello David,
>
> On Sat, 11 Nov 2023 at 20:04, David Gebler wrote:
> >
> > On Sat, Nov 11, 2023 at 6:05 PM Andreas Hennings wrote:
> >
> > > Hello internals,
> > > I noticed that array functions like array_diff(), array_intersect()
> > > etc use weak comparison.
> > >
> > >
> > That's not quite correct. Using the example of array_diff, the comparison
> > is a strict equality check on a string cast of the values.
> > So array_diff([""], [false]) will indeed be empty
> > but array_diff(["0"],[false]) will return ["0"].
>
> Thanks, good to know!
> So in other words, it is still some kind of weak comparison, but with
> different casting rules than '=='.
> Still this is not desirable in many cases.
>
> >
> > Tbh any use case for whatever array function but with strict comparison is
> > such an easy thing to implement in userland[1] I'm not bothered about
> > supporting it in core. But that's just me. I don't generally like the idea
> > of adding new array_* or str_* functions to the global namespace without
> > very good cause. There is a precedent for it though, in terms of changes
> > which have gone through in PHP 8, such as array_is_list or str_starts_with.
>
>
> I would argue that the strict variants of these functions would be
> about as useful as the non-strict ones.
> Or in my opinion, they would become preferable over the old functions
> for most use cases.
>
> In other words, we could say the old/existing functions should not
> have been added to the language.
> (of course this does not mean we can or should remove them now)
>
> Regarding performance, I measure something like a factor of 2 for a diff of
> range(0, 500) minus [5], comparing array_diff() vs array_diff_strict()
> as proposed here.
> So for large arrays or repeated calls it does make a difference.

Some more results on this.
With the right array having only one element, I can actually optimize
the userland function to be almost as fast as the native function.
However, if I pump up the right array, the difference becomes quite bad.

function array_diff_userland(array $array1, array $array2 = [], array ...$arrays): array {
    if ($arrays) {
        // Process additional arrays only when they exist.
        $arrays = array_map('array_values', $arrays);
        $array2 = array_merge($array2, ...$arrays);
    }
    // This is actually slower, it seems.
    #return array_filter($array1, fn ($value) => !in_array($value, $array2, TRUE));
    $diff = [];
    foreach ($array1 as $k => $value) {
        // Use non-strict in_array(), to get a fair comparison with the native function.
        if (!in_array($value, $array2)) {
            $diff[$k] = $value;
        }
    }
    return $diff;
}

$arr = range(0, 500);
$arr2 = range(0, 1500, 2);

$dts = [];

$t = microtime(TRUE);
$diff_userland = array_diff_userland($arr, $arr2);
$t += $dts['userland'] = (microtime(TRUE) - $t);
$diff_native = array_diff($arr, $arr2);
$t += $dts['native'] = (microtime(TRUE) - $t);
assert($diff_userland === $diff_native);

// Run both again to detect differences due to warm-up.
$t = microtime(TRUE);
$diff_userland = array_diff_userland($arr, $arr2);
$t += $dts['userland.1'] = (microtime(TRUE) - $t);
$diff_native = array_diff($arr, $arr2);
$t += $dts['native.1'] = (microtime(TRUE) - $t);
assert($diff_userland === $diff_native);

// Now use a right array that has no overlap with the left array.
// Build it before starting the timer, so that range() is not measured.
$arr2 = range(501, 1500, 2);
$t = microtime(TRUE);
$diff_userland = array_diff_userland($arr, $arr2);
$t += $dts['userland.2'] = (microtime(TRUE) - $t);
$diff_native = array_diff($arr, $arr2);
$t += $dts['native.2'] = (microtime(TRUE) - $t);
assert($diff_userland === $diff_native);

// microtime(TRUE) returns seconds, so seconds * 1000 * 1000 is microseconds.
var_export(array_map(fn ($dt) => $dt * 1000 * 1000 . ' us', $dts));

I see differences of a factor of 5 to 10.
So to me, this alone is an argument to implement this natively.
The other argument is that it is kind of sad how the current functions
don't behave as one would expect.

>
> Regarding the cost of more native functions:
> Is the concern more about polluting the global namespace, or about
> adding more functions that need to be maintained?
> I can see both arguments, but I don't have a clear opinion how these
> costs should be weighed.

The most straightforward option seems to be to just name the new functions
like array_diff_strict() etc.
But I am happy for other proposals.

>
> Cheers
> Andreas
>
> >
> > [1] Example:
> >
> > function array_diff_strict(array $array1, array ...$arrays): array
> > {
> >     $diff = [];
> >     foreach ($array1 as $value) {
> >         $found = false;
> >         foreach ($arrays as $array) {
> >             if (in_array($value, $array, true)) {
> >                 $found = true;
> >                 break;
> >             }
> >         }
> >         if (!$found) {
> >             $diff[] = $value;
> >         }
> >     }
> >     return $diff;
> > }
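
To make the comparison rules discussed in this thread concrete, here is a minimal sketch, assuming PHP 7 or later. The helper name array_diff_strict_keyed() is hypothetical: it is not part of PHP and not the naming proposed above. Unlike the footnote example, it keeps the keys of the first array, which is what the native array_diff() does, so the two can be compared directly.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland helper (illustration only, not a PHP built-in):
 * like array_diff(), but values are compared with === and the keys of
 * $array are preserved.
 */
function array_diff_strict_keyed(array $array, array ...$excludes): array
{
    $diff = [];
    foreach ($array as $key => $value) {
        foreach ($excludes as $exclude) {
            if (in_array($value, $exclude, true)) {
                // An identical value exists in one of the other arrays.
                continue 2;
            }
        }
        $diff[$key] = $value;
    }
    return $diff;
}

// Native array_diff() compares the string casts of the values.
var_dump(array_diff([''], [false]));  // empty: (string) false === ''
var_dump(array_diff(['0'], [false])); // keeps '0': '0' !== ''
var_dump(array_diff([1], ['1']));     // empty: (string) 1 === '1'

// The strict variant only removes values that have an identical match.
var_dump(array_diff_strict_keyed([''], [false])); // keeps ''
var_dump(array_diff_strict_keyed(['a' => 1, 'b' => '1', 'c' => 2], ['1', 2])); // keeps 'a' => 1
```

Note that this sketch still scans every other array with in_array() for each value, so it has the same scaling behaviour as the userland benchmark function above. It illustrates the semantics only and says nothing about how a native implementation would perform.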