Newsgroups: php.internals
Xref: news.php.net php.internals:119447
Date: Wed, 1 Feb 2023 20:53:49 +0200
To: Sergii Shymko
Cc: Marco Pivetta, PHP internals
Subject: Re: [PHP-DEV] RFC proposal: function array_filter_list() to avoid subtle bugs/workarounds
From: drealecs@gmail.com (Alexandru Pătrănescu)

On Wed, Feb 1, 2023, 19:59 Sergii Shymko wrote:

> Hi Marco,
>
> ________________________________
> From: Marco Pivetta
> Sent: Wednesday, February 1, 2023 9:25 AM
> To: Sergii Shymko
> Cc: internals@lists.php.net
> Subject: Re: [PHP-DEV] RFC proposal: function array_filter_list() to avoid
> subtle bugs/workarounds
>
> Hey Sergii,
>
>
> On Wed, 1 Feb 2023 at 18:22, Sergii Shymko <sergey@shymko.net> wrote:
> Hi,
>
> After programming in PHP for
> two decades, my goal for 2023 is to try to
> contribute to the language.
> The plan is to start small and, if successful, work my way up to more
> complex proposals.
> This topic has been chosen for starters, because IMO it strikes a good
> balance between simplicity and usefulness.
> I should be able to implement the RFC myself, unless some deep OPcache/JIT
> nuances pop up.
>
> Let me give you a brief overview of the problem and the proposed solution.
> Function array_is_list() added in PHP 8.1 introduces the concept of a
> "list" – an array with indexes 0..count-1.
> The function is awesome and array "lists" are completely compatible with
> all array_* functions!
> However, function array_filter() exhibits a nuanced behavior when
> filtering lists.
> For instance, it preserves array keys, which may (or may not) create gaps
> in sequential indexes.
> These gaps mean that a filtered list is not a list anymore as validated by
> array_is_list().
>
> For example:
> $originalList = ['first', '', 'last'];
> $filteredList = array_filter($originalList);
> var_export($filteredList); // array(0 => 'first', 2 => 'last')
> var_export(array_is_list($originalList)); // true
> var_export(array_is_list($filteredList)); // false
>
> The behavior is counterintuitive and can lead to subtle bugs, such as
> encoding issues:
> echo json_encode($originalList); // ["first","","last"]
> echo json_encode($filteredList); // {"0":"first","2":"last"}
>
> The workaround is to post-process the filtered array with array_values()
> to reset the indexes.
> The proposal is to introduce a function array_filter_list() that would
> work solely on lists.
> It will have the same signature as array_filter() and will always return a
> valid list.
>
> See a draft RFC with more details here:
>
> https://dev.to/sshymko/php-rfc-arrayfilterlist-function-35mb-temp-slug-7074000?preview=21d6760126a02464b0511498bbb95749150afb17a7ff6377c458ee54a8f57cfe00d4e258aa06bad3232c0dd9e73a2d62138fc990048987e9e2339a3d
>
> I just registered a wiki account "sshymko" with the intention of
> submitting the RFC.
> Could someone please approve the account and give it some karma?
>
> Looking forward to collaborating with the internals team! 🙂
>
>
> I don't want to shoot this down too early, but:
>
> 1. why in the language, when a simple userland function suffices?
> 2. what's wrong with writing `array_values(array_filter(...))`?
>
> Marco Pivetta
>
> https://twitter.com/Ocramius
>
> https://ocramius.github.io/
>
>
> I do understand the aversion to adding more functions to PHP core 🙂
>

I think the reason is rather simple, judging from the discussions I've seen
here in the past years: C code is harder to write, maintain and understand,
while PHP code is simpler and is not bound to PHP versioning, so it can
evolve faster. That holds as long as the performance is similar.
When a native function performs much better, adding it makes more sense, as
was the case with array_is_list(), where the information almost always
exists internally as a flag.

I would suggest you check
https://github.com/azjezz/psl/blob/next/src/Psl/Vec/filter.php#L34 and
maybe use the library for other features as well.

Or you can actually do the implementation and also show some performance
numbers to give the RFC a better chance of passing.
From what I've observed, that helps.

Regards,
Alex

> Let me try to articulate the answers to your questions:
>
> 1. A userland function is enough; it's almost always possible to create
> a userland function.
> However, pretty much every single PHP project would need it and use it
> pretty often.
> It would be great to standardize the function name, especially now that it
> complements array_is_list().
> The same userland argument can be made for array_is_list(), yet people
> appreciate the function.
> The reasoning for adding array_filter_list() to the core is similar to
> array_is_list().
> 2. The array_values(array_filter()) workaround is usable and is indeed
> used on the projects I'm involved with.
> It's a relatively short one-liner, so there's no need to go to the extent
> of wrapping it in a userland function.
> Some of the disadvantages include having to always mentally translate it
> in your head to "filter list".
> The worst thing is that 9/10 times developers (me included) would overlook
> this nuanced behavior.
> Logically, people assume that a subset of a list is a list, which
> unfortunately isn't always the case.
> They'd forget to add array_values() and cause bugs that are not
> immediately detectable.
> At least 3-4 projects in my recent memory experienced this issue; it
> got overlooked at code review.
> Even automated tests did not catch it, as the issue remains hidden when
> filtering out the tail end of a list.
> After brainstorming with developers, we concluded that formalizing the
> concept of a "list" further would help.
> Having functions array_is_list() and array_filter_list() would make you
> think in terms of lists more.
> One would use array_filter_list() instead of array_filter() by default
> unless they work with assoc arrays.
>
> Maybe there could be an even more advanced approach: formalizing the list
> data type.
> But it would be overkill to propose a major change like this at this
> time.
>
> Regards,
> Sergii
>
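
For readers following the thread, the behavior under discussion can be reproduced with a short sketch. The helper name filter_list() is hypothetical and chosen only for illustration. It is not the RFC's implementation and not the PSL function linked above; it simply wraps the array_values(array_filter(...)) workaround that both Marco and Sergii mention. It assumes PHP 8.1 or later, because it relies on array_is_list().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland helper (the name is an assumption for this sketch).
 * Takes the same arguments as array_filter() and reindexes the result,
 * so the return value always passes array_is_list().
 */
function filter_list(array $list, ?callable $callback = null, int $mode = 0): array
{
    // array_filter() preserves keys; array_values() renumbers them from 0.
    return array_values(array_filter($list, $callback, $mode));
}

// Removing only the tail leaves keys 0..n-1 intact, so the problem stays hidden.
$tailOnly = array_filter(['first', 'last', '']);
var_dump(array_is_list($tailOnly)); // bool(true)

// Removing a middle element leaves a gap at key 1.
$withGap = array_filter(['first', '', 'last']);
var_dump(array_is_list($withGap)); // bool(false)
echo json_encode($withGap), "\n";   // {"0":"first","2":"last"}

// The wrapper closes the gap.
$fixed = filter_list(['first', '', 'last']);
var_dump(array_is_list($fixed));    // bool(true)
echo json_encode($fixed), "\n";     // ["first","last"]
```

The sketch does not check that its input is a list, whereas the proposal describes a function that works solely on lists. How a native version would treat non-list input is left open here.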