Newsgroups: php.internals
Xref: news.php.net php.internals:120451
Message-ID: <289E585B-EF8B-4B17-89BE-BE8295FD9FE1@gmail.com>
Date: Tue, 30 May 2023 13:34:49 +0200
From: buritomath@gmail.com (Boro Sitnikovski)
To: internals@lists.php.net
Subject: [RFC] [Discussion] Add new function `array_group`

Hello all,

As per the How To Create an RFC <https://wiki.php.net/rfc/howto> instructions, I am sending this e-mail to get your feedback on my proposal.

I propose introducing a function to PHP core named `array_group`. This function takes an array and a callback and returns an array of arrays, where each inner array is a group of consecutive elements. This is very similar to Haskell's `groupBy` function <https://hackage.haskell.org/package/groupBy-0.1.0.0/docs/Data-List-GroupBy.html>.

For some background as to why: usually, when people want to do grouping in PHP, they use hash maps, so something like:

```
<?php
$array = [
    [ 'id' => 1, 'value' => 'foo' ],
    [ 'id' => 1, 'value' => 'bar' ],
    [ 'id' => 2, 'value' => 'baz' ],
];

$groups = [];
foreach ( $array as $element ) {
    $groups[ $element['id'] ][] = $element;
}

var_dump( $groups );
```

With `array_group`, the same grouping can be achieved as follows (keys are not preserved):

```
<?php
$array = [
    [ 'id' => 1, 'value' => 'foo' ],
    [ 'id' => 1, 'value' => 'bar' ],
    [ 'id' => 2, 'value' => 'baz' ],
];

$groups = array_group( $array, function( $a, $b ) {
    return $a['id'] == $b['id'];
} );
```

The disadvantage of the first approach is that it is limited to an equality check on the key; we cannot group by, say, `<` or other predicates.
On the other hand, its advantage is that the keys are preserved and the elements needn't be consecutive.

In any case, I think a utility function such as `array_group` will be widely useful.

Please find attached a patch with a proposed implementation. I am curious to hear your feedback.
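To make the intended semantics concrete, here is a minimal userland sketch (it is not the attached C implementation). It assumes the callback receives each element together with its immediate predecessor, and it uses a hypothetical name, `array_group_sketch`, so that it cannot clash with the proposed function:

```
<?php
// Userland sketch only; `array_group_sketch` is a hypothetical name and
// adjacent-pair comparison is an assumption about the intended behaviour.
function array_group_sketch( array $array, callable $callback ): array {
    $groups  = [];
    $chunk   = [];
    $prev    = null;
    $hasPrev = false;

    foreach ( $array as $curr ) {
        // Close the current group when the callback rejects (previous, current).
        if ( $hasPrev && ! $callback( $prev, $curr ) ) {
            $groups[] = $chunk;
            $chunk    = [];
        }

        $chunk[] = $curr; // keys are not preserved
        $prev    = $curr;
        $hasPrev = true;
    }

    // Flush the last group; an empty input yields an empty result.
    if ( $hasPrev ) {
        $groups[] = $chunk;
    }

    return $groups;
}

// Grouping by `<` splits the input into ascending runs.
$runs = array_group_sketch( [ 1, 2, 3, 2, 5, 1 ], fn ( $a, $b ) => $a < $b );
// $runs is [ [ 1, 2, 3 ], [ 2, 5 ], [ 1 ] ]
```

Unlike the hash-map loop, this only keeps adjacent elements together, so an `id` that reappears later in the array would start a new group.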
Best,

Boro Sitnikovski <https://people.php.net/bor0>

Content-Disposition: attachment; filename=array_group.patch

diff --git a/Zend/zend_hash.h b/Zend/zend_hash.h
index 5726c8a919..c462de2850 100644
--- a/Zend/zend_hash.h
+++ b/Zend/zend_hash.h
@@ -1087,6 +1087,10 @@ static zend_always_inline void *zend_hash_get_current_data_ptr_ex(HashTable *ht,
 	_ZEND_HASH_FOREACH_VAL(ht); \
 	_val = _z;
 
+#define ZEND_HASH_FOREACH_VAL_FROM(ht, _val, _from) \
+	ZEND_HASH_FOREACH_FROM(ht, 0, _from); \
+	_val = _z;
+
 #define ZEND_HASH_REVERSE_FOREACH_VAL(ht, _val) \
 	_ZEND_HASH_REVERSE_FOREACH_VAL(ht); \
 	_val = _z;
diff --git a/ext/standard/array.c b/ext/standard/array.c
index 46c2c882b8..171482113a 100644
--- a/ext/standard/array.c
+++ b/ext/standard/array.c
@@ -6394,6 +6394,75 @@ PHP_FUNCTION(array_map)
 }
 /* }}} */
 
+/* {{{ Groups consecutive elements from the array via the callback. */
+PHP_FUNCTION(array_group)
+{
+	zval *array;
+	zend_fcall_info fci;
+	zend_fcall_info_cache fci_cache = empty_fcall_info_cache;
+
+	zval args[2];
+	zval *prev_val;
+	zval *curr_val;
+	zval chunk;
+	zval retval;
+
+	ZEND_PARSE_PARAMETERS_START(2, 2)
+		Z_PARAM_ARRAY(array)
+		Z_PARAM_FUNC(fci, fci_cache)
+	ZEND_PARSE_PARAMETERS_END();
+
+	if (zend_hash_num_elements(Z_ARRVAL_P(array)) == 0) {
+		RETVAL_EMPTY_ARRAY();
+		return;
+	}
+
+	// The array is guaranteed to have at least one element.
+	prev_val = ZEND_HASH_ELEMENT(Z_ARRVAL_P(array), 0);
+
+	// Generate the initial group.
+	array_init(&chunk);
+	zend_hash_next_index_insert_new(Z_ARRVAL_P(&chunk), prev_val);
+
+	array_init(return_value);
+
+	fci.retval = &retval;
+	fci.param_count = 2;
+
+	ZEND_HASH_FOREACH_VAL_FROM(Z_ARRVAL_P(array), curr_val, 1) {
+		ZVAL_COPY(&args[0], prev_val);
+		ZVAL_COPY(&args[1], curr_val);
+		fci.params = args;
+
+		if (zend_call_function(&fci, &fci_cache) == SUCCESS && Z_TYPE(retval) != IS_UNDEF) {
+			int retval_true;
+
+			zval_ptr_dtor(&args[1]);
+			zval_ptr_dtor(&args[0]);
+
+			retval_true = zend_is_true(&retval);
+
+			zval_ptr_dtor(&retval);
+
+			// Perform grouping - add the current group and create a new one.
+			if (!retval_true) {
+				zend_hash_next_index_insert_new(Z_ARRVAL_P(return_value), &chunk);
+				array_init(&chunk);
+			}
+
+			zend_hash_next_index_insert_new(Z_ARRVAL_P(&chunk), curr_val);
+		} else {
+			zval_ptr_dtor(&args[1]);
+			zval_ptr_dtor(&args[0]);
+			RETURN_NULL();
+		}
+	} ZEND_HASH_FOREACH_END();
+
+	// Add the last group.
+	zend_hash_next_index_insert_new(Z_ARRVAL_P(return_value), &chunk);
+}
+/* }}} */
+
 /* {{{ Checks if the given key or index exists in the array */
 PHP_FUNCTION(array_key_exists)
 {
diff --git a/ext/standard/basic_functions.stub.php b/ext/standard/basic_functions.stub.php
index effb05ff9f..a6a8408dc9 100755
--- a/ext/standard/basic_functions.stub.php
+++ b/ext/standard/basic_functions.stub.php
@@ -1870,6 +1870,8 @@ function array_filter(array $array, ?callable $callback = null, int $mode = 0):
 
 function array_map(?callable $callback, array $array, array ...$arrays): array {}
 
+function array_group(array $array, callable $callback): array {}
+
 /**
  * @param string|int $key
  * @compile-time-eval
diff --git a/ext/standard/basic_functions_arginfo.h b/ext/standard/basic_functions_arginfo.h
index 5612ee2186..40600762f7 100644
--- a/ext/standard/basic_functions_arginfo.h
+++ b/ext/standard/basic_functions_arginfo.h
@@ -342,6 +342,11 @@ ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_array_map, 0, 2, IS_ARRAY, 0)
 	ZEND_ARG_VARIADIC_TYPE_INFO(0, arrays, IS_ARRAY, 0)
 ZEND_END_ARG_INFO()
 
+ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_array_group, 0, 2, IS_ARRAY, 0)
+	ZEND_ARG_TYPE_INFO(0, array, IS_ARRAY, 0)
+	ZEND_ARG_TYPE_INFO(0, callback, IS_CALLABLE, 0)
+ZEND_END_ARG_INFO()
+
 ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_array_key_exists, 0, 2, _IS_BOOL, 0)
 	ZEND_ARG_INFO(0, key)
 	ZEND_ARG_TYPE_INFO(0, array, IS_ARRAY, 0)
@@ -2292,6 +2297,7 @@ ZEND_FUNCTION(array_product);
 ZEND_FUNCTION(array_reduce);
 ZEND_FUNCTION(array_filter);
 ZEND_FUNCTION(array_map);
+ZEND_FUNCTION(array_group);
 ZEND_FUNCTION(array_key_exists);
 ZEND_FUNCTION(array_chunk);
 ZEND_FUNCTION(array_combine);
@@ -2915,6 +2921,7 @@ static const zend_function_entry ext_functions[] = {
 	ZEND_FE(array_reduce, arginfo_array_reduce)
 	ZEND_FE(array_filter, arginfo_array_filter)
 	ZEND_FE(array_map, arginfo_array_map)
+	ZEND_FE(array_group, arginfo_array_group)
 	ZEND_SUPPORTS_COMPILE_TIME_EVAL_FE(array_key_exists, arginfo_array_key_exists)
 	ZEND_FALIAS(key_exists, array_key_exists, arginfo_key_exists)
 	ZEND_SUPPORTS_COMPILE_TIME_EVAL_FE(array_chunk, arginfo_array_chunk)
diff --git a/ext/standard/tests/array/array_group_basic.phpt b/ext/standard/tests/array/array_group_basic.phpt
new file mode 100644
index 0000000000..9db0e7afe3
--- /dev/null
+++ b/ext/standard/tests/array/array_group_basic.phpt
@@ -0,0 +1,130 @@
+--TEST--
+Test array_group() function : basic functionality
+--FILE--
+<?php
+echo "*** Testing array_group() : basic functionality ***\n";
+
+function less_than( $a, $b ) {
+	return $a < $b;
+}
+
+function equal( $a, $b ) {
+	return $a == $b;
+}
+
+function equal_obj( $a, $b ) {
+	return $a->name == $b->name;
+}
+
+$arr1 = array(1, 2, 3);
+
+echo "-- With an integer array for < --\n";
+var_dump( array_group($arr1, 'less_than') );
+
+echo "-- With an integer array for == --\n";
+var_dump( array_group($arr1, 'equal') );
+
+echo "-- With an empty integer array for == --\n";
+var_dump( array_group($arr1, 'equal') );
+
+echo "-- With a singleton integer array for == --\n";
+var_dump( array_group(array(1), 'equal') );
+
+$obj1 = (object)array('id'=>3,'name'=>'foo');
+$obj2 = (object)array('id'=>4,'name'=>'foo');
+$obj3 = (object)array('id'=>5,'name'=>'baz');
+
+echo "-- With a singleton integer array for == --\n";
+var_dump( array_group(array($obj1, $obj2, $obj3), 'equal_obj') );
+
+echo "Done";
+?>
+--EXPECT--
+*** Testing array_group() : basic functionality ***
+-- With an integer array for < --
+array(1) {
+  [0]=>
+  array(3) {
+    [0]=>
+    int(1)
+    [1]=>
+    int(2)
+    [2]=>
+    int(3)
+  }
+}
+-- With an integer array for == --
+array(3) {
+  [0]=>
+  array(1) {
+    [0]=>
+    int(1)
+  }
+  [1]=>
+  array(1) {
+    [0]=>
+    int(2)
+  }
+  [2]=>
+  array(1) {
+    [0]=>
+    int(3)
+  }
+}
+-- With an empty integer array for == --
+array(3) {
+  [0]=>
+  array(1) {
+    [0]=>
+    int(1)
+  }
+  [1]=>
+  array(1) {
+    [0]=>
+    int(2)
+  }
+  [2]=>
+  array(1) {
+    [0]=>
+    int(3)
+  }
+}
+-- With a singleton integer array for == --
+array(1) {
+  [0]=>
+  array(1) {
+    [0]=>
+    int(1)
+  }
+}
+-- With a singleton integer array for == --
+array(2) {
+  [0]=>
+  array(2) {
+    [0]=>
+    object(stdClass)#1 (2) {
+      ["id"]=>
+      int(3)
+      ["name"]=>
+      string(3) "foo"
+    }
+    [1]=>
+    object(stdClass)#2 (2) {
+      ["id"]=>
+      int(4)
+      ["name"]=>
+      string(3) "foo"
+    }
+  }
+  [1]=>
+  array(1) {
+    [0]=>
+    object(stdClass)#3 (2) {
+      ["id"]=>
+      int(5)
+      ["name"]=>
+      string(3) "baz"
+    }
+  }
+}
+Done