Newsgroups: php.internals
Xref: news.php.net php.internals:120454
Content-Type: multipart/alternative; boundary="Apple-Mail=_D1DF8B65-921D-4993-B3DE-7EC9B12F6227"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.600.7\))
Date: Tue, 30 May 2023 15:13:18 +0200
References: <289E585B-EF8B-4B17-89BE-BE8295FD9FE1@gmail.com>
To: internals@lists.php.net
In-Reply-To: <289E585B-EF8B-4B17-89BE-BE8295FD9FE1@gmail.com>
Message-ID: <831B6DDD-B017-482A-9288-E31D44FC0298@gmail.com>
Subject: Re: [RFC] [Discussion] Add new function `array_group`
From: buritomath@gmail.com (Boro Sitnikovski)

--Apple-Mail=_D1DF8B65-921D-4993-B3DE-7EC9B12F6227
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8

Updated the patch: added a test for the increasing-subsequences example and a minor bugfix.

> On 30.5.2023, at 13:34, Boro Sitnikovski wrote:
>
> Hello all,
>
> As per the How To Create an RFC instructions, I am sending this e-mail in order to get your feedback on my proposal.
>
> I propose introducing a function to PHP core named `array_group`. This function takes an array and a function and returns an array that contains arrays - groups of consecutive elements. This is very similar to Haskell's `groupBy` function.
>
> For some background as to why: usually, when people want to do grouping in PHP, they use hash maps, so something like:
>
> ```
> <?php
> $array = [
>     [ 'id' => 1, 'value' => 'foo' ],
>     [ 'id' => 1, 'value' => 'bar' ],
>     [ 'id' => 2, 'value' => 'baz' ],
> ];
>
> $groups = [];
> foreach ( $array as $element ) {
>     $groups[ $element['id'] ][] = $element;
> }
>
> var_dump( $groups );
> ```
>
> With `array_group`, this can be achieved as follows (not preserving keys):
>
> ```
> <?php
> $array = [
>     [ 'id' => 1, 'value' => 'foo' ],
>     [ 'id' => 1, 'value' => 'bar' ],
>     [ 'id' => 2, 'value' => 'baz' ],
> ];
>
> $groups = array_group( $array, function( $a, $b ) {
>     return $a['id'] == $b['id'];
> } );
> ```
>
> The disadvantage of the first approach is that we are limited to an equality check, and we cannot group by, say, `<` or other functions.
> Conversely, the advantage of the first approach is that the keys are preserved, and elements needn't be consecutive.
>
> In any case, I think a utility function such as `array_group` will be widely useful.
>
> Please find attached a patch with a proposed implementation. Curious about your feedback.
>
> Best,
>
> Boro Sitnikovski
>
>

--Apple-Mail=_D1DF8B65-921D-4993-B3DE-7EC9B12F6227
Content-Type: multipart/mixed; boundary="Apple-Mail=_1ABD9816-B5A0-4B20-BAF3-1E20B2BC4C08"

--Apple-Mail=_1ABD9816-B5A0-4B20-BAF3-1E20B2BC4C08
Content-Transfer-Encoding: 7bit
Content-Type: text/html; charset=us-ascii

Updated the patch: added a test for the increasing-subsequences example and a minor bugfix.

--Apple-Mail=_1ABD9816-B5A0-4B20-BAF3-1E20B2BC4C08
Content-Disposition: attachment; filename=array_group.patch
Content-Type: application/octet-stream; x-unix-mode=0644; name="array_group.patch"
Content-Transfer-Encoding: 7bit

diff --git a/Zend/zend_hash.h b/Zend/zend_hash.h
index 5726c8a919..c462de2850 100644
--- a/Zend/zend_hash.h
+++ b/Zend/zend_hash.h
@@ -1087,6 +1087,10 @@ static zend_always_inline void *zend_hash_get_current_data_ptr_ex(HashTable *ht,
 	_ZEND_HASH_FOREACH_VAL(ht); \
 	_val = _z;
 
+#define ZEND_HASH_FOREACH_VAL_FROM(ht, _val, _from) \
+	ZEND_HASH_FOREACH_FROM(ht, 0, _from); \
+	_val = _z;
+
 #define ZEND_HASH_REVERSE_FOREACH_VAL(ht, _val) \
 	_ZEND_HASH_REVERSE_FOREACH_VAL(ht); \
 	_val = _z;
diff --git a/ext/standard/array.c b/ext/standard/array.c
index 46c2c882b8..bb8627e311 100644
--- a/ext/standard/array.c
+++ b/ext/standard/array.c
@@ -6394,6 +6394,77 @@ PHP_FUNCTION(array_map)
 }
 /* }}} */
 
+/* {{{ Groups consecutive elements from the array via the callback. */
+PHP_FUNCTION(array_group)
+{
+	zval *array;
+	zend_fcall_info fci;
+	zend_fcall_info_cache fci_cache = empty_fcall_info_cache;
+
+	zval args[2];
+	zval *prev_val;
+	zval *curr_val;
+	zval chunk;
+	zval retval;
+
+	ZEND_PARSE_PARAMETERS_START(2, 2)
+		Z_PARAM_ARRAY(array)
+		Z_PARAM_FUNC(fci, fci_cache)
+	ZEND_PARSE_PARAMETERS_END();
+
+	if (zend_hash_num_elements(Z_ARRVAL_P(array)) == 0) {
+		RETVAL_EMPTY_ARRAY();
+		return;
+	}
+
+	// The array is guaranteed to have at least one element.
+	prev_val = ZEND_HASH_ELEMENT(Z_ARRVAL_P(array), 0);
+
+	// Generate the initial group.
+	array_init(&chunk);
+	zend_hash_next_index_insert_new(Z_ARRVAL_P(&chunk), prev_val);
+
+	array_init(return_value);
+
+	fci.retval = &retval;
+	fci.param_count = 2;
+
+	ZEND_HASH_FOREACH_VAL_FROM(Z_ARRVAL_P(array), curr_val, 1) {
+		ZVAL_COPY(&args[0], prev_val);
+		ZVAL_COPY(&args[1], curr_val);
+		fci.params = args;
+
+		if (zend_call_function(&fci, &fci_cache) == SUCCESS && Z_TYPE(retval) != IS_UNDEF) {
+			int retval_true;
+
+			zval_ptr_dtor(&args[1]);
+			zval_ptr_dtor(&args[0]);
+
+			retval_true = zend_is_true(&retval);
+
+			zval_ptr_dtor(&retval);
+
+			// Perform grouping - add the current group and create a new one.
+			if (!retval_true) {
+				zend_hash_next_index_insert_new(Z_ARRVAL_P(return_value), &chunk);
+				array_init(&chunk);
+			}
+
+			zend_hash_next_index_insert_new(Z_ARRVAL_P(&chunk), curr_val);
+
+			prev_val = curr_val;
+		} else {
+			zval_ptr_dtor(&args[1]);
+			zval_ptr_dtor(&args[0]);
+			RETURN_NULL();
+		}
+	} ZEND_HASH_FOREACH_END();
+
+	// Add the last group.
+	zend_hash_next_index_insert_new(Z_ARRVAL_P(return_value), &chunk);
+}
+/* }}} */
+
 /* {{{ Checks if the given key or index exists in the array */
 PHP_FUNCTION(array_key_exists)
 {
diff --git a/ext/standard/basic_functions.stub.php b/ext/standard/basic_functions.stub.php
index effb05ff9f..a6a8408dc9 100755
--- a/ext/standard/basic_functions.stub.php
+++ b/ext/standard/basic_functions.stub.php
@@ -1870,6 +1870,8 @@ function array_filter(array $array, ?callable $callback = null, int $mode = 0):
 
 function array_map(?callable $callback, array $array, array ...$arrays): array {}
 
+function array_group(array $array, callable $callback): array {}
+
 /**
  * @param string|int $key
  * @compile-time-eval
diff --git a/ext/standard/basic_functions_arginfo.h b/ext/standard/basic_functions_arginfo.h
index 5612ee2186..40600762f7 100644
--- a/ext/standard/basic_functions_arginfo.h
+++ b/ext/standard/basic_functions_arginfo.h
@@ -342,6 +342,11 @@ ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_array_map, 0, 2, IS_ARRAY, 0)
 	ZEND_ARG_VARIADIC_TYPE_INFO(0, arrays, IS_ARRAY, 0)
 ZEND_END_ARG_INFO()
 
+ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_array_group, 0, 2, IS_ARRAY, 0)
+	ZEND_ARG_TYPE_INFO(0, array, IS_ARRAY, 0)
+	ZEND_ARG_TYPE_INFO(0, callback, IS_CALLABLE, 0)
+ZEND_END_ARG_INFO()
+
 ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_array_key_exists, 0, 2, _IS_BOOL, 0)
 	ZEND_ARG_INFO(0, key)
 	ZEND_ARG_TYPE_INFO(0, array, IS_ARRAY, 0)
@@ -2292,6 +2297,7 @@ ZEND_FUNCTION(array_product);
 ZEND_FUNCTION(array_reduce);
 ZEND_FUNCTION(array_filter);
 ZEND_FUNCTION(array_map);
+ZEND_FUNCTION(array_group);
 ZEND_FUNCTION(array_key_exists);
 ZEND_FUNCTION(array_chunk);
 ZEND_FUNCTION(array_combine);
@@ -2915,6 +2921,7 @@ static const zend_function_entry ext_functions[] = {
 	ZEND_FE(array_reduce, arginfo_array_reduce)
 	ZEND_FE(array_filter, arginfo_array_filter)
 	ZEND_FE(array_map, arginfo_array_map)
+	ZEND_FE(array_group, arginfo_array_group)
 	ZEND_SUPPORTS_COMPILE_TIME_EVAL_FE(array_key_exists, arginfo_array_key_exists)
 	ZEND_FALIAS(key_exists, array_key_exists, arginfo_key_exists)
 	ZEND_SUPPORTS_COMPILE_TIME_EVAL_FE(array_chunk, arginfo_array_chunk)
diff --git a/ext/standard/tests/array/array_group_basic.phpt b/ext/standard/tests/array/array_group_basic.phpt
new file mode 100644
index 0000000000..8f23a6b941
--- /dev/null
+++ b/ext/standard/tests/array/array_group_basic.phpt
@@ -0,0 +1,115 @@
+--TEST--
+Test array_group() function : basic functionality
+--FILE--
+name == $b->name;
+}
+
+$arr1 = array(1, 2, 3);
+
+echo "-- With an integer array for < --\n";
+var_dump( array_group($arr1, 'less_than') );
+
+echo "-- With an integer array for == --\n";
+var_dump( array_group($arr1, 'equal') );
+
+echo "-- With an empty array for == --\n";
+var_dump( array_group(array(), 'equal') );
+
+echo "-- With a singleton integer array for == --\n";
+var_dump( array_group(array(1), 'equal') );
+
+$obj1 = (object)array('id'=>3,'name'=>'foo');
+$obj2 = (object)array('id'=>4,'name'=>'foo');
+$obj3 = (object)array('id'=>5,'name'=>'baz');
+
+echo "-- With an integer array of objects for == --\n";
+var_dump( array_group(array($obj1, $obj2, $obj3), 'equal_obj') );
+
+echo "Done";
+?>
+--EXPECT--
+*** Testing array_group() : basic functionality ***
+-- With an integer array for < --
+array(1) {
+  [0]=>
+  array(3) {
+    [0]=>
+    int(1)
+    [1]=>
+    int(2)
+    [2]=>
+    int(3)
+  }
+}
+-- With an integer array for == --
+array(3) {
+  [0]=>
+  array(1) {
+    [0]=>
+    int(1)
+  }
+  [1]=>
+  array(1) {
+    [0]=>
+    int(2)
+  }
+  [2]=>
+  array(1) {
+    [0]=>
+    int(3)
+  }
+}
+-- With an empty array for == --
+array(0) {
+}
+-- With a singleton integer array for == --
+array(1) {
+  [0]=>
+  array(1) {
+    [0]=>
+    int(1)
+  }
+}
+-- With an integer array of objects for == --
+array(2) {
+  [0]=>
+  array(2) {
+    [0]=>
+    object(stdClass)#1 (2) {
+      ["id"]=>
+      int(3)
+      ["name"]=>
+      string(3) "foo"
+    }
+    [1]=>
+    object(stdClass)#2 (2) {
+      ["id"]=>
+      int(4)
+      ["name"]=>
+      string(3) "foo"
+    }
+  }
+  [1]=>
+  array(1) {
+    [0]=>
+    object(stdClass)#3 (2) {
+      ["id"]=>
+      int(5)
+      ["name"]=>
+      string(3) "baz"
+    }
+  }
+}
+Done
diff --git a/ext/standard/tests/array/array_group_incr_subseqs.phpt b/ext/standard/tests/array/array_group_incr_subseqs.phpt
new file mode 100644
index 0000000000..48b8575c69
--- /dev/null
+++ b/ext/standard/tests/array/array_group_incr_subseqs.phpt
@@ -0,0 +1,53 @@
+--TEST--
+Test array_group() function : increasing subsequences
+--FILE--
+
+--EXPECT--
+*** Testing array_group() : increasing subsequences ***
+array(4) {
+  [0]=>
+  array(4) {
+    [0]=>
+    int(1)
+    [1]=>
+    int(2)
+    [2]=>
+    int(2)
+    [3]=>
+    int(3)
+  }
+  [1]=>
+  array(2) {
+    [0]=>
+    int(1)
+    [1]=>
+    int(2)
+  }
+  [2]=>
+  array(3) {
+    [0]=>
+    int(0)
+    [1]=>
+    int(4)
+    [2]=>
+    int(5)
+  }
+  [3]=>
+  array(1) {
+    [0]=>
+    int(2)
+  }
+}
+Done

--Apple-Mail=_1ABD9816-B5A0-4B20-BAF3-1E20B2BC4C08
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=us-ascii



--Apple-Mail=_1ABD9816-B5A0-4B20-BAF3-1E20B2BC4C08--
--Apple-Mail=_D1DF8B65-921D-4993-B3DE-7EC9B12F6227--