Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:120451
Content-Type: multipart/alternative;
	boundary="Apple-Mail=_3B4D8BC6-6256-4AB6-BC9D-B71E2D803898"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.600.7\))
Message-ID: <289E585B-EF8B-4B17-89BE-BE8295FD9FE1@gmail.com>
Date: Tue, 30 May 2023 13:34:49 +0200
To: internals@lists.php.net
X-Mailer: Apple Mail (2.3731.600.7)
Subject: [RFC] [Discussion] Add new function `array_group`
From: buritomath@gmail.com (Boro Sitnikovski)

--Apple-Mail=_3B4D8BC6-6256-4AB6-BC9D-B71E2D803898
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

Hello all,

As per the How To Create an RFC <https://wiki.php.net/rfc/howto>
instructions, I am sending this e-mail in order to get your feedback on
my proposal.

I propose adding a function named `array_group` to PHP core. It takes an
array and a callback and returns an array of arrays, each one a group of
consecutive elements. The callback receives two consecutive elements and
returns whether they belong to the same group. This is very similar to
Haskell's `groupBy` function
<https://hackage.haskell.org/package/groupBy-0.1.0.0/docs/Data-List-GroupBy.html>.

Some background on why: when people want to group elements in PHP, they
usually build a hash map by hand, with something like:

```
<?php
$array = [
    [ 'id' => 1, 'value' => 'foo' ],
    [ 'id' => 1, 'value' => 'bar' ],
    [ 'id' => 2, 'value' => 'baz' ],
];

$groups = [];
foreach ( $array as $element ) {
    $groups[ $element['id'] ][] = $element;
}

var_dump( $groups );
```

With `array_group`, this can be written as follows (keys are not
preserved):

```
<?php
$array = [
    [ 'id' => 1, 'value' => 'foo' ],
    [ 'id' => 1, 'value' => 'bar' ],
    [ 'id' => 2, 'value' => 'baz' ],
];

$groups = array_group( $array, function( $a, $b ) {
    return $a['id'] == $b['id'];
} );
```
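
For readers who want to try these semantics without building the patch,
a rough userland sketch could look like the following. It assumes the
callback compares each element with its immediate predecessor; the name
`array_group_sketch` is hypothetical and this is not the attached C
implementation.

```php
<?php
// Hypothetical userland sketch; not the attached C implementation.
// Assumes the callback compares each element with its predecessor.
function array_group_sketch(array $array, callable $callback): array
{
    $groups  = [];
    $chunk   = [];
    $hasPrev = false;
    $prev    = null;

    foreach ($array as $curr) {
        // Close the current group when the callback rejects the pair.
        if ($hasPrev && !$callback($prev, $curr)) {
            $groups[] = $chunk;
            $chunk    = [];
        }
        $chunk[] = $curr;
        $prev    = $curr;
        $hasPrev = true;
    }

    // Add the last group, unless the input was empty.
    if ($hasPrev) {
        $groups[] = $chunk;
    }

    return $groups;
}

$array = [
    ['id' => 1, 'value' => 'foo'],
    ['id' => 1, 'value' => 'bar'],
    ['id' => 2, 'value' => 'baz'],
];

$groups = array_group_sketch($array, function ($a, $b) {
    return $a['id'] == $b['id'];
});

echo count($groups), "\n";    // 2
echo count($groups[0]), "\n"; // 2
```

An empty input yields an empty array, and keys are discarded, matching
the "not preserving keys" note above.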

The disadvantage of the first approach is that it is limited to equality
checks: we cannot group by, say, `<` or another arbitrary predicate.
Its advantage is that keys are preserved and the grouped elements need
not be consecutive.
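
The trade-off can be sketched in userland as follows. The helper
`group_consecutive` is a hypothetical stand-in for the proposed function,
written under the assumption that adjacent elements are compared; it is
only meant to show what grouping by `<` and grouping non-consecutive
values would produce.

```php
<?php
// Hypothetical helper for illustration only; it mirrors the proposed
// behaviour under the assumption that adjacent elements are compared.
function group_consecutive(array $array, callable $same): array
{
    $groups = [];
    $prev   = null;

    foreach (array_values($array) as $i => $curr) {
        if ($i > 0 && $same($prev, $curr)) {
            // Same group as the previous element: extend the last group.
            $groups[count($groups) - 1][] = $curr;
        } else {
            // First element, or the callback returned false: new group.
            $groups[] = [$curr];
        }
        $prev = $curr;
    }

    return $groups;
}

// Grouping by `<` splits the input into strictly ascending runs.
$runs = group_consecutive([1, 2, 3, 2, 3, 1], fn($a, $b) => $a < $b);
echo json_encode($runs), "\n"; // [[1,2,3],[2,3],[1]]

// Equal values that are not consecutive end up in separate groups,
// whereas the hash map approach would collect them under one key.
$ids = group_consecutive([1, 2, 1], fn($a, $b) => $a == $b);
echo json_encode($ids), "\n"; // [[1],[2],[1]]
```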

In any case, I think a utility function such as `array_group` will be
widely useful.

Please find attached a patch with a proposed implementation. I am curious
to hear your feedback.

Best,

Boro Sitnikovski <https://people.php.net/bor0>


--Apple-Mail=_3B4D8BC6-6256-4AB6-BC9D-B71E2D803898
Content-Type: multipart/mixed;
	boundary="Apple-Mail=_7312EF78-3546-44DC-B65E-7C2DCBF69A36"


--Apple-Mail=_7312EF78-3546-44DC-B65E-7C2DCBF69A36
Content-Disposition: attachment;
	filename=array_group.patch
Content-Type: application/octet-stream;
	x-unix-mode=0644;
	name="array_group.patch"
Content-Transfer-Encoding: 7bit

diff --git a/Zend/zend_hash.h b/Zend/zend_hash.h
index 5726c8a919..c462de2850 100644
--- a/Zend/zend_hash.h
+++ b/Zend/zend_hash.h
@@ -1087,6 +1087,10 @@ static zend_always_inline void *zend_hash_get_current_data_ptr_ex(HashTable *ht,
 	_ZEND_HASH_FOREACH_VAL(ht); \
 	_val = _z;
 
+#define ZEND_HASH_FOREACH_VAL_FROM(ht, _val, _from) \
+	ZEND_HASH_FOREACH_FROM(ht, 0, _from); \
+	_val = _z;
+
 #define ZEND_HASH_REVERSE_FOREACH_VAL(ht, _val) \
 	_ZEND_HASH_REVERSE_FOREACH_VAL(ht); \
 	_val = _z;
diff --git a/ext/standard/array.c b/ext/standard/array.c
index 46c2c882b8..171482113a 100644
--- a/ext/standard/array.c
+++ b/ext/standard/array.c
@@ -6394,6 +6394,82 @@ PHP_FUNCTION(array_map)
 }
 /* }}} */
 
+/* {{{ Groups consecutive elements from the array via the callback. */
+PHP_FUNCTION(array_group)
+{
+	zval *array;
+	zend_fcall_info fci;
+	zend_fcall_info_cache fci_cache = empty_fcall_info_cache;
+
+	zval args[2];
+	zval *prev_val;
+	zval *curr_val;
+	zval chunk;
+	zval retval;
+
+	ZEND_PARSE_PARAMETERS_START(2, 2)
+		Z_PARAM_ARRAY(array)
+		Z_PARAM_FUNC(fci, fci_cache)
+	ZEND_PARSE_PARAMETERS_END();
+
+	if (zend_hash_num_elements(Z_ARRVAL_P(array)) == 0) {
+		RETVAL_EMPTY_ARRAY();
+		return;
+	}
+
+	// The array is guaranteed to have at least one element.
+	prev_val = ZEND_HASH_ELEMENT(Z_ARRVAL_P(array), 0);
+
+	// Generate the initial group; the group takes its own reference to the value.
+	array_init(&chunk);
+	Z_TRY_ADDREF_P(prev_val);
+	zend_hash_next_index_insert_new(Z_ARRVAL_P(&chunk), prev_val);
+
+	array_init(return_value);
+
+	fci.retval = &retval;
+	fci.param_count = 2;
+
+	ZEND_HASH_FOREACH_VAL_FROM(Z_ARRVAL_P(array), curr_val, 1) {
+		ZVAL_COPY(&args[0], prev_val);
+		ZVAL_COPY(&args[1], curr_val);
+		fci.params = args;
+
+		if (zend_call_function(&fci, &fci_cache) == SUCCESS && Z_TYPE(retval) != IS_UNDEF) {
+			int retval_true;
+
+			zval_ptr_dtor(&args[1]);
+			zval_ptr_dtor(&args[0]);
+
+			retval_true = zend_is_true(&retval);
+
+			zval_ptr_dtor(&retval);
+
+			// Perform grouping - add the current group and create a new one.
+			if (!retval_true) {
+				zend_hash_next_index_insert_new(Z_ARRVAL_P(return_value), &chunk);
+				array_init(&chunk);
+			}
+
+			Z_TRY_ADDREF_P(curr_val);
+			zend_hash_next_index_insert_new(Z_ARRVAL_P(&chunk), curr_val);
+
+			// Compare the next element against this one, not against the first.
+			prev_val = curr_val;
+		} else {
+			zval_ptr_dtor(&args[1]);
+			zval_ptr_dtor(&args[0]);
+			zval_ptr_dtor(&chunk);
+			zval_ptr_dtor(return_value);
+			RETURN_NULL();
+		}
+	} ZEND_HASH_FOREACH_END();
+
+	// Add the last group.
+	zend_hash_next_index_insert_new(Z_ARRVAL_P(return_value), &chunk);
+}
+/* }}} */
+
 /* {{{ Checks if the given key or index exists in the array */
 PHP_FUNCTION(array_key_exists)
 {
diff --git a/ext/standard/basic_functions.stub.php b/ext/standard/basic_functions.stub.php
index effb05ff9f..a6a8408dc9 100755
--- a/ext/standard/basic_functions.stub.php
+++ b/ext/standard/basic_functions.stub.php
@@ -1870,6 +1870,8 @@ function array_filter(array $array, ?callable $callback = null, int $mode = 0):
 
 function array_map(?callable $callback, array $array, array ...$arrays): array {}
 
+function array_group(array $array, callable $callback): array {}
+
 /**
  * @param string|int $key
  * @compile-time-eval
diff --git a/ext/standard/basic_functions_arginfo.h b/ext/standard/basic_functions_arginfo.h
index 5612ee2186..40600762f7 100644
--- a/ext/standard/basic_functions_arginfo.h
+++ b/ext/standard/basic_functions_arginfo.h
@@ -342,6 +342,11 @@ ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_array_map, 0, 2, IS_ARRAY, 0)
 	ZEND_ARG_VARIADIC_TYPE_INFO(0, arrays, IS_ARRAY, 0)
 ZEND_END_ARG_INFO()
 
+ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_array_group, 0, 2, IS_ARRAY, 0)
+	ZEND_ARG_TYPE_INFO(0, array, IS_ARRAY, 0)
+	ZEND_ARG_TYPE_INFO(0, callback, IS_CALLABLE, 0)
+ZEND_END_ARG_INFO()
+
 ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_array_key_exists, 0, 2, _IS_BOOL, 0)
 	ZEND_ARG_INFO(0, key)
 	ZEND_ARG_TYPE_INFO(0, array, IS_ARRAY, 0)
@@ -2292,6 +2297,7 @@ ZEND_FUNCTION(array_product);
 ZEND_FUNCTION(array_reduce);
 ZEND_FUNCTION(array_filter);
 ZEND_FUNCTION(array_map);
+ZEND_FUNCTION(array_group);
 ZEND_FUNCTION(array_key_exists);
 ZEND_FUNCTION(array_chunk);
 ZEND_FUNCTION(array_combine);
@@ -2915,6 +2921,7 @@ static const zend_function_entry ext_functions[] = {
 	ZEND_FE(array_reduce, arginfo_array_reduce)
 	ZEND_FE(array_filter, arginfo_array_filter)
 	ZEND_FE(array_map, arginfo_array_map)
+	ZEND_FE(array_group, arginfo_array_group)
 	ZEND_SUPPORTS_COMPILE_TIME_EVAL_FE(array_key_exists, arginfo_array_key_exists)
 	ZEND_FALIAS(key_exists, array_key_exists, arginfo_key_exists)
 	ZEND_SUPPORTS_COMPILE_TIME_EVAL_FE(array_chunk, arginfo_array_chunk)
diff --git a/ext/standard/tests/array/array_group_basic.phpt b/ext/standard/tests/array/array_group_basic.phpt
new file mode 100644
index 0000000000..9db0e7afe3
--- /dev/null
+++ b/ext/standard/tests/array/array_group_basic.phpt
@@ -0,0 +1,115 @@
+--TEST--
+Test array_group() function : basic functionality
+--FILE--
+<?php
+echo "*** Testing array_group() : basic functionality ***\n";
+
+function less_than( $a, $b ) {
+	return $a < $b;
+}
+
+function equal( $a, $b ) {
+	return $a == $b;
+}
+
+function equal_obj( $a, $b ) {
+	return $a->name == $b->name;
+}
+
+$arr1 = array(1, 2, 3);
+
+echo "-- With an integer array for < --\n";
+var_dump( array_group($arr1, 'less_than') );
+
+echo "-- With an integer array for == --\n";
+var_dump( array_group($arr1, 'equal') );
+
+echo "-- With an empty integer array for == --\n";
+var_dump( array_group(array(), 'equal') );
+
+echo "-- With a singleton integer array for == --\n";
+var_dump( array_group(array(1), 'equal') );
+
+$obj1 = (object)array('id'=>3,'name'=>'foo');
+$obj2 = (object)array('id'=>4,'name'=>'foo');
+$obj3 = (object)array('id'=>5,'name'=>'baz');
+
+echo "-- With an object array for == on name --\n";
+var_dump( array_group(array($obj1, $obj2, $obj3), 'equal_obj') );
+
+echo "Done";
+?>
+--EXPECT--
+*** Testing array_group() : basic functionality ***
+-- With an integer array for < --
+array(1) {
+  [0]=>
+  array(3) {
+    [0]=>
+    int(1)
+    [1]=>
+    int(2)
+    [2]=>
+    int(3)
+  }
+}
+-- With an integer array for == --
+array(3) {
+  [0]=>
+  array(1) {
+    [0]=>
+    int(1)
+  }
+  [1]=>
+  array(1) {
+    [0]=>
+    int(2)
+  }
+  [2]=>
+  array(1) {
+    [0]=>
+    int(3)
+  }
+}
+-- With an empty integer array for == --
+array(0) {
+}
+-- With a singleton integer array for == --
+array(1) {
+  [0]=>
+  array(1) {
+    [0]=>
+    int(1)
+  }
+}
+-- With an object array for == on name --
+array(2) {
+  [0]=>
+  array(2) {
+    [0]=>
+    object(stdClass)#1 (2) {
+      ["id"]=>
+      int(3)
+      ["name"]=>
+      string(3) "foo"
+    }
+    [1]=>
+    object(stdClass)#2 (2) {
+      ["id"]=>
+      int(4)
+      ["name"]=>
+      string(3) "foo"
+    }
+  }
+  [1]=>
+  array(1) {
+    [0]=>
+    object(stdClass)#3 (2) {
+      ["id"]=>
+      int(5)
+      ["name"]=>
+      string(3) "baz"
+    }
+  }
+}
+Done

--Apple-Mail=_7312EF78-3546-44DC-B65E-7C2DCBF69A36--

--Apple-Mail=_3B4D8BC6-6256-4AB6-BC9D-B71E2D803898--