Newsgroups: php.internals
Subject: Re: [PHP-DEV] Request for couple memory optimized array improvements
From: vorismi3@fel.cvut.cz (Michael Voříšek - ČVUT FEL)
To: Riikka Kalliomäki
Cc: PHP internals
Date: Mon, 31 Aug 2020 22:53:02 +0200

Optimizing foreach (array_keys($arr) as $k) is very important, not only
for memory but also for speed when not all elements need to be
iterated, for example:

    foreach (array_keys($arr) as $k) {
        if ($k === $target) { // $target: placeholder for "some condition" on $k
            break;
        }
    }

Please, can someone send a PR for this?

With kind regards / Mit freundlichen Grüßen / S přátelským pozdravem,

Michael Voříšek

On 31 Aug 2020 20:13, Riikka Kalliomäki wrote:

> Hello,
>
> For the past couple of years I've been working with a PHP code base
> that at times deals with quite large payloads memory-wise. This has
> made me pay more attention to some array operations in PHP that are
> rather frustrating to deal with in userland PHP, but could perhaps be
> optimized more in PHP core.
>
> A common pattern that I've seen that could dearly use PHP internal
> optimization, if possible, would be:
>
>     foreach (array_keys($array) as $key) {
>     }
>
> The problem with this pattern, of course, is the fact that it
> needlessly duplicates the array passed to foreach, as can be seen from
> this example: https://3v4l.org/MRSv6
>
> I would be ever so grateful if it were possible to improve the PHP
> engine to detect that the fully qualified function name array_keys is
> used with foreach, in which case it would simply perform a foreach
> over the keys of the array without creating a copy. Optimizing this
> wouldn't even require any userland changes. I'm not sure the PHP
> engine makes it at all feasible, though.
>
> Of course, you could just be using something like this in code:
>
>     foreach ($array as $key => $_) {
>     }
>
> This has actually become a pattern for us in some memory-sensitive
> places, but using array_keys inside foreach is a very intuitive and
> common approach and doesn't require the unused variable, so it would
> be nice to see that usage enshrined.
>
> Another similar problem with creating array copies is the detection of
> "indexed" arrays (as opposed to associative arrays). Particularly when
> dealing with JSON, it's a common need to detect whether an array has
> keys from 0 to n-1, in that order.
> My understanding is that, at least in some cases, this would be
> trivial and fast to tell internally in PHP, but the functionality is
> not exposed to userland.
>
> Current common practices include, for example:
>
>     array_keys($array) === range(0, count($array) - 1)
>
> A memory-optimized way of dealing with this is via foreach, but it's
> quite cumbersome and, again, you must not use array_keys in the
> foreach. The following example demonstrates that, in the worst case,
> the range() approach triples the memory usage: https://3v4l.org/FiWdk
>
> Interestingly, "array_values($array) === $array" is the fastest and
> most optimized check in the best case, since PHP just returns the
> array itself when it is "packed" and "without holes". However, this
> could get hairy in the worst case, since it then compares the values
> as well.
>
> So it would be nice to have a core PHP function implementing this
> test, because the userland way of doing it is unnecessarily
> unoptimized. I don't know what the function should be called. In our
> code base it is called is_indexed_array, but PHP doesn't really have
> a standard term for this, AFAIK.
>
> I regret that my lack of C skills means I can't really propose
> implementations, but I would be truly appreciative if these
> suggestions gained some traction.
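The quoted message mentions a foreach-based check as the memory-friendly but cumbersome alternative to the range() comparison. Below is a minimal userland sketch of that check. It assumes the helper name is_indexed_array from the message, which is a hypothetical userland function, not a built-in one. It also combines the `$key => $_` iteration pattern with the early exit that the reply above asks for:

```php
<?php
// Hypothetical userland helper; the name is_indexed_array comes from the
// quoted message and is NOT a built-in PHP function.
function is_indexed_array(array $array): bool
{
    $expected = 0;
    // Iterating with `$key => $_` walks the existing array; unlike
    // foreach (array_keys($array) ...), no separate array of keys is built.
    foreach ($array as $key => $_) {
        if ($key !== $expected) {
            // Stop at the first key that breaks the 0..n-1 sequence,
            // mirroring the early `break` case from the reply.
            return false;
        }
        $expected++;
    }
    // Every key matched its position (an empty array also ends up here).
    return true;
}
```

The worst case is still O(n), but no extra arrays are allocated. One design difference is worth noting: `range(0, count($array) - 1)` evaluates to `[0, -1]` for an empty array, so the range() comparison reports `[]` as not indexed, while this sketch treats it as indexed.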