Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:109793 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 84044 invoked from network); 23 Apr 2020 00:19:17 -0000 Received: from unknown (HELO localhost.localdomain) (76.75.200.58) by pb1.pair.com with SMTP; 23 Apr 2020 00:19:17 -0000 To: internals@lists.php.net References: Date: Thu, 23 Apr 2020 00:51:04 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:52.0) Gecko/20100101 Firefox/52.0 SeaMonkey/2.49.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Posted-By: 46.59.72.204 Subject: Re: Any interest in a list type? From: ajf@ajf.me (Andrea Faulds) Message-ID: Hello again everyone, Andrea Faulds wrote: > Hello again, > > Andrea Faulds wrote: >> Hi, >> >> Matthew Brown wrote: >>> I imagine such a "list" type would be a subtype of "array" – everywhere >>> that array was accepted, a list would be also, and it would have the >>> same >>> copy-on-write behaviour. >> >> IIRC with the modern implementation of arrays in the Zend Engine, you >> can check that an array has no string keys, has keys in the right >> order, and possibly even that they are all numbered 0 through >> (count($arr) - 1) without gaps. > > Sorry, what I meant to say was “you can check [quickly in O(1) time] > that an array has no string keys [etc]”. This is because of the Zend > Engine's “packed arrays”. > > Thanks, > Andrea Since it's pretty simple, I decided to write a proof-of-concept implementation of such a check. For simplicity's sake I haven't implemented a type declaration right now, just an is_list() function, but it could easily be extended to one. The implementation can be found at https://github.com/php/php-src/compare/master...hikari-no-yume:is_list (and here's a direct commit link to the current version as of this writing for the sake of historical record: https://github.com/php/php-src/commit/f6fc94fae8d97743d23d276d66071dac1d240a05) As I hoped, it can in several common cases give a TRUE result in O(1): $ sapi/cli/php -r 'var_dump(is_list([]));' No elements bool(true) $ sapi/cli/php -r 'var_dump(is_list([3, 2, 1]));' Packed and without holes! bool(true) $ sapi/cli/php -r '$x = [3,2,1]; sort($x); var_dump(is_list($x));' Packed and without holes! bool(true)  $ sapi/cli/php -r '$x = [1,2,3,4]; unset($x[3]); var_dump($x, is_list($x));' Packed and without holes! bool(true) Unfortunately, it turns out that it can't always be fast! :( The problem is that while an array can have the “packed” and “without holes” flags IFF it has no string keys and all keys are in order without gaps, that does not mean the reverse is true: that if an array lacks those flags, it must have string keys, out-of-order keys or gaps. The result is there are several cases where we must foreach over the array to check if it's a “list”, which unfortunately means as bad as O(n) time complexity depending on the array content. For example we can't be sure if it has string keys without foreaching over it: $ sapi/cli/php -r 'var_dump(is_list(["foo"=>"bar","bar"=>"baz"]));' Foreaching Hit string key bool(false) I want to say that this used to be possible in the common case without foreaching due to some flag on the HashTable, I think Dmitry added that. But it might have been a proposed patch that didn't get in, or it was reverted. If this is check will only be used as a type declaration on functions/return-types/properties and not as a general-purpose is_list() function, we might decide that the O(n) worst-case is acceptable so long as it only affects cases where the array is not a valid list, i.e. when a TypeError would be thrown, because in production code, values being type-checked should only rarely have the wrong type, so it should only cause slowness during debugging. It is not like we are trying to make throwing type errors fast… at least so I assume? Unfortunately, this O(n) worst-case applies even to some valid lists. For example: $ sapi/cli/php -r '$x = [0=>1,2=>3,1=>2]; unset($x[2]); var_dump($x, is_list($x));' Foreaching No irregularities encountered, true by slowest path array(2) { [0]=> int(1) [1]=> int(2) } bool(true) $ sapi/cli/php -r '$x = [1,2,3]; $x["foo"] = "bar"; unset($x["foo"]); var_dump(is_list($x));' Foreaching No irregularities encountered, true by slowest path bool(true) Maybe these kinds of cases are uncommon enough that we could deem the slowness acceptable. We could also use list type-checks as an opportunity to optimise valid lists that aren't already “packed” to be “packed” behind-the-scenes, speeding up subsequent checks. Also, it may be the case that the Zend Engine's hashtable implementation could be changed slightly to allow O(1) checks in more cases. Anyway, I hope this was interesting and can help inform discussion and perhaps provide inspiration! Thanks, Andrea