Newsgroups: php.internals
Date: Sat, 5 Nov 2016 21:50:36 +0000
To: internals@lists.php.net
Subject: Re: [PHP-DEV] [VOTE] Convert numeric keys in object/array casts
From: ajf@ajf.me (Andrea Faulds)

Hi Yasuo,

Yasuo Ohgaki wrote:
> Hi Andrea,
>
> On Sun, Nov 6, 2016 at 1:30 AM, Andrea Faulds wrote:
>> Two weeks have passed since this RFC was put to discussion here, and no
>> significant issues have cropped up. Therefore, I'm going to put it to a vote
>> for inclusion in PHP 7.2.
>>
>> Voting starts today, 2016-11-05, and ends the Monday after next, 2016-11-14.
>>
>> The RFC and voting widget can be found here:
>> https://wiki.php.net/rfc/convert_numeric_keys_in_object_array_casts
>>
>> It's a normal 2/3 majority required vote.
>
> In short, array int index is converted to string numeric name
> property, vice versa. Correct?

Yes, that describes it succinctly.

> At first, I thought this is good idea, but it seems we are better to
> allow "string integer" array key access (array_get/set_var(),
> perhaps) and change other related features accordingly.

We could do that, theoretically.
However, that would mean that now array indexing would sometimes require
looking up two different keys. In particular, it would make checking for
the existence of array keys slower.

But I don't think we should have to do so in the first place. Numeric
string keys existing in arrays are a bug, as are integer property names
existing in objects. Almost all array and object operations assume that
arrays don't have numeric string keys, and objects don't have integer
property names. That ought to be a safe assumption.

> Currently, inaccessible value could happen in array due to "int
> like string conversion to int" also.
>
> https://3v4l.org/EpDuo
>
> Line 9:
> var_dump($tmp, $tmp[0], $tmp['0']);
> outputs
> array(4) {
>   [0]=>
>   int(5)
>   [1]=>
>   int(6)
>   [2]=>
>   int(7)
>   ["0"]=>
>   string(3) "zzz" // <== String '0' indexed element is Inaccessible
> }
> int(5)
> int(5) // <== Only long index 0 can be accessed

You may know this, but your example is fixed by this RFC. Your code is:

    {'0'} = 'zzz';
    var_dump($obj, $obj->{0}, $obj->{'0'});
    $tmp = (array)$obj;
    var_dump($tmp, $tmp[0], $tmp['0']);

and with the patch the output is:

    object(stdClass)#1 (3) {
      ["0"]=>
      int(5)
      ["1"]=>
      int(6)
      ["2"]=>
      int(7)
    }
    object(stdClass)#1 (3) {
      ["0"]=>
      string(3) "zzz"
      ["1"]=>
      int(6)
      ["2"]=>
      int(7)
    }
    string(3) "zzz"
    string(3) "zzz"
    array(3) {
      [0]=>
      string(3) "zzz"
      [1]=>
      int(6)
      [2]=>
      int(7)
    }
    string(3) "zzz"
    string(3) "zzz"

>
> Either before or after RFC has pros and cons. For instance, proposed
> change will require string casts for numeric property iteration, correct?

No, it won't. Object property names are usually strings, and you can
usually rely on that when iterating over an object. This RFC fixes three
places where this wasn't the case.

Unless you're referring to your proposal?

> for($i=0; $i < 100; $i++) {
>   $str_i = (string)$i; // <== "int" key to "string" key conversion
>                        // requires cast.
>   echo $obj->{$str_i}; // This kind of expression with int/int like string
>                        // index is not allowed now, but this
>                        // could be valid.
>   // https://3v4l.org/e5L1T
> }

There's no need for the cast there. When you do $obj->{0}, say, PHP
implicitly converts this to $obj->{'0'}. This isn't changed by my patch,
this is the current behaviour. Likewise, when you do $arr['0'], PHP
implicitly converts this to $arr[0]. This is also the current behaviour.

It's because of this behaviour that the broken conversion of objects to
arrays, and vice-versa, renders some keys or properties inaccessible.

>
> Another cons after RFC is BC that
> $arr[0] = 123;
> $obj=(object)$arr;
> $obj->{0} became $obj->{'0'}

No, it doesn't. $obj->{0} and $obj->{'0'} behave the same currently and
will continue to do so with this RFC.

> Simply allowing access to "numeric string index and long index" for
> _both_ array and object seems cleaner resolution for problems we have.

I disagree, I think this is the most complex solution I've seen
proposed. This means array key and property lookups have to do two
checks, not one, and could lead to subtle bugs depending on what order
this happens (what if 0 and '0' both exist with different values, which
value do we return?).

Unless you're proposing that we consider 0 and '0' to be distinct keys
and make $arr[0] and $arr['0'] behave differently. This seems like a bad
idea in the face of weak typing: PHP usually considers 0 and '0' to be
the same value, and it would be very surprising if we changed this
behaviour of array indexing after more than two decades.

> (Object allows distinct access to 0 and '0' indexes already, so make
> array allow to access 0 and '0' indexed element.)

This is not true. $obj->{0} and $obj->{'0'} both look up the object
property "0", and $arr[0] and $arr['0'] both look up the array key 0.
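
To make that normalisation concrete, here is a minimal sketch. It is only an illustration: the variable names and values are made up, it is not taken from the RFC or its patch, and it deliberately avoids the (array) and (object) casts so that it should behave the same with or without the patch.

```php
<?php
// Arrays: an integer-like string key is normalised to an integer key,
// so both writes below hit the same element.
$arr = [];
$arr['0'] = 'a';
$arr[0]   = 'b';                 // overwrites the element written above

var_dump(count($arr));           // int(1)
var_dump($arr[0] === $arr['0']); // bool(true)
var_dump(array_keys($arr));      // a single integer key, 0

// Objects: an integer property name is normalised to a string name,
// so both writes below hit the same property "0".
$obj = new stdClass;
$obj->{0}   = 'a';
$obj->{'0'} = 'b';               // overwrites the property written above

var_dump($obj->{0} === $obj->{'0'});     // bool(true)
var_dump(count(get_object_vars($obj)));  // int(1)
```

Since each pair of writes leaves a single element or property behind, there is no second "0" for a separate lookup function to find unless a broken cast has created one.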
> e.g
> // There is no way to get string '0' index element now, so add
> // array_get_var();
> $obj->{0} === $arr[0];
> $obj->{'0'} === array_get_var($arr, '0'); // Get string '0' indexed value

Numeric string keys existing at all is a bug. Why should we have a
function for looking them up?

> // Int and string index can have distinct value
> $arr[0] !== $arr['0'];
> $obj->{'0'} !== $arr[0];
> $obj->{0} !== array_get_var($arr, '0');
> $obj->{0} !== $obj->{'0'};

This will break existing code which relies on PHP's weak typing and
general assumption that 0 and '0' are equivalent.

>
> // Currently, we don't have way to set string '0' indexed element except
> // converting object with string '0' index (e.g. $obj->{'0'} = 123;)
> // to array. So implement array_set_var() to allow array to have
> // string '0' indexed element.
> array_set_var($arr, '0', 123); // Set string '0' indexed value 123
> $obj = (object)$arr;
> $ojb->{'0'} === array_get_var($arr, '0');

Why should we introduce a function to create malformed arrays?

> Making array accessible to "string int index" and reorganize other features
> accordingly seems result in more consistent spec.
>
> What do you think?

I don't see the merit of this proposal. PHP has worked on the principle
0 and '0' are the same key for a long time. Edge-cases where they aren't
are merely that, edge-cases, and should be considered a bug. I would
rather embrace the intended behaviour than make the bug into a feature.

If we wanted to avoid edge-cases like this entirely, we could make
everything work like arrays (integers and non-numeric string keys only),
or make everything work like objects (only string keys). That's
suggested under Future Scope, and I believe HHVM might do the former.
But it's a larger undertaking and not something I am going to address
here.

Thanks for your comments.

--
Andrea Faulds
https://ajf.me/