Newsgroups: php.internals
In-Reply-To: <204B9BC6-2DA9-452A-854B-7EA260A19543@gmail.com>
Date: Wed, 30 Jul 2014 08:46:37 +0100
Cc: yohgaki@ohgaki.net, PHP Internals
Message-ID: <59D7EB3B-365A-4E56-A1A7-5654C5C9E86C@ajf.me>
References:
 <633025718.351649.1406699307462.open-xchange@app2.ox.registrar-servers.com>
 <204B9BC6-2DA9-452A-854B-7EA260A19543@gmail.com>
To: Tjerk Meesters
Subject: Re: [PHP-DEV] signed long hash index for PHP7?
From: ajf@ajf.me (Andrea Faulds)

On 30 Jul 2014, at 07:50, Tjerk Meesters wrote:

>> That would make sense, but doesn't solve all edge cases as your maximum array
>> index is still more than 2 times the largest positive integer on 32-bit.
>
> Is that by design, a bug or something else entirely? Could you explain this edge case with some code?

On a 32-bit platform, the maximum signed long is 0x7FFFFFFF, but the maximum unsigned long is 0xFFFFFFFF, slightly more than twice as big.

For example, this does what you’d expect on my machine (OS X 64-bit Intel Core i5):

andreas-air:~ ajf$ php -r '$x = [0xFFFFFFFF => 1]; $x[] = 2; var_dump($x);'
array(2) {
  [4294967295]=>
  int(1)
  [4294967296]=>
  int(2)
}

On my 32-bit Ubuntu VM (which I use precisely to test this kind of issue when working on bigints), however, it wraps around:

ajf@andrea-VirtualBox:~$ php -r '$x = [0xFFFFFFFF => 1]; $x[] = 2; var_dump($x);'
array(2) {
  [-1]=>
  int(1)
  [0]=>
  int(2)
}

I think we should probably use an unsigned long internally, but prevent negative values.

> Forbidding negative indices is a bit harsh and imho quite unnecessary;

Actually, I missed the bit of your email suggesting treating them as strings the first time I read it. I’d be fine with that.

> turning “out of range” indices into strings should work just fine afaict. Is there a reason why it shouldn’t?

Well… there is one issue. Basically, some array functions treat integer and string keys completely differently.

> A compromise could be to allow string keys that would otherwise have converted into a negative integer, but disallow negative int/float explicitly.

It’d be a complete BC break, but we could make negative indices work like they do in Python and grab the (length + index)th item (i.e. -1 returns item 4 in a list of 5, -2 returns item 3, and so on). However, because our arrays are weird semi-indexed semi-hashmap things, this probably isn’t good, as it’d prevent you from using strings like “-1” as keys. Alas, I can dream.

To actually respond to your suggestion, I don’t like the idea of blocking -1 but allowing “-1”. In PHP, numeric strings, integers and floats are supposed to be equivalent, and I’m already unhappy that large integer indexes and large numeric string indexes work differently. Whatever we do, I’d like PHP 7’s arrays to treat integer, float and numeric string indexes consistently.

Thinking about it a little more, if we use a long for indexes, we don’t even need to make them strings. It would fit the principle of least astonishment IMO if any valid PHP int is a valid index and won’t be a string. I was going to say that negative indexes don’t work right internally, but then I realised they could work fine for indexing into the buckets if we just cast them to unsigned longs internally (hence getting the 2’s complement representation on modern CPUs) for indexing and hashing, but only expose signed longs to the outside world, including through the API.

So in summary, I think we should use signed longs for indexes (or at least whatever type PHP’s basic int is), and anything outside of the range of one should be treated as a string. This would make numeric strings and ints consistent, would solve all the weird overflow issues, and is the most intuitive approach IMO.

-- 
Andrea Faulds
http://ajf.me/
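
For illustration only, here is a rough C sketch of the idea summarised above: keep the index signed on the outside, cast it to unsigned only when picking a bucket, and treat anything that does not fit the signed type as a string key. This is not code from php-src. The names index_t, hash_t, bucket_for and fits_index are hypothetical, a 32-bit int is assumed to mirror the Ubuntu VM case, and the power-of-two table size is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a 32-bit PHP int and its unsigned twin.
 * These are assumptions for the sketch, not actual Zend Engine types. */
typedef int32_t  index_t;
typedef uint32_t hash_t;

/* Choose a bucket for a signed index.
 * Converting a signed value to an unsigned type is well defined in C
 * (the result is the value modulo 2^32), so -1 becomes 0xFFFFFFFF.
 * table_size is assumed to be a non-zero power of two. */
static hash_t bucket_for(index_t key, hash_t table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1u)) == 0);
    return (hash_t)key & (table_size - 1u);
}

/* Decide whether a value parsed from user input can be an integer key.
 * Returns 1 if it fits in index_t; 0 means "outside the range", which
 * under the proposal would be kept as a string key instead. */
static int fits_index(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

With an 8-bucket table, bucket_for(-1, 8) selects bucket 7 while the caller still sees the key as -1. fits_index(4294967295) returns 0, so under the proposal 0xFFFFFFFF would stay a string key on a 32-bit build rather than wrapping to -1.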