Newsgroups: php.internals
Date: Wed, 21 Jan 2015 15:22:40 +0100
To: Dmitry Stogov, Yasuo Ohgaki, Nikita Popov
Cc: Rowan Collins, "internals@lists.php.net", Xinchen Hui
Message-ID: <20150121142246.E1B2C5F85C@mx.zeyon.net>
Subject: Re: [PHP-DEV] Fixing strange foreach behavior.
From: ben.coutu@zeyos.com (Benjamin Coutu)

Hi Dmitry,

One could use an external pointer for the iteration and flag the
hashtable while it is being iterated.
In order to preserve the constraint that for-each is conceptually working
on a copy, one would have to duplicate (zend_array_dup) the hashtable
every time one alters it (e.g. adding/changing buckets) if (and only if)
the iteration flag is set.
Nested for-each would also have to check for the flag and duplicate if
necessary. Apart from saving the space for the internal pointer, it would
also mean less copy-on-write semantics kicking in than with the current
implementation, because duplication is only performed if the array is
actually altered, not merely because it is traversed.

Example:

$arr = $obj->arr; // property "arr" is an array

// This will CURRENTLY copy the array, because $arr is also visible
// through $obj->arr, although this is not really necessary unless the
// array is actually changed during iteration.
foreach ($arr as $val) ...;

Cheers,

Ben

========== Original ==========
From: Dmitry Stogov
To: Yasuo Ohgaki, Nikita Popov
Date: Wed, 21 Jan 2015 14:57:59 +0100
Subject: Re: [PHP-DEV] Fixing strange foreach behavior.

Hi,

Yeah, I think changing foreach behaviour in a more consistent and
efficient way may make sense.
If we don't use HashTable.nInternalPointer we won't need to copy
immutable arrays.
The same goes for nested foreach on the same array.
We could also eliminate all the HashPosition magic introduced to keep
PHP5 behavior.

On the other hand, some apps may rely on the current weird behavior.

I remember that a long time ago Nikita made a related proposal.
Nikita, could you please send a link?

Thanks.
Dmitry.

On Wed, Jan 21, 2015 at 5:48 AM, Yasuo Ohgaki wrote:

> Hi Rowan,
>
> On Mon, Jan 19, 2015 at 12:05 AM, Rowan Collins wrote:
>
>> On 18 January 2015 at 01:01, Yasuo Ohgaki wrote:
>>
>>> Hi Rowan,
>>>
>>> On Sat, Jan 17, 2015 at 8:43 PM, Rowan Collins wrote:
>>>
>>>> My concern is, at what cost? Given how rarely used the internal
>>>> pointer is, are we carrying around a chunk of extra memory with
>>>> every array just on the off-chance that it will be used, or is
>>>> there some magic that makes it zero-width until it's needed?
>>>>
>>>> Reusing it for foreach makes perfect sense if it increases
>>>> performance for the majority of cases, and the only way to cover
>>>> every edge-case like the one at the top of this thread would be to
>>>> ban that optimisation outright.
>>>> If there's some way of separating things such that the rarely used
>>>> constructs take the performance hit, I'm all for it, though.
>>>>
>>>
>>> An external position pointer needs a little more memory for sure.
>>> However, external position pointers are used all over the place in
>>> the standard and other modules.
>>>
>>> I agree that some benchmarks should be taken before we proceed.
>>>
>>
>> I meant more that the internal pointer takes up memory which is
>> unused by 99% of PHP applications, so we might as well get some
>> benefit from it, but it comes to the same thing.
>>
>
> It might be a good idea. I agree that more than 99% of PHP code does
> not need the internal position pointer.
> Dmitry might be interested in this area, or he might have taken some
> benchmarks already.
> However, PHP does not support block scope, so getting rid of the
> internal hash position pointer may not be feasible even for simple
> loops.
> There may be workarounds in the engine, though.
>
> Besides scope, the user may call current()/next()/reset() anywhere
> for an array in code. External position pointer is required to
> support this. The obvious workaround is to have an array position
> resource which is attached to a specific array.
>
> // Get array position resource of $array.
> // Like external hash position in C.
> $pos = array_fetch_position($array);
> reset($pos); // Reset position
> $var = current($array, $pos);
> $var = next($array, $pos);
> $var = next($array, $pos);
>
> This requires script modification, though.
>
> An external position resource would be useful for complex array
> operations. It could be implemented in the standard module. It may be
> used to traverse depth first, breadth first, etc. I might work on
> this.
>
>
>>
>>>> Alternatively, we could achieve consistency the other way round,
>>>> by making foreach() reset the internal pointer even when it
>>>> *doesn't* use it, and documenting that fact like we do for certain
>>>> array functions. Code relying on the current behaviour when mixing
>>>> them is probably buggy anyway, because it is not well-defined, so
>>>> the BC concern should be low.
>>>>
>>>
>>> IIRC, there were discussions when foreach was added that pointed
>>> out internal position pointer usage. Before foreach, everyone was
>>> using current()/next()/reset(). I think the reason foreach sets the
>>> internal position pointer to the end was the code used in those
>>> days. It was not for technical/performance reasons, was it? I don't
>>> remember well because it was more than 10 years ago. Anyone?
>>>
>>> IIRC, foreach was introduced by Andi.
>>> I think foreach used an external position pointer and didn't touch
>>> the internal position pointer at all with his first proposal.
>>> Please correct me if I'm wrong.
>>>
>>>> As for nesting, I think PHP is doing the right thing for plain
>>>> arrays and non-rewindable iterators, because I would expect a
>>>> foreach loop to start at the beginning, thus have an implicit
>>>> reset/rewind. The rewindable behaviour is awkward, though -
>>>> intuitively, the iterator needs to be able to track multiple
>>>> positions simultaneously, which isn't guaranteed by the interface
>>>> design. Maybe an error should be issued if the iterator is already
>>>> the subject of a foreach loop?
>>>>
>>>
>>> Iterators are a headache, I agree.
>>> Iterator nesting may be detected by a flag easily, but it may be
>>> better to generate independent iterators for each "foreach".
>>>
>>>
>> I don't think that's an either/or situation: in most cases, it would
>> be up to the user to create that independent iterator, because
>> there's no general algorithm for cloning one that the engine could
>> use. (Think of an iterator proxying a database cursor, for instance -
>> it could be rewindable directly, but cloning it would imply issuing a
>> new query to create a new cursor with separate state.) So the engine
>> needs to issue an error telling the user that they're reusing an
>> object in a dangerous way. An extended interface could be created
>> later for those cases that can handle the situation, although I doubt
>> it would be that common a use case.
>>
>
> I think if users need a nested database-cursor-based iterator or the
> like, they should create a new iterator object by themselves.
> i.e. the user should have a __clone() that opens a new cursor.
>
> It might be good to have an un-rewindable iterator also. e.g.
> If the rewind() method returns FALSE, then it's not rewindable.
>
> Anyway, position-independent array iteration with foreach might be a
> good feature for PHP 7.
>
> This behavior is reasonable.
> http://3v4l.org/lJTQG
> but this is not reasonable now. (It used to be.)
> http://3v4l.org/HbVnd
>
> Regards,
>
> --
> Yasuo Ohgaki
> yohgaki@ohgaki.net
>
>
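
For reference, the constraint mentioned at the top of this message (a by-value foreach conceptually works on a copy) can be checked with a short script. This is only an illustrative sketch: the variable names are made up for the example, and it shows the observable semantics, not how the engine achieves them internally.

```php
<?php
// Illustrative only: $arr and $seen are hypothetical names.
$arr  = [1, 2, 3];
$seen = [];

foreach ($arr as $val) {
    // Writing to $arr inside the loop separates it from the array
    // being traversed, so the loop never visits the appended values.
    $arr[]  = $val * 10;
    $seen[] = $val;
}

var_dump($seen === [1, 2, 3]);            // bool(true)
var_dump($arr === [1, 2, 3, 10, 20, 30]); // bool(true)
```

The loop body alters the array on every pass, which is exactly the case where a duplication cannot be avoided; a loop that only reads would not need one under the flag-based scheme described above.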