From: chris_se@gmx.net (Christian Seiler)
Newsgroups: php.internals
To: internals@lists.php.net
Subject: PATCH: Implementing closures in PHP (was: anonymous functions in PHP)
Date: Sat, 22 Dec 2007 16:08:04 +0100
Message-ID: <476D2854.5070803@gmx.net>
References: <98b8086f0712150818n40056cedyf0aae7a5a08a27b7@mail.gmail.com>
    <476582E6.7020808@zend.com>
    <200712172130.08216.larry@garfieldtech.com>
    <4FADC266-873E-4FD2-BEC8-28EA9D833297@procata.com>
In-Reply-To: <4FADC266-873E-4FD2-BEC8-28EA9D833297@procata.com>

Hi,

I was following this thread and came upon Jeff's posting on how closures
could be implemented in PHP. Since I would find the feature to be
EXTREMELY useful, I decided to actually implement it more or less the
way Jeff proposed. So, here's the patch (against PHP_5_3, I can write
one against HEAD if you wish):

http://www.christian-seiler.de/temp/closures-php-5-3.patch

I started with Wez's patch for adding anonymous functions that aren't
closures. I changed it to make sure no shift/reduce or reduce/reduce
conflicts occur in the grammar. Then I started implementing the actual
closure stuff. It was fun because I learned quite a lot about how PHP
actually works. I had the following main goals while developing the
patch:

1. Don't reinvent the wheel.
2. Don't break anything unless absolutely necessary.
3. Keep it simple.

Jeff proposed a new type of zval that holds additional information
about the function that is to be called. Adding a new type of zval
would need changes throughout the ENTIRE PHP source and probably also
throughout quite a few scripts. But fortunately, PHP already possesses
a zval that supports the storage of arbitrary data while being very
lightweight: Resources. So I simply added a new resource type that
stores zend functions. The statement

    $var = function () {};

will now make $var a resource (of the type "anonymous function").

Anonymous functions are ALWAYS defined at compile time, no matter where
they occur. They are simply named __compiled_lambda_1234 and added to
the global function table.
But instead of simply returning the string '__compiled_lambda_1234',
I introduced a new opcode that will create references to the correct
local variables that are referenced inside the function.

For example, if you have:

    $func = function () {
        echo "Hello World\n";
    };

This will result in an anonymous function called '__compiled_lambda_0'
that is added to the function table at compile time. The opcodes for
the assignment to $func will be something like:

    1 ZEND_DECLARE_ANON_FUNC ~0 '__compiled_lambda_0'
    2 ASSIGN !0, ~0

The ZEND_DECLARE_ANON_FUNC opcode handler does the following: It creates
a new zend_function, copies the entire structure of the function table
entry corresponding to '__compiled_lambda_0' into that new structure,
increments the refcount, registers it as a resource and returns that
resource so it can be assigned to the variable.

Now, have a look at a real closure:

    $string = "Hello World!\n";
    $func = function () {
        lexical $string;
        echo $string;
    };

This will result in the same opcodes as above. But here, three
additional things happen:

1. The compiler sees the keyword 'lexical' and stores the information
   that a variable called 'string' should be used inside the closure.

2. The opcode handler sees that a variable named 'string' is marked as
   lexical in the function definition. Therefore it creates a reference
   to it in a HashTable of the COPIED zend_function (that will be
   stored in the resource).

3. The 'lexical $string;' translates into a FETCH opcode that will work
   in exactly the same way as 'static' or 'global' - only fetching it
   from the additional HashTable in the zend_function structure.

The resource destructor makes sure that the HashTable containing the
references to the lexical variables is correctly destroyed upon
destruction of the resource. It does NOT destroy other parts of the
function structure because they will be freed when the function is
removed from the global function table.

With these changes, closures work in PHP.
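
To make the copy-and-bind mechanism described above more concrete,
here is a small stand-alone C model of the idea. It is NOT code from
the patch and uses none of the real Zend structures: toy_value,
toy_scope, toy_function and all function names are made up for
illustration, and decrementing the shared refcount in the destructor
is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_VARS 4

/* A refcounted string value, loosely playing the role of a zval. */
typedef struct {
    int refcount;
    const char *text;
} toy_value;

/* A tiny symbol table: variable names mapped to shared values. */
typedef struct {
    const char *names[MAX_VARS];
    toy_value *values[MAX_VARS];
    int count;
} toy_scope;

/* A compiled function: shared template data plus per-closure bindings. */
typedef struct {
    const char *name;
    int *refcount;                        /* shared by template and copies */
    const char *lexical_names[MAX_VARS];  /* recorded at "compile time" */
    int lexical_count;
    toy_scope bound;                      /* stays empty in the template */
} toy_function;

static toy_value *scope_find(const toy_scope *scope, const char *name)
{
    for (int i = 0; i < scope->count; i++) {
        if (strcmp(scope->names[i], name) == 0) {
            return scope->values[i];
        }
    }
    return NULL;
}

static int scope_bind(toy_scope *scope, const char *name, toy_value *value)
{
    if (scope->count >= MAX_VARS) {
        return -1;
    }
    scope->names[scope->count] = name;
    scope->values[scope->count] = value;
    scope->count++;
    value->refcount++;  /* the closure now holds a reference, not a copy */
    return 0;
}

/* Models the declare opcode: copy the template, bump the refcount,
 * then bind every lexical name found in the defining scope. */
toy_function *toy_declare_closure(const toy_function *tmpl, toy_scope *defining)
{
    assert(tmpl->lexical_count <= MAX_VARS);
    toy_function *copy = malloc(sizeof *copy);
    if (copy == NULL) {
        return NULL;
    }
    *copy = *tmpl;
    copy->bound.count = 0;
    (*copy->refcount)++;
    for (int i = 0; i < tmpl->lexical_count; i++) {
        toy_value *v = scope_find(defining, tmpl->lexical_names[i]);
        if (v != NULL) {
            scope_bind(&copy->bound, tmpl->lexical_names[i], v);
        }
    }
    return copy;
}

/* Models 'lexical $name;': fetch from the closure's own table. */
toy_value *toy_fetch_lexical(const toy_function *closure, const char *name)
{
    return scope_find(&closure->bound, name);
}

/* Models the resource destructor: release the bindings only; the
 * template itself is owned by the function table. */
void toy_destroy_closure(toy_function *closure)
{
    for (int i = 0; i < closure->bound.count; i++) {
        closure->bound.values[i]->refcount--;
    }
    (*closure->refcount)--;  /* assumption of this sketch */
    free(closure);
}
```

In this model, declaring a closure over a scope that contains "string"
gives a copy whose toy_fetch_lexical() returns the very same toy_value
the outer scope holds, so a later change made through the outer
variable is visible inside the closure. Destroying the copy drops only
those bindings and leaves the template untouched.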

Some caveats / bugs / todo:

* Calling anonymous functions by name directly is problematic if there
  are lexical variables that need to be assigned. I added checks to
  make sure this case does not happen.

* In the opcode handler, error handling needs to be added.

* If somebody removes the function from the global function table
  (e.g. with runkit), the new opcode will return NULL instead of a
  resource (error handling is missing). Since I do increment the
  refcount of the zend_function, it SHOULD not cause segfaults or
  memory leaks, but I haven't tested it.

* $this is kind of a problem, because all the fetch handlers in PHP
  make sure $this is a special kind of variable. For the first version
  of the patch I chose not to care about this because what still works
  is e.g. the following:

      $object = $this;
      $func = function () {
          lexical $object;
          // do something
      };

  Also, inside the closures, the class context is not preserved, so
  accessing private / protected members is not possible.

  I'm not sure this actually represents a problem because you can
  always use normal local variables to pass values between closure and
  calling method and make the calling method change the properties
  itself.

* I've had some problems with eval(), have a look at the following
  code:

      $func = eval ('return function () { echo "Hello World!\n"; };');
      $func();

  With plain PHP, this seems to work; with the VLD extension loaded
  (which shows the opcodes), it crashes. I don't know if that's a
  problem with eval() or just with VLD and I didn't have time to
  investigate it further.

* Oh, yes, 'lexical' is now a keyword. Although I really don't think
  that TOO many people use that as an identifier, so it probably won't
  hurt THAT much.

Apart from the points above, it really works, even with complex stuff.
Let me show you some examples:

1. Customized array_filter:

    function filter_larger ($array, $min = 42) {
        $filter = function ($value) {
            lexical $min;
            return ($value >= $min);
        };
        return array_filter ($array, $filter);
    }

    $arr = array (41, 43);
    var_dump (filter_larger ($arr));     // 43
    var_dump (filter_larger ($arr, 40)); // 41, 43
    var_dump (filter_larger ($arr, 44)); // empty

2. Jeff's example:

    function getAdder($x) {
        return function ($y) {
            lexical $x;
            return $x + $y;
        };
    }

    $plusFive = getAdder(5);
    $plusTen = getAdder(10);

    echo $plusFive(4)."\n"; // 9
    echo $plusTen(7)."\n";  // 17

3. Nested closures:

    $outer = function ($value) {
        return function () {
            lexical $value;
            return $value * 2;
        };
    };

    $duplicator = $outer (4);
    echo $duplicator ()."\n"; // 8
    $duplicator = $outer (8);
    echo $duplicator ()."\n"; // 16

[Ok, yeah, that example is quite stupid and should NOT be used as an
example for good code. ;-) But it's simple and demonstrates the
possibilities.]

It would be great if somebody could review the patch because I'm sure
some parts can still be cleaned up or improved. And it would be even
better if this feature made it into PHP. ;-)

Regards,
Christian

PS: I'd like to thank Derick Rethans for his GREAT Vulcan Logic
Disassembler - without it, development would have been a LOT more
painful.

PPS: Oh, yeah, if it should be legally necessary, I grant the right to
anybody to use this patch under any OSI-certified license you may want
to choose.