Newsgroups: php.internals
Xref: news.php.net php.internals:33076
Date: Sat, 10 Nov 2007 12:27:27 +0100
From: arend@hyves.nl ("Arend van Beelen")
Subject: RE: [PHP-DEV] Making parallel database queries from PHP
References: <000301c82343$f3cf9df0$0201a8c0@ubar>
Hi Scott,

thanks for your thorough reply! I'm glad to see the first option appears to be preferred, as it was my first choice as well. The main reason I had some doubts was that a colleague of mine argued quite strongly (though I was still not entirely convinced) that I would find myself in an awful lot of trouble when trying to implement a multi-threaded library inside a single-threaded PHP environment. That is the problem I am hoping to solve by only exporting a single-threaded API.

For the record, I do have some answers to your questions...

1) At the moment we use Apache2 with the prefork MPM. However, there is some thought going into switching to lighttpd, so I don't think it would be a good choice to introduce hard dependencies on Apache right now. The machines we're using are all dual-cores and quad-cores, so yes, it would be silly not to make use of their power :)

2) I will be the main developer of the library. I have personal experience with assembly, C, C++, Java, PHP and whatnot. However, for the last few years I've only been working professionally with PHP, so I will need to make some effort to get back into low-level programming. Fortunately, I've got some colleagues who are also good C and C++ programmers, so I think the experience is there :)

3) MySQL is an absolute must. But it would be nice not to make the library too dependent on MySQL (there's probably little need for that anyway), so it can be extended to other databases in the future.

4) Target platform is Gentoo Linux x86-64.

Thanks again!
Arend.

-----Original Message-----
From: Scott A.
Guyer [mailto:saguyer@gte.net]
Sent: Sat 10 Nov 2007 3:46
To: Arend van Beelen
Subject: RE: [PHP-DEV] Making parallel database queries from PHP

Hi Arend,

The first and second options differ primarily in who owns the scheduling of tasks (DB tasks). In the first option, you assign tasks to threads and allow the OS to schedule the threads. In the second option, you are the scheduler. When you view it this way, the pros and cons are fairly clear. The latter option gives you all the control to schedule the way you want to schedule. But this control places all the risk on you as well. And you may very well be painting yourself into a corner with respect to taking advantage of any improvements in OS/job/thread scheduling in the future, which I'm betting are going to be rampant again now that multi-core is so readily available.

One sort of fuzzy intangible I can think of is this. Your first choice is the most common pattern for concurrency these days. Why? I think because it is easier to implement and because OS thread handling is MUCH better these days than it was, say, 15 years ago (lighter weight, better scheduling, multi-core optimizations, etc.). Contrast that with the second option, where you are kind of hoping any 3rd party libraries (and DB libraries) are written to support async I/O. This just isn't as common as you might expect these days. So you might be constraining yourself a little in the libraries you could expect to use (of course, this depends a great deal on precisely which libraries you will use, but it is a risk). Thread-safe libraries are more common than async-I/O libraries in my experience. Async I/O has been making a little comeback in recent years, but I can't say with any certainty that it is prevalent in the libraries you might depend on.

In both cases, you could build on the Apache Portable Runtime (APR) library, which would get you up and running nicely in Apache on Windows, Unix and Mac. So that's a plus.
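To make the "you are the scheduler" point concrete, here is a minimal Python sketch of the second option: one thread multiplexing several connections over non-blocking sockets. It only illustrates the pattern and is not the proposed library, which would presumably be written in C or C++. The function name `collect_replies`, the database names, and the convention that a server signals the end of its reply by closing the connection are all assumptions made for the example; a real database protocol frames its replies differently.

```python
import selectors
import socket


def collect_replies(conns):
    """Read one reply from every socket in `conns` using a single thread.

    `conns` maps a hypothetical database name to a connected socket on which
    a reply is expected. Assumption for this sketch: the peer closes the
    connection once its reply is complete.
    """
    sel = selectors.DefaultSelector()
    replies = {name: b"" for name in conns}
    for name, sock in conns.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)

    pending = len(conns)
    while pending:
        # We are the scheduler: wait until any socket is readable, then
        # service exactly the sockets that have something for us.
        for key, _events in sel.select():
            try:
                chunk = key.fileobj.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup; retry on the next round
            if chunk:
                replies[key.data] += chunk
            else:
                # An empty read means the peer closed: this reply is complete.
                sel.unregister(key.fileobj)
                key.fileobj.close()
                pending -= 1
    sel.close()
    return replies
```

Note that every client library used this way has to expose a non-blocking interface, which is exactly the dependency risk described above.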
Turning briefly to the 3rd option... this really only benefits you in two cases. (1) It completely encapsulates your code so that any failures in your code will not bring down PHP (or its hosting app server). (2) You have some one-off (perhaps proprietary or legacy) code base that you would not be able to embed nicely in Apache/IIS/PHP (e.g., conflicts in threading, memory management, etc.). It may not be a bad way to prototype as you work out some kinks in your code. However, I don't favor this approach, primarily because it adds an install dependency and a little extra IPC overhead from PHP to your daemon. Additionally, this option may add a greater portability burden if you were trying to move your daemon among the common OSes. So I don't consider it a long-term option.

To conclude, I would favor your option 1, ceteris paribus. I would just take a hard look at any dependent code you are expecting to utilize in your code. That's where the rubber will meet the road.

Other factors not fully considered which may impact your decision:
(1) If in Apache, any particular MPM? All MPMs? Do you have a deployment that already uses about as many threads as your hardware can handle?
(2) Any particular skills (or lack of skills) for the developers of this library?
(3) Which DBs are a must for you? Which are nice-to-haves?
(4) Target platform(s)?

Hope that helps.

Cheers,
-Scott

PS - sorry for length :-(

-----Original Message-----
From: arendjr@gmail.com [mailto:arendjr@gmail.com] On Behalf Of Arend van Beelen
Sent: Friday, November 09, 2007 8:27 PM
To: internals@lists.php.net
Subject: [PHP-DEV] Making parallel database queries from PHP

Hi there,

I am researching the possibility of developing a shared library which can perform database queries in parallel to multiple databases. One important requirement is that I will be able to use this functionality from PHP.
Because I know PHP is not thread-safe due to other libraries, I am wondering what would be the best way to implement this. Right now I can imagine three solutions:

- Use multiple threads to connect to the databases, but let the library export a blocking single-threaded API. So, PHP calls a function in the library, and this function spawns new threads, which do the real work. Meanwhile the function waits for the threads to finish, and when all threads are done it returns the final result back to PHP.

- Use a single thread and asynchronous socket communication. So, PHP calls the library function and this function handles all connections within the same thread using asynchronous communication, and returns the result to PHP when all communication is completed.

- Use a daemon on the localhost. Make a connection from PHP to the daemon; the daemon handles all the connections to the databases and passes the result back over the connection made from PHP.

Can someone give me some advice about the advantages of using one approach or another? Please keep in mind that I'm hoping for a solution which is both stable and keeps overhead to a minimum.

Thanks,
Arend.

-- 
Arend van Beelen jr.
"If you want my address, it's number one at the end of the bar."
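As an illustration of the first solution listed above, the following Python sketch shows a blocking call that fans queries out to worker threads and returns only once all of them have finished. It illustrates the pattern only; the library itself would more likely be C or C++ with native threads. `query_all`, `fake_query` and the database names are hypothetical, and `fake_query` merely stands in for a real, blocking database client call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(db_name, sql):
    """Stand-in for a real, blocking database call (assumption for the sketch)."""
    time.sleep(0.05)  # pretend to wait on the network
    return f"{db_name}: result of {sql!r}"


def query_all(queries, run_query=fake_query):
    """Run one query per database in parallel; block until all are done.

    `queries` maps a database name to the SQL to run there. The caller sees
    an ordinary blocking function and never touches a thread itself.
    """
    if not queries:
        return {}
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        futures = {
            name: pool.submit(run_query, name, sql)
            for name, sql in queries.items()
        }
        # result() blocks until that worker finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}
```

From the caller's side this is a plain blocking call, for example `results = query_all({"db1": "SELECT 1", "db2": "SELECT 2"})`, which is the property that would let a single-threaded host such as PHP use it.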