Newsgroups: php.internals
Xref: news.php.net php.internals:33081
Date: Sat, 10 Nov 2007 16:51:03 +0100
From: arend@hyves.nl ("Arend van Beelen")
Cc: "Donal McMullan"
References: <4735A78E.8050001@catalyst.net.nz>
Subject: RE: [PHP-DEV] Making parallel database queries from PHP

Hi Donal,

Thanks for your suggestion. While I think this approach might provide
some quick short-term solutions, there is actually a much bigger problem
we are trying to attack. I don't know exactly how much detail I can
give, but I will give some background information to provide more
insight into the situation.

We are dealing with literally hundreds of webservers and hundreds of
database servers, and we expand both frequently. Whenever we increase
the number of webservers, the databases become our bottleneck, and vice
versa. We realize we won't magically solve any of these bottlenecks by
introducing parallel querying on the databases. We have lots of tables
which are divided over more than a dozen database clusters, and more and
more of our tables are becoming so big that they have to be spread out
over multiple databases. Because of this distribution, querying these
tables becomes increasingly hard, and we are approaching a limit where
further distribution will become virtually undoable with our current
approach. That approach is to query the various databases serially from
PHP and merge the results manually. If we continue down this path, our
PHP application will have to do more and more queries serially, and the
latencies will keep adding up. Not to mention the code maintenance
required for finding the correct databases to query and merging all the
results.

Therefore we will need parallelization techniques that can transparently
handle communication with the databases, both to keep our latencies low
and to relieve our PHP application from having to deal with all the
distributed databases.

Thanks!
Arend.

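A minimal sketch of the difference between querying shards serially and
in parallel, as described above. It is written in Python purely for
illustration; the shard names, the rows, and query_shard are hypothetical
stand-ins, and the sleep merely simulates network latency. It is not a
description of any real setup.

```python
import heapq
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for database shards. Each shard holds
# (user_id, name) rows that are already sorted by user_id.
SHARDS = {
    "db1": [(1, "anna"), (4, "daan")],
    "db2": [(2, "bram"), (5, "eva")],
    "db3": [(3, "cees"), (6, "fleur")],
}


def query_shard(shard_name, delay=0.05):
    """Pretend to query one shard; the sleep simulates network latency."""
    time.sleep(delay)
    return SHARDS[shard_name]


def query_serial(shard_names):
    """Query the shards one after another, so the latencies add up."""
    results = [query_shard(name) for name in shard_names]
    return list(heapq.merge(*results))


def query_parallel(shard_names):
    """Query all shards at once; total latency is roughly the slowest shard."""
    with ThreadPoolExecutor(max_workers=len(shard_names)) as pool:
        results = list(pool.map(query_shard, shard_names))
    return list(heapq.merge(*results))


if __name__ == "__main__":
    names = sorted(SHARDS)
    for fn in (query_serial, query_parallel):
        start = time.perf_counter()
        rows = fn(names)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {len(rows)} rows in {elapsed:.3f}s")
```

Both functions return the same merged rows; only the waiting time
differs. The merge step relies on each shard returning its rows already
sorted, which is an assumption of this sketch.
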
-----Original Message-----
From: Donal McMullan [mailto:donal@catalyst.net.nz]
Sent: Sat 10-11-2007 13:43
To: Arend van Beelen
Cc: internals@lists.php.net; Alexey Zakhlestin
Subject: Re: [PHP-DEV] Making parallel database queries from PHP

Hi Arend -

If your webserver CPUs are already maxed out, that problem won't go away
on its own, but once you've solved that (optimized your code or added
more webservers), the curl_multi_* functions might help you out.

A cheap way to parallelize your database or data-object access is to
implement a kind of services-oriented architecture, where you have one
PHP script* that does little except get data from a database, serialize
that data, and return it to your main PHP script.

The main PHP script uses the curl_multi_init, curl_multi_add_handle,
etc. functions to call this script multiple times in parallel, returning
different data objects for each call.

Because this introduces latency into the data retrieval trip, it will be
slower for most applications. Some circumstances that might make it
viable include:
 * you have > 1 data store
 * you have multiple slow queries that aren't interdependent
 * you have to do expensive processing on the data you retrieve
 * you have lots of slack (CPU, RAM, processes) on the webservers

In its favor - it should take just a couple of hours to prototype. If
you have a single canonical data store, you might find that as soon as
you enable parallel queries against the database, your database becomes
the bottleneck, and throughput doesn't actually increase. This technique
should reveal that as a potential problem without much development cost.

Interested to know how you proceed.

Donal McMullan

-----------------------------------------------------------------------
Donal @ Catalyst.Net.NZ          PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/     PHYS: Level 2, 150-154 Willis St
OFFICE: +64(4)803-2372           MOB: +64(21)661-254
-----------------------------------------------------------------------

*actually - Java's a pretty good option for this tier too.

Arend van Beelen wrote:
> While I can see the theoretical advantage of this, I wonder how much
> there is to gain in practice (at least for us, that is).
>
> In our current codebase, when a database query is done, PHP can only
> continue when it has the result anyway, so it would require serious code
> modifications to make use of such functionality. Also, while it may
> theoretically shorten page load times, our webservers are already
> constrained by CPU load anyway, so we would probably not be able to get
> more pageviews out of it either.
>
> -----Original Message-----
> From: Alexey Zakhlestin [mailto:indeyets@gmail.com]
> Sent: Sat 10-11-2007 11:31
> To: Arend van Beelen
> Cc: internals@lists.php.net
> Subject: Re: [PHP-DEV] Making parallel database queries from PHP
>
> I would prefer to have some function, which would check, if the
> requested data is already available (if it is not, I would still be
> able to do something useful, while waiting)
>
> On 11/10/07, Arend van Beelen wrote:
>> Hi there,
>>
>> I am researching the possibility of developing a shared library which
>> can perform database queries in parallel to multiple databases. One
>> important requirement is that I will be able to use this functionality
>> from PHP. Because I know PHP is not thread-safe due to other libraries,
>> I am wondering what would be the best way to implement this. Right now
>> I can imagine three solutions:
>>
>> - Use multiple threads to connect to the databases, but let the library
>>   export a blocking single-threaded API. So, PHP calls a function in
>>   the library, and this function spawns new threads, which do the real
>>   work. Meanwhile the function waits for the threads to finish, and
>>   when all threads are done it returns the final result back to PHP.
>> - Use a single thread and asynchronous socket communication. So, PHP
>>   calls the library function and this function handles all connections
>>   within the same thread using asynchronous communication, and returns
>>   the result to PHP when all communication is completed.
>> - Use a daemon on the localhost. Make a connection from PHP to the
>>   daemon, the daemon handles all the connections to the databases and
>>   passes the result back to the connection made from PHP.
>>
>> Can someone give me some advice about the advantages of using one
>> approach or another? Please keep in mind that I'm hoping for a solution
>> which will be both stable and minimizes overhead.
>>
>> Thanks,
>> Arend.
>>
>> --
>> Arend van Beelen jr.
>> "If you want my address, it's number one at the end of the bar."
>>
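
The second option in the quoted list (a single thread with asynchronous
socket communication behind a blocking call) could look roughly like the
sketch below. It is in Python rather than a C shared library, and the
fake_database handler, the "result of" reply format, and the use of
loopback ports are all assumptions made so the example runs on its own;
a real implementation would speak the actual database protocol.

```python
import asyncio


async def fake_database(reader, writer):
    """Hypothetical stand-in for a database server on the loopback interface."""
    query = await reader.readline()
    await asyncio.sleep(0.05)  # simulated query execution time
    writer.write(b"result of " + query)  # query still ends with b"\n"
    await writer.drain()
    writer.close()


async def run_query(port, query):
    """Send one query over a non-blocking socket and await the reply line."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(query.encode() + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def _query_all(queries):
    # Start one fake database per query, all on the same event loop,
    # each listening on a free port chosen by the operating system.
    servers = [
        await asyncio.start_server(fake_database, "127.0.0.1", 0)
        for _ in queries
    ]
    ports = [server.sockets[0].getsockname()[1] for server in servers]
    try:
        # Every connection is multiplexed by the event loop in this one
        # thread; gather() returns the replies in the order of the queries.
        return await asyncio.gather(
            *(run_query(port, query) for port, query in zip(ports, queries))
        )
    finally:
        for server in servers:
            server.close()


def query_all(queries):
    """Blocking entry point: returns only when every reply has arrived."""
    return asyncio.run(_query_all(queries))


if __name__ == "__main__":
    print(query_all(["SELECT 1", "SELECT 2", "SELECT 3"]))
```

From the caller's point of view query_all is an ordinary blocking
function, which is the property the quoted proposal asks for: no threads
are exposed to the calling application, yet the waits on the individual
connections overlap.
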