Newsgroups: php.internals
Date: Sat, 10 Nov 2007 12:27:27 +0100
Message-ID: <AD5924F6A589EF419765D39834C6BAFE348C10@hyves1.exchange.cysonet.com>
Thread-Topic: [PHP-DEV] Making parallel database queries from PHP
Thread-Index: AcgjOYdKY3EjNCGdSVuSYemiDBHF0QABKacgABKx/Qo=
References: <e5fac5430711091727n7ce26594scb9b31f1f3babc1e@mail.gmail.com> <000301c82343$f3cf9df0$0201a8c0@ubar>
To: <internals@lists.php.net>
Cc: <saguyer@gte.net>
Subject: RE: [PHP-DEV] Making parallel database queries from PHP
From: arend@hyves.nl ("Arend van Beelen")

Hi Scott,

Thanks for your thorough reply! I'm glad to see the first option appears
to be preferred, as it was my first choice as well. The main reason I
had some doubts was that a colleague of mine argued quite strongly
(though I was still not entirely convinced) that I would find myself in
an awful lot of trouble when trying to implement a multi-threaded
library inside a single-threaded PHP environment. That is the problem I
am hoping to solve by only exporting a single-threaded API.

For the record, I do have some answers to your questions...
1) At the moment we use Apache2 with the prefork MPM. However, there is
some thought going into switching to lighttpd, so I don't think it would
be a good choice to take on hard dependencies on Apache right now. The
machines we're using are all dual-cores and quad-cores, so yes, it would
be silly not to make use of their power :)
2) I will be the main developer of the library. I have personal
experience with assembly, C, C++, Java, PHP and whatnot. However, for the
last few years I've only been working professionally with PHP, so I will
need to make some effort to get back into low-level programming.
Fortunately, I've got some colleagues who are also good C and C++
programmers, so I think the experience is there :)
3) MySQL is an absolute must. But it would be nice not to make the
library too dependent on MySQL (there's probably little need for that
anyway), so it can be extended to other databases in the future.
4) Target platform is Gentoo Linux x86-64.

Thanks again!
Arend.

-----Original message-----
From: Scott A. Guyer [mailto:saguyer@gte.net]
Sent: Sat 10-11-2007 3:46
To: Arend van Beelen
Subject: RE: [PHP-DEV] Making parallel database queries from PHP

Hi Arend,

The first and second options differ primarily in who owns the scheduling of
tasks (DB tasks).  In the first option, you assign tasks to threads and
allow the OS to schedule threads.  In the second option, you are the
scheduler.  When you view it this way, the pros/cons are fairly clear.  The
latter option gives you all the control to schedule the way you want to
schedule.  But this control places all the risk on you as well.  And you may
very well be painting yourself into a corner with respect to being able to
take advantage of any improvements in OS/job/thread scheduling in the
future.  Which I'm betting is going to be rampant again as multi-core is
sooooo readily available.
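
To make "you are the scheduler" concrete, here is a minimal sketch of the
second option in Python rather than C, chosen only to keep it short. One
thread sends every query, then multiplexes the replies with the standard
`selectors` module. The names `open_fake_connection` and `query_all_async`
are hypothetical, and the "database" is a local socket pair with a helper
thread standing in for a remote server; no real database driver is involved.

```python
import selectors
import socket
import threading
import time


def open_fake_connection(reply, delay):
    """Return our end of a socket whose peer plays a fake database server.

    The peer thread only simulates a remote machine; it is not part of the
    single-threaded library being sketched.
    """
    ours, theirs = socket.socketpair()

    def serve():
        theirs.recv(1024)       # read the query
        time.sleep(delay)       # pretend to do some work
        theirs.sendall(reply)
        theirs.close()          # closing marks the end of the reply

    threading.Thread(target=serve, daemon=True).start()
    return ours


def query_all_async(queries):
    """Send every query, then collect all replies from a single thread.

    `queries` maps a name to a (socket, query_bytes) pair.
    """
    sel = selectors.DefaultSelector()
    results = {}
    for name, (sock, query) in queries.items():
        sock.sendall(query)     # tiny query, sent while the socket still blocks
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)
        results[name] = b""

    pending = len(results)
    while pending:
        # This loop is the scheduler: it decides which connection to service.
        for key, _events in sel.select():
            chunk = key.fileobj.recv(4096)
            if chunk:
                results[key.data] += chunk
            else:               # peer closed: this reply is complete
                sel.unregister(key.fileobj)
                key.fileobj.close()
                pending -= 1
    sel.close()
    return results


if __name__ == "__main__":
    print(query_all_async({
        "db1": (open_fake_connection(b"rows from db1", 0.05), b"SELECT 1"),
        "db2": (open_fake_connection(b"rows from db2", 0.01), b"SELECT 2"),
    }))
```

Everything the OS would otherwise do for threads (deciding who runs next,
handling partial reads) lives in that one loop, which is where both the
control and the risk come from.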

One sorta fuzzy intangible I can think of is this.  Your first choice is the
most common pattern for concurrency these days.  Why?  I think because it is
easier to implement and because OS thread handling is MUCH better these days
than it used to be, say, 15 years ago (lighter weight, better scheduling,
multi-core optimizations, etc.).  Contrast that with the second option, where
you are kinda hoping any 3rd party libraries (and DB libraries) are written
to support async-I/O.  This just isn't as common as you might expect these
days.  So you might be constraining yourself a little with the libraries you
could expect to use (of course, this depends a great deal on precisely which
libraries you will use, but it is a risk).  Thread-safe libraries are more
common than async-I/O libraries in my experience.  Async-I/O has been making
a little comeback in recent years.  But I can't say with any certainty that
it is prevalent in the libraries you might depend on.

In both cases, you could implement based on the Apache APR library, which
would get you up and running nicely in Apache on Windows, Unix, Mac.  So
that's a plus.

Turning briefly to the 3rd option...this really only benefits you in two
cases.  (1) It completely encapsulates your code so that any failures in your
code will not bring down PHP (or its hosting app server).  (2) You have
some one-off (perhaps proprietary or legacy) code base that you would not be
able to embed in Apache/IIS/PHP nicely (e.g., conflicts in threading, memory
management, etc.).  It may not be a bad way to prototype as you work out
some kinks in your code.  However, I don't favor this approach primarily
because it adds an install dependency and a little extra IPC overhead from
PHP to your daemon.  Additionally, this option may add a greater portability
burden if you were trying to move your daemon amongst the common OSes. So I
don't consider it a long term option.
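
The extra hop that the 3rd option adds can be sketched as follows, again in
Python for brevity. Everything here is an assumption made for illustration:
the names `QueryHandler`, `start_daemon` and `query_via_daemon`, the
one-JSON-line-per-message protocol, and the fact that the "daemon" runs as a
background thread so the example is self-contained (a real one would be a
separate process). The handler fakes its results instead of contacting any
database.

```python
import json
import socket
import socketserver
import threading


class QueryHandler(socketserver.StreamRequestHandler):
    """Daemon side: read one JSON request line, answer with one JSON line."""

    def handle(self):
        request = json.loads(self.rfile.readline())
        # A real daemon would fan these out to the databases; this one fakes it.
        reply = {name: f"rows for {sql}" for name, sql in request.items()}
        self.wfile.write(json.dumps(reply).encode() + b"\n")


def start_daemon():
    """Start the stand-in daemon on an ephemeral localhost port."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query_via_daemon(port, queries):
    """Client side (the role PHP would play): one round trip over a local socket."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(json.dumps(queries).encode() + b"\n")
        with conn.makefile("rb") as reader:
            return json.loads(reader.readline())


if __name__ == "__main__":
    server = start_daemon()
    print(query_via_daemon(server.server_address[1], {"db1": "SELECT 1"}))
    server.shutdown()
    server.server_close()
```

The serialization and the socket round trip in `query_via_daemon` are the IPC
overhead mentioned above, and the daemon itself is the extra install
dependency. In exchange, a crash inside the handler stays out of the PHP
process.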

To conclude, I would favor your option 1, ceteris paribus. I would just take
a hard look at any dependent code you are expecting to utilize in your code.
That's where the rubber will meet the road.

Other factors not fully considered which may impact your decision:
(1) If in Apache, any particular MPM?  All MPMs?  Do you have a deployment
that already uses about as many threads as your hardware can handle?
(2) Any particular skills (or lack of skills) for the developers of this
library?
(3) Which DBs are a must for you?  Which are nice-to-haves?
(4) Target platform(s)?

Hope that helps.  Cheers,
-Scott

PS - sorry for length :-(



-----Original Message-----
From: arendjr@gmail.com [mailto:arendjr@gmail.com] On Behalf Of Arend van Beelen
Sent: Friday, November 09, 2007 8:27 PM
To: internals@lists.php.net
Subject: [PHP-DEV] Making parallel database queries from PHP

Hi there,

I am researching the possibility of developing a shared library which can
perform database queries in parallel to multiple databases. One important
requirement is that I will be able to use this functionality from PHP.
Because I know PHP is not thread-safe due to other libraries, I am wondering
what would be the best way to implement this. Right now I can imagine three
solutions:

- Use multiple threads to connect to the databases, but let the library
export a blocking single-threaded API. So, PHP calls a function in the
library, this function spawns new threads, which do the real work. Meanwhile
the function waits for the threads to finish, and when all threads are done
it returns the final result back to PHP.
- Use a single thread and asynchronous socket communication. So, PHP calls
the library function and this function handles all connections within the
same thread using asynchronous communication, and returns the result to PHP
when all communication is completed.
- Use a daemon on the localhost. Make a connection from PHP to the daemon,
the daemon handles all the connections to the databases and passes the
result back to the connection made from PHP.
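
The first solution in the list above could look roughly like this. It is a
minimal sketch in Python rather than the C the library would actually use,
and `fake_query` and `query_all` are hypothetical names: `fake_query` merely
sleeps to imitate a blocking database call. The point is the shape of the
API: the caller makes one blocking call and never sees a thread.

```python
import threading
import time


def fake_query(name, delay):
    """Stand-in for a blocking database call (hypothetical; no real driver)."""
    time.sleep(delay)
    return f"rows from {name}"


def query_all(targets):
    """Blocking call: run one worker thread per target and wait for all of them.

    `targets` maps a database name to a simulated latency in seconds.
    """
    results = {}
    lock = threading.Lock()

    def worker(name, delay):
        rows = fake_query(name, delay)
        with lock:              # protect the shared result dict
            results[name] = rows

    threads = [threading.Thread(target=worker, args=item)
               for item in targets.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # the caller blocks here until every worker is done
    return results


if __name__ == "__main__":
    start = time.monotonic()
    print(query_all({"db1": 0.2, "db2": 0.2, "db3": 0.2}))
    print(f"elapsed: {time.monotonic() - start:.2f}s")
```

Because the workers sleep concurrently, the total wait should be close to the
slowest query rather than the sum of all of them. Threads exist only between
the call and its return, so the single-threaded caller is never re-entered
from another thread.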

Can someone give me some advice about the advantages of using one approach
or another? Please keep in mind that I'm hoping for a solution that is both
stable and minimizes overhead.

Thanks,
Arend.

-- 
Arend van Beelen jr.
"If you want my address, it's number one at the end of the bar."







