Hi there,
I am researching the possibility of developing a shared library which can
perform database queries in parallel to multiple databases. One important
requirement is that I will be able to use this functionality from PHP.
Because I know PHP is not thread-safe (because of other libraries it links
against), I am wondering what would be the best way to implement this. Right
now I can imagine three solutions:
- Use multiple threads to connect to the databases, but let the library
export a blocking single-threaded API. So, PHP calls a function in the
library, this function spawns new threads, which do the real work. Meanwhile
the function waits for the threads to finish, and when all threads are done
it returns the final result back to PHP.
- Use a single thread and asynchronous socket communication. So, PHP calls
the library function and this function handles all connections within the
same thread using asynchronous communication, and returns the result to PHP
when all communication is completed.
- Use a daemon on localhost. Make a connection from PHP to the daemon,
the daemon handles all the connections to the databases and passes the
result back to the connection made from PHP. (A rough sketch of this
option, from the PHP side, follows below.)
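To illustrate the third option, the PHP side might be no more than this
(the socket path and wire format are made up for the example):

    <?php
    // Hypothetical daemon option, seen from PHP. The daemon behind this
    // socket would fan the query out to all databases and send back the
    // merged rows (the protocol here is invented for the sketch).
    $daemon = stream_socket_client('unix:///var/run/pqd.sock', $errno, $errstr, 5);
    fwrite($daemon, "QUERY SELECT id, value FROM big_table\n");
    $rows = unserialize(stream_get_contents($daemon));
    fclose($daemon);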
Can someone give me some advice about the advantages of one approach over
another? Please keep in mind that I'm hoping for a solution that is both
stable and low in overhead.
Thanks,
Arend.
--
Arend van Beelen jr.
"If you want my address, it's number one at the end of the bar."
I would prefer to have a function which checks whether the requested data
is already available (if it is not, I would still be able to do something
useful while waiting).
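Purely to illustrate the shape of such an API (every pq_* function below is
hypothetical; nothing like this exists yet):

    <?php
    // Hypothetical polling-style API -- the pq_* functions are invented
    // here only to sketch what a non-blocking interface could look like.
    $handle = pq_send_query($databases, 'SELECT id, value FROM big_table');

    while (!pq_result_ready($handle)) {
        // do something useful while the queries are still running,
        // e.g. render parts of the page that don't need this data
    }

    $rows = pq_fetch_all($handle);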
--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/
While I can see the theoretical advantage of this, I wonder how much there is to gain in practice (at least for us, that is).
In our current codebase, when a database query is issued, PHP can only continue once it has the result anyway, so it would require serious code modifications to make use of such functionality. Also, while it may theoretically shorten page load times, our webservers are already constrained by CPU load, so we would probably not be able to get more pageviews out of it either.
-----Original Message-----
From: Alexey Zakhlestin [mailto:indeyets@gmail.com]
Sent: Sat 10-11-2007 11:31
To: Arend van Beelen
CC: internals@lists.php.net
Subject: Re: [PHP-DEV] Making parallel database queries from PHP
Hi,
A few pointers and ideas:
- ext/pgsql has support for asynchronous queries (pg_send_query()
and friends); a small sketch is below
- maybe you can create something out of MySQL Proxy that splits out
a single query into multiple queries and then rejoins them
- since MySQL AB is actively developing a new libmysql replacement
for PHP only, you might want to talk to them about implementing
something like this
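For illustration, the ext/pgsql route can look roughly like this (the
connection strings and the query are placeholders):

    <?php
    // Parallel queries via ext/pgsql's asynchronous API.
    $conns = array(
        pg_connect('host=db1 dbname=app user=web'),
        pg_connect('host=db2 dbname=app user=web'),
    );

    // Send a query on every connection without waiting for results.
    foreach ($conns as $conn) {
        pg_send_query($conn, 'SELECT id, value FROM big_table');
    }

    // Collect the results; the queries have been running in parallel.
    $results = array();
    foreach ($conns as $i => $conn) {
        while (pg_connection_busy($conn)) {
            usleep(1000); // crude polling, good enough for a sketch
        }
        $results[$i] = pg_fetch_all(pg_get_result($conn));
    }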
regards,
Lukas
Hi Lukas,
Spot on! What you're suggesting, splitting queries and combining the results, is exactly the direction I'm thinking in. We did have a short look at MySQL Proxy, though, and basically it does not seem to do the job for us. We want to avoid adding another proxy layer between our webservers and the database servers, as it would just mean additional overhead and possible bottlenecks and points of failure. This is why I want to move this functionality onto the webservers themselves, to achieve minimum overhead and to guarantee it will scale with the number of webservers.
Nevertheless, MySQL Proxy does appear to provide some of the functionality we will be needing and it might indeed be a good idea to contact MySQL and try to reuse some of Proxy's components if possible.
Thanks!
Arend.
Hi Arend -
If your webserver CPUs are already maxed out, that problem won't go away
on its own, but once you've solved that (optimized your code or added
more webservers), the curl_multi_* functions might help you out.
A cheap way to parallelize your database or data-object access is to
implement a kind of service-oriented architecture, where you have one
PHP script* that does little except get data from a database, serialize
that data, and return it to your main PHP script.
The main PHP script uses the curl_multi_init, curl_multi_add_handle,
etc. functions to call this script multiple times in parallel, returning
different data objects for each call.
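A bare-bones version of that fan-out could look like this (the data.php
endpoint and its JSON output are assumptions for the example):

    <?php
    // Fan out two requests in parallel with the curl_multi_* functions.
    $urls = array(
        'http://data-tier.local/data.php?object=user&id=42',
        'http://data-tier.local/data.php?object=orders&user=42',
    );

    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    // Drive all transfers; both requests are in flight at the same time.
    $running = 0;
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); // wait for activity instead of spinning
    } while ($running > 0);

    $results = array();
    foreach ($handles as $ch) {
        $results[] = json_decode(curl_multi_getcontent($ch), true);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);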
Because this introduces latency into the data retrieval trip, it will be
slower for most applications. Some circumstances that might make it
viable include:
- you have > 1 data store
- you have multiple slow queries that aren't interdependent
- you have to do expensive processing on the data you retrieve
- you have lots of slack (CPU, RAM, processes) on the webservers
In its favor - it should take just a couple of hours to prototype. If
you have a single canonical data store, you might find that as soon as
you enable parallel queries against the database, your database becomes
the bottleneck, and throughput doesn't actually increase. This technique
should reveal that as a potential problem without much development cost.
Interested to know how you proceed.
Donal McMullan
Donal @ Catalyst.Net.NZ PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
OFFICE: +64(4)803-2372 MOB: +64(21)661-254
*actually - Java's a pretty good option for this tier too.
Hi Donal,
thanks for your suggestion. While I think this approach might provide some quick short-term solutions, there is actually a much bigger problem we are trying to attack. I don't know exactly how much detail I can give, but I will provide some background information to give more insight into the situation...
We are dealing with literally hundreds of webservers and hundreds of database servers, and are expanding both on a frequent basis. Whenever we increase the number of webservers, the databases become our bottleneck, and vice versa. We realize we won't magically solve any of these bottlenecks by introducing parallel querying on the databases.
We have lots of tables which are divided over more than a dozen database clusters, and we are getting more and more tables which become so big they have to be spread out over multiple databases. Because of the distribution of these tables, querying them becomes increasingly hard, and we are approaching a limit where further distribution will become virtually unworkable with our current approach. That approach is to query the various databases serially from PHP and merge the results manually. If we continue down this path, our PHP application will have to do more and more queries serially, and latencies will add up more and more. Not to mention the code maintenance required for finding the correct databases to query and merging all the results.
Therefore we will need parallelization techniques that can transparently handle communication with the databases, to keep our latencies low, but also to relieve our PHP application from having to deal with all the distributed databases.
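To make that concrete, the serial pattern we want to get rid of looks
roughly like this (shard_hosts_for() and the query are stand-ins for real
application code):

    <?php
    // Our current serial scatter-gather, simplified. Every shard adds its
    // full round-trip latency to the page load, one after the other.
    $rows = array();
    foreach (shard_hosts_for('big_table') as $host) { // hypothetical helper
        $conn = mysql_connect($host, 'web', 'secret');
        mysql_select_db('app', $conn);
        $res = mysql_query('SELECT id, value FROM big_table', $conn);
        while ($row = mysql_fetch_assoc($res)) {
            $rows[] = $row; // manual merging of partial results
        }
        mysql_close($conn);
    }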
Thanks!
Arend.
-----Original Message-----
From: Donal McMullan [mailto:donal@catalyst.net.nz]
Sent: Sat 10-11-2007 13:43
To: Arend van Beelen
CC: internals@lists.php.net; Alexey Zakhlestin
Subject: Re: [PHP-DEV] Making parallel database queries from PHP