Hi everybody,
I made a simple test to find out whether it is possible to speed up PHP
performance by using parallel programming.
I modified two functions within the ./ext/standard/array.c file of
PHP-7.0.0RC7 to use OpenMP, and added the -fopenmp flag to the compiler
flags in the Makefile.
I modified the PHP functions array_sum() and in_array() to do some
benchmarking. In the php_search_array() function I only changed the
strict-comparison path; to make it work in all cases you just have to do
some cut'n'paste, but the source is easier to read with only one change.
Use:
    OMP_NUM_THREADS=xx /path_to/php test_file.php
to specify xx, the number of threads to use for your test file.
I got an average speed-up of 3.5 using 4 cores.
You can find all the details:
  . Test files
  . Modified source files
  . Benchmarks
  . FAQ
at http://poc.yakpro.com/?php7_performance_boost_parallel_computing
My main goal is to initiate a global discussion, among PHP core
developers, about parallel computing and PHP, at each level:
- Core level: speeding up functions, but also the interpreter
  (by parallelizing opcode pre-fetching, for example)
- User level: imagining and implementing simple and efficient
  primitives to make PHP developers comfortable with parallel
  programming.
A multicore PHP 8 or PHP 9? Wouldn't that be cool?
Best regards,
Pascal KISSIAN
> I made a simple test to know if it was possible to speed-up php
> performance by using parallel programming.
> [...]
> Use: OMP_NUM_THREADS=xx /path_to/php test_file.php
Your test runs a single PHP process. Mind that in a typical deployment
on a server you already have quite a few PHP processes running in
parallel, competing for time on the CPU (when not waiting for I/O); a
benchmark should reflect that.
For in_array I'm assuming that often either one match or none exists,
so on average the old algorithm has to process half the elements. With
that form of parallelization it will process C/(N-1) + C/2N elements,
where C is the count of elements and N the number of cores, so in total
it needs more CPU. (I might be wrong.)
So please run tests with a "typical" application (like WordPress or
such) in a more typical environment.
johannes
Hi Johannes,
thanks for the answer,
-----Original Message-----
From: Johannes Schlüter [mailto:johannes@schlueters.de]
Sent: Friday, November 27, 2015 14:42
To: Pascal KISSIAN
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Proof of Concept : 3.5x and more Performance Boost for php7 using 4 cores
> Your test runs a single PHP process. Mind that in a typical deployment
> on a server you have quite a few parallel PHP processes already
> competing for time on the CPU (when not waiting for IO); a benchmark
> should reflect that.
You can see at http://poc.yakpro.com/?php7_performance_boost_parallel_computing ,
in the "Faq: Is it usefull on a heavy traffic web site?" section, a
graph that partially answers your question.
> For in_array I'm assuming that often either one or no match exist,
> thus in average the old algorithm has to process half the elements in
> average. With that form of parallelization it will process C/(N-1) +
> C/2N elements where C is the count of elements and N the number of
> cores, so in total need more CPU. (might be wrong)
In the new algorithm, each core processes C/2N elements in parallel, and
when one core has found a result, processing is stopped on the other
cores... so the average CPU need is C/2N * N, i.e. exactly the same C/2.
> So please run tests with a "typical" application (like wordpress or
> such) in a more typical environment.
Do you know a real application that executes only array_sum() or
in_array() functions?
If the time spent in those functions represents only 0.5% of the global
run time, we will just speed up those 0.5%...
I have not rewritten all of PHP in parallel...
I just did a "Proof of Concept", to be sure that it is possible to run
some PHP functions in parallel, and in less time.
And I hope that many people will get interested in starting to think
parallel in PHP, as the next speed improvements in hardware will come
from the multiplication of cores in a CPU.
> Hi Johannes,
> thanks for the answer,
> [...]
> You can see at http://poc.yakpro.com/?php7_performance_boost_parallel_computing ,
> in the "Faq: Is it usefull on a heavy traffic web site?" section, a
> graph that partially answers your question.
It's no real answer, though. And I still have doubts.
> > For in_array I'm assuming that often either one or no match exist
> > [...] so in total need more CPU. (might be wrong)
>
> In the new algorithm, each core processes C/2N elements in parallel,
> and when one core has found a result, the process is stopped for the
> other cores... so the average cpu need is C/2N * N, so exactly the
> same C/2.
Thanks.
> > So please run tests with a "typical" application (like wordpress or
> > such) in a more typical environment.
>
> Do you know a real application that executes only array_sum() or
> in_array() functions?
> If the time of those functions represents only 0.5% of the global run
> time, we will just speed up those 0.5%... I have not rewritten all of
> PHP in parallel...
> I just did a "Proof of Concept", to be sure that it is possible to run
> some PHP functions in parallel, and in less time. And I hope that many
> people will be interested to start thinking parallel in PHP, as the
> next speed improvements in hardware will be the multiplication of
> cores in a CPU.
Well, how many parts can one parallelize in real-life applications?
Also, how many typical calls will hit this optimization? (The in_array
one, for instance, is only used for "larger" arrays.)
Mind: the parallel code is harder to debug, and the implementation takes
different paths depending on the input data and system configuration,
which makes it harder to find issues. So the maintenance cost is quite
high, and the real-life gain therefore has to be relevant.
johannes
Hi Pascal,
-----Original Message-----
From: Pascal KISSIAN [mailto:php-mailing-list@lool.fr]
Sent: Friday, November 27, 2015 11:03 AM
To: internals@lists.php.net
Subject: [PHP-DEV] Proof of Concept : 3.5x and more Performance Boost for php7 using 4 cores
Importance: High

> Hi everybody,
> I made a simple test to know if it was possible to speed-up php
> performance by using parallel programming.
> [...]
> My main goal is to initiate a global thinking, among php core
> developers, concerning the parallel computing and php, at each level:
> [...]
> A multicore php8 or php9? Wouldn't be cool????
This is very interesting research, thanks for that.
IMO the decision whether and how to use parallel computation should be
moved into user space. The tricky part is that only very specific cases
really benefit from such computation: a positive result can only be
expected with suitable algorithms and appropriate amounts of data. Many
cases with smaller amounts of data may, and probably will, show worse
results than the current implementation, because the setup work the
system has to perform will most likely negate the advantage of the
parallelism. It is probably hard to provide an implementation smart
enough to handle every kind of situation, but giving the actual
programmer the power to decide could solve it.
It would make sense to at least add some tests with a greater variety of
data sizes, including small arrays.
Regards
Anatol
> A multicore php8 or php9? Wouldn't be cool????
Honestly, PHP is a poor language for parallel computing, because PHP is
a web-focused language: the most common setup ends up with a request
blocking on a network call. What we really need is an improved
asynchronous model, so that we can wait on those network requests more
efficiently.
> > A multicore php8 or php9? Wouldn't be cool????
>
> Honestly, PHP is a poor language for parallel computing. This is
> because PHP is a web-focused language. The most common setup ends up
> with a blocking request on a network call. What we really need is an
> improved asynchronous model so that we can wait on those network
> requests more efficiently.
I'd second that. Since all of MY data is stored in a database, requests
to it use other cores on the processor, or even other processors, so
being able to carry on building other areas of material while waiting
for one or more database or other services to return will be a lot more
productive than trying to make the thread that is waiting for a result
faster. In addition, having several users each getting a single core to
handle their request seems a lot more practical than one request hogging
all of the resources.
There are applications that are suitable for high-power parallel
processing, and these are ones where reusing GPU resources, which are
inherently parallel-data aware, is much more practical, freeing the MPU
resources to handle threads that are less reliant on parallel data.
Adding extensions such as Lapack, which can be expanded to take
advantage of the available hardware to process the data set in parallel,
is a lot more practical than trying to make an inherently serial process
'parallel'.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
> > What we really need is an improved asynchronous model so that we can
> > wait on those network requests more efficiently.
>
> I'd second that.
> [...]
> Adding extensions such as the Lapack which can be expanded to take
> advantage of the available hardware to parallel process the data set
> is a lot more practical than trying to make an inherently serial
> process 'parallel'.
I fully agree here.
Also, PHP exists to serve requests as fast as possible. Any task so
expensive that parallelism is required is a prime candidate to be
delegated outside the request handling, preferably to another box.
An extension also fits the main use case for such features: the CLI.
--
Pierre
@pierrejoye | http://www.libgd.org