Recently, as part of school course on optimization, I have identified a
potential optimization or feature for PHP. After looking through the
array.c file(located in etc/standard in the PHP source code), I noticed
that both the in_array and array_search are using the linear search C
function php_search_array. The potential issue I see is that even though
the linear search algorithm does not drastically effect performance when
searching a relatively small array, it may slow performance down when
iterating through a very large array(couple of million indexes).
To back this fact up I preformed some benchmarks using a PHP script that
measures the time it takes to preform a linear search and binary search.
The benchmark was in favor of the binary search as it found its target many
times quicker compared to the linear search.
After researching more about different PHP search functions, I was not able
to find one that uses a binary search algorithm. Based on these factors I
believe that a binary search implementation into PHP will prove useful in
certain situations and would like to implement that feature.
At the moment I am debating between adding a separate binary search based
function into array.c or modifying the current array_search_array function
to utilize a binary search algorithm. Since the requirement for a binary
search is that the array being searched is sorted, I thought is would be a
good idea to have a Boolean parameter in the binary search function to
specify if the input array should be sorted or not. something among the
lines of "function binary_search(string target, array arrayToSearch, bool
sort)". Doing this, the function can preform a sort if required to, thus
not needing to sort a array that is already sorted.
Before I can begin the implementation and create a pull request, I would
like to know the PHP's community feedback on my potential contribution and
if this is a wise feature to change or implement.
On Mon, Mar 30, 2015 at 8:24 AM, Danylo Medinsky medinsky94@gmail.com
wrote:
Recently, as part of school course on optimization, I have identified a
potential optimization or feature for PHP. After looking through the
array.c file(located in etc/standard in the PHP source code), I noticed
that both the in_array and array_search are using the linear search C
function php_search_array. The potential issue I see is that even though
the linear search algorithm does not drastically effect performance when
searching a relatively small array, it may slow performance down when
iterating through a very large array(couple of million indexes).To back this fact up I preformed some benchmarks using a PHP script that
measures the time it takes to preform a linear search and binary search.
The benchmark was in favor of the binary search as it found its target many
times quicker compared to the linear search.After researching more about different PHP search functions, I was not able
to find one that uses a binary search algorithm. Based on these factors I
believe that a binary search implementation into PHP will prove useful in
certain situations and would like to implement that feature.At the moment I am debating between adding a separate binary search based
function into array.c or modifying the current array_search_array function
to utilize a binary search algorithm. Since the requirement for a binary
search is that the array being searched is sorted, I thought is would be a
good idea to have a Boolean parameter in the binary search function to
specify if the input array should be sorted or not. something among the
lines of "function binary_search(string target, array arrayToSearch, bool
sort)". Doing this, the function can preform a sort if required to, thus
not needing to sort a array that is already sorted.Before I can begin the implementation and create a pull request, I would
like to know the PHP's community feedback on my potential contribution and
if this is a wise feature to change or implement.
If you need to find an element in an array you should make the lookup key
based. I.e. you save elements as $array[$value] = true and then check
isset($array[$value]). This operation is O(1), as opposed to O(log n) for
binary search and O(n) for linear search.
Using binary search in PHP only makes sense if the elements in the array
cannot be used as keys - where the only reasonable use case I see are
objects, which at least currently do not support comparison operator
overloading, so a binary search implementation would have to accept a
comparator function.
Also the concept of adding a $sort parameter to a binsearch function
doesn't make much sense to me. Sorting the array is more expensive than
doing a linear search.
Nikita
Thanks for the feedback and apologies for the late response.
Since my posting and your responses, I began implementing the binary search
into array.c. After a couple days of coding and research, I managed to
produce this code.
HashTable *arr_hash;
arr_hash = Z_ARRVAL_P(array);
int low=0;
int high=zend_hash_num_elements(arr_hash);
int mid;
zval res;
while(low<=high){
mid=(low+high)/2;
ZVAL_DEREF(entry);
entry = zend_hash_index_find(arr_hash, mid);
compare_function(&res,entry,value);
//php_printf("%ld\n", Z_LVAL(res));
if(Z_LVAL(res) > 0){
high=mid-1;
}else if(Z_LVAL(res) < 0){
low=mid+1;
}else{
RETURN_TRUE;
}
}
RETURN_FALSE;
After benchmarking this code, I managed to conclude that this binary search
algorithm preforms vastly quicker then the php_search_array function. Based
on these results I believe that this algorithm would be a good optimization
or feature to add to PHP. Based on the feedback I have received from you
guys, I am holding off the sort flag in the binary search function
parameters until I can determine how much performance will be compromised
from the sorting.
At the moment I am trying to figure out if my function would be considered
a minor or major chance since if it is a major/large change, I will require
a RFC in order to submit a pull request.
If anybody has any additional feedback or recommendations, feel free to
reply to this post.
On Mon, Mar 30, 2015 at 8:24 AM, Danylo Medinsky medinsky94@gmail.com
wrote:Recently, as part of school course on optimization, I have identified a
potential optimization or feature for PHP. After looking through the
array.c file(located in etc/standard in the PHP source code), I noticed
that both the in_array and array_search are using the linear search C
function php_search_array. The potential issue I see is that even though
the linear search algorithm does not drastically effect performance when
searching a relatively small array, it may slow performance down when
iterating through a very large array(couple of million indexes).To back this fact up I preformed some benchmarks using a PHP script that
measures the time it takes to preform a linear search and binary search.
The benchmark was in favor of the binary search as it found its target
many
times quicker compared to the linear search.After researching more about different PHP search functions, I was not
able
to find one that uses a binary search algorithm. Based on these factors I
believe that a binary search implementation into PHP will prove useful in
certain situations and would like to implement that feature.At the moment I am debating between adding a separate binary search based
function into array.c or modifying the current array_search_array function
to utilize a binary search algorithm. Since the requirement for a binary
search is that the array being searched is sorted, I thought is would be a
good idea to have a Boolean parameter in the binary search function to
specify if the input array should be sorted or not. something among the
lines of "function binary_search(string target, array arrayToSearch, bool
sort)". Doing this, the function can preform a sort if required to, thus
not needing to sort a array that is already sorted.Before I can begin the implementation and create a pull request, I would
like to know the PHP's community feedback on my potential contribution and
if this is a wise feature to change or implement.If you need to find an element in an array you should make the lookup key
based. I.e. you save elements as $array[$value] = true and then check
isset($array[$value]). This operation is O(1), as opposed to O(log n) for
binary search and O(n) for linear search.Using binary search in PHP only makes sense if the elements in the array
cannot be used as keys - where the only reasonable use case I see are
objects, which at least currently do not support comparison operator
overloading, so a binary search implementation would have to accept a
comparator function.Also the concept of adding a $sort parameter to a binsearch function
doesn't make much sense to me. Sorting the array is more expensive than
doing a linear search.Nikita
Hi!
to utilize a binary search algorithm. Since the requirement for a binary
search is that the array being searched is sorted, I thought is would be a
good idea to have a Boolean parameter in the binary search function to
specify if the input array should be sorted or not. something among the
Sorting array takes at least O(n*log n) (well, with comparison sort, but
that's what you'd have to use I assume), while linear search can be done
in O(n). So no profit here.
If you know the array is sorted, binary search function could be
useful, but I'm not sure how wide the use case for it would be.
lines of "function binary_search(string target, array arrayToSearch, bool
sort)". Doing this, the function can preform a sort if required to, thus
not needing to sort a array that is already sorted.
Array function's name would have to start with array_ prefix.
--
Stas Malyshev
smalyshev@gmail.com
Thanks for the feedback and apologies for the late response.
Since my posting and your responses, I began implementing the binary search
into array.c. After a couple days of coding and research, I managed to
produce this code.
HashTable *arr_hash;
arr_hash = Z_ARRVAL_P(array);
int low=0;
int high=zend_hash_num_elements(arr_hash);
int mid;
zval res;
while(low<=high){
mid=(low+high)/2;
ZVAL_DEREF(entry);
entry = zend_hash_index_find(arr_hash, mid);
compare_function(&res,entry,value);
//php_printf("%ld\n", Z_LVAL(res));
if(Z_LVAL(res) > 0){
high=mid-1;
}else if(Z_LVAL(res) < 0){
low=mid+1;
}else{
RETURN_TRUE;
}
}
RETURN_FALSE;
After benchmarking this code, I managed to conclude that this binary search
algorithm preforms vastly quicker then the php_search_array function. Based
on these results I believe that this algorithm would be a good optimization
or feature to add to PHP. Based on the feedback I have received from you
guys, I am holding off the sort flag in the binary search function
parameters until I can determine how much performance will be compromised
from the sorting.
At the moment I am trying to figure out if my function would be considered
a minor or major chance since if it is a major/large change, I will require
a RFC in order to submit a pull request.
If anybody has any additional feedback or recommendations, feel free to
reply to this post.
Recently, as part of school course on optimization, I have identified a
potential optimization or feature for PHP. After looking through the
array.c file(located in etc/standard in the PHP source code), I noticed
that both the in_array and array_search are using the linear search C
function php_search_array. The potential issue I see is that even though
the linear search algorithm does not drastically effect performance when
searching a relatively small array, it may slow performance down when
iterating through a very large array(couple of million indexes).To back this fact up I preformed some benchmarks using a PHP script that
measures the time it takes to preform a linear search and binary search.
The benchmark was in favor of the binary search as it found its target many
times quicker compared to the linear search.After researching more about different PHP search functions, I was not
able to find one that uses a binary search algorithm. Based on these
factors I believe that a binary search implementation into PHP will prove
useful in certain situations and would like to implement that feature.At the moment I am debating between adding a separate binary search based
function into array.c or modifying the current array_search_array function
to utilize a binary search algorithm. Since the requirement for a binary
search is that the array being searched is sorted, I thought is would be a
good idea to have a Boolean parameter in the binary search function to
specify if the input array should be sorted or not. something among the
lines of "function binary_search(string target, array arrayToSearch, bool
sort)". Doing this, the function can preform a sort if required to, thus
not needing to sort a array that is already sorted.Before I can begin the implementation and create a pull request, I would
like to know the PHP's community feedback on my potential contribution and
if this is a wise feature to change or implement.
Am 14.04.2015 20:24 schrieb "Danylo Medinsky" medinsky94@gmail.com:
... until I can determine how much performance will be compromised
from the sorting.
Sorting at least has to look at each array element once, and execute the
comparison function once. Compare that to a searching scan, which can
terminate once it finds the element it is looking for.
You would need to remember that the array is already sorted, and make
several binary searches (more than log(count(array)), to get a win. But
with an arbitrary comparison function you cannot remember whether the array
is sorted, because it is only sorted wrt. that specific comparison function.
The only scenario where sort-first-then-search can make sense, is when you
have to search for several items "in parallel" under a given comparison
function, and your number of items to search for is clearly larger than
log(count(array)).
Patrick
Hi Danylo,
On Wed, Apr 15, 2015 at 3:23 AM, Danylo Medinsky medinsky94@gmail.com
wrote:
Thanks for the feedback and apologies for the late response.
Since my posting and your responses, I began implementing the binary search
into array.c. After a couple days of coding and research, I managed to
produce this code.HashTable *arr_hash; arr_hash = Z_ARRVAL_P(array); int low=0; int high=zend_hash_num_elements(arr_hash); int mid; zval res; while(low<=high){ mid=(low+high)/2; ZVAL_DEREF(entry); entry = zend_hash_index_find(arr_hash, mid); compare_function(&res,entry,value); //php_printf("%ld\n", Z_LVAL(res)); if(Z_LVAL(res) > 0){ high=mid-1; }else if(Z_LVAL(res) < 0){ low=mid+1; }else{ RETURN_TRUE; } } RETURN_FALSE;
After benchmarking this code, I managed to conclude that this binary search
algorithm preforms vastly quicker then the php_search_array function. Based
on these results I believe that this algorithm would be a good optimization
or feature to add to PHP. Based on the feedback I have received from you
guys, I am holding off the sort flag in the binary search function
parameters until I can determine how much performance will be compromised
from the sorting.At the moment I am trying to figure out if my function would be considered
a minor or major chance since if it is a major/large change, I will require
a RFC in order to submit a pull request.If anybody has any additional feedback or recommendations, feel free to
reply to this post.
Since array has state, making it a class makes sense.
It would be better if it implemented as array class. You can keep
sorted state if you use class.
Natural place would be SplArray, but the implementation is overly
complex...
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net