Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:85809 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 97819 invoked from network); 14 Apr 2015 18:23:58 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 14 Apr 2015 18:23:58 -0000 Authentication-Results: pb1.pair.com header.from=medinsky94@gmail.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=medinsky94@gmail.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.218.54 as permitted sender) X-PHP-List-Original-Sender: medinsky94@gmail.com X-Host-Fingerprint: 209.85.218.54 mail-oi0-f54.google.com Received: from [209.85.218.54] ([209.85.218.54:34660] helo=mail-oi0-f54.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 99/E6-47925-C3B5D255 for ; Tue, 14 Apr 2015 14:23:57 -0400 Received: by oiko83 with SMTP id o83so8275848oik.1 for ; Tue, 14 Apr 2015 11:23:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=eGW7vQA7Dk3FvjN2kjGkfHf4BmBlAfDW/BKWe0AtFIc=; b=rLuM0OtGLXT543Y9VMax3iPII7XZrkb+bZJd6iKw31n6MXlfkqpLA+Hy9K08/eqTZB xBxBFV+pXJQlHNgbk+NsRrc3ks6lWNeXyk+Atfv+0AoU9sP+f3hs7hKBWc/hGRVLuJL4 AGKCkyDEzHJAh8SUcSfrhwn5HA5J3bGYo6TcSyrPQ4+Pzsvf+5jpt5OyPLwNMEFWD1m/ 434SSGmjmMge8pGsMhdBPhOKHUm3Fc/shPokAeJW2R0NjUNNt8mTWPBozCH3Yr2adWzz pRIpdM/+B0am+aWZ7mf3CzIFMra5JzcGHWTLt3fWdJX4UbGCkb2AO7wHKJ7TSpF0ynuL fjiQ== MIME-Version: 1.0 X-Received: by 10.202.208.1 with SMTP id h1mr12211081oig.74.1429035833685; Tue, 14 Apr 2015 11:23:53 -0700 (PDT) Received: by 10.202.85.16 with HTTP; Tue, 14 Apr 2015 11:23:53 -0700 (PDT) In-Reply-To: References: Date: Tue, 14 Apr 2015 11:23:53 -0700 Message-ID: To: internals@lists.php.net Content-Type: multipart/alternative; boundary=001a113dd2543cf9af0513b35598 Subject: Re: Potential binary search optimization or feature From: medinsky94@gmail.com (Danylo Medinsky) --001a113dd2543cf9af0513b35598 Content-Type: text/plain; charset=UTF-8 Thanks for the feedback and apologies for the late response. Since my posting and your responses, I began implementing the binary search into array.c. After a couple days of coding and research, I managed to produce this code. HashTable *arr_hash; arr_hash = Z_ARRVAL_P(array); int low=0; int high=zend_hash_num_elements(arr_hash); int mid; zval res; while(low<=high){ mid=(low+high)/2; ZVAL_DEREF(entry); entry = zend_hash_index_find(arr_hash, mid); compare_function(&res,entry,value); //php_printf("%ld\n", Z_LVAL(res)); if(Z_LVAL(res) > 0){ high=mid-1; }else if(Z_LVAL(res) < 0){ low=mid+1; }else{ RETURN_TRUE; } } RETURN_FALSE; After benchmarking this code, I managed to conclude that this binary search algorithm preforms vastly quicker then the php_search_array function. Based on these results I believe that this algorithm would be a good optimization or feature to add to PHP. Based on the feedback I have received from you guys, I am holding off the sort flag in the binary search function parameters until I can determine how much performance will be compromised from the sorting. At the moment I am trying to figure out if my function would be considered a minor or major chance since if it is a major/large change, I will require a RFC in order to submit a pull request. If anybody has any additional feedback or recommendations, feel free to reply to this post. On 29 March 2015 at 23:24, Danylo Medinsky wrote: > Recently, as part of school course on optimization, I have identified a > potential optimization or feature for PHP. After looking through the > array.c file(located in etc/standard in the PHP source code), I noticed > that both the in_array and array_search are using the linear search C > function php_search_array. The potential issue I see is that even though > the linear search algorithm does not drastically effect performance when > searching a relatively small array, it may slow performance down when > iterating through a very large array(couple of million indexes). > > To back this fact up I preformed some benchmarks using a PHP script that > measures the time it takes to preform a linear search and binary search. > The benchmark was in favor of the binary search as it found its target many > times quicker compared to the linear search. > > After researching more about different PHP search functions, I was not > able to find one that uses a binary search algorithm. Based on these > factors I believe that a binary search implementation into PHP will prove > useful in certain situations and would like to implement that feature. > > At the moment I am debating between adding a separate binary search based > function into array.c or modifying the current array_search_array function > to utilize a binary search algorithm. Since the requirement for a binary > search is that the array being searched is sorted, I thought is would be a > good idea to have a Boolean parameter in the binary search function to > specify if the input array should be sorted or not. something among the > lines of "function binary_search(string target, array arrayToSearch, bool > sort)". Doing this, the function can preform a sort if required to, thus > not needing to sort a array that is already sorted. > > Before I can begin the implementation and create a pull request, I would > like to know the PHP's community feedback on my potential contribution and > if this is a wise feature to change or implement. > > > --001a113dd2543cf9af0513b35598--