Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:85544 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 32796 invoked from network); 30 Mar 2015 09:12:49 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 30 Mar 2015 09:12:49 -0000 Authentication-Results: pb1.pair.com header.from=nikita.ppv@gmail.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=nikita.ppv@gmail.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.212.181 as permitted sender) X-PHP-List-Original-Sender: nikita.ppv@gmail.com X-Host-Fingerprint: 209.85.212.181 mail-wi0-f181.google.com Received: from [209.85.212.181] ([209.85.212.181:36930] helo=mail-wi0-f181.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id C9/30-31096-09319155 for ; Mon, 30 Mar 2015 04:12:48 -0500 Received: by wiaa2 with SMTP id a2so119345637wia.0 for ; Mon, 30 Mar 2015 02:12:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=jmDhwdzQAfqs5Bpz2U3ZP6HaEB7VoBh6jBB17Vfcko8=; b=F0GOHnEaxzHJKuLmQrPBMNn+IYfKo1IO5t7v/xHHS/rjfGH1iMvCXMPx0XVP5cjIW9 OX0Zl/DmDz6JNTjY0P8MmKr2LiMIPwfUZRFxkBIwhQykkwhmQQfBzvh3bTaEQbONrbWW uxY1CgAktwogVsJXV+f4XZDBTx3iOVxGBUysbB95zGGbA3C4WSSOJ+U7vmzaeFO9YXvF zN9Vyfx318Y1lVPHrMZz+OoE/ez46GN2dJAP5GDX8g7ksVIbt2y/AP7jFGVK2Q93FjKh 0zIcFs+dbx8AbXzZXG3pRPqCVRHmONhTv5KrX48xxTfF+FQbQkiPqwDPomQ2TnagcCZB oFdw== MIME-Version: 1.0 X-Received: by 10.180.218.200 with SMTP id pi8mr20682527wic.71.1427706765197; Mon, 30 Mar 2015 02:12:45 -0700 (PDT) Received: by 10.27.85.216 with HTTP; Mon, 30 Mar 2015 02:12:45 -0700 (PDT) In-Reply-To: References: Date: Mon, 30 Mar 2015 11:12:45 +0200 Message-ID: To: Danylo Medinsky Cc: PHP internals Content-Type: multipart/alternative; boundary=001a1135e4629546fb05127de256 Subject: Re: [PHP-DEV] Potential binary search optimization or feature From: nikita.ppv@gmail.com (Nikita Popov) --001a1135e4629546fb05127de256 Content-Type: text/plain; charset=UTF-8 On Mon, Mar 30, 2015 at 8:24 AM, Danylo Medinsky wrote: > Recently, as part of school course on optimization, I have identified a > potential optimization or feature for PHP. After looking through the > array.c file(located in etc/standard in the PHP source code), I noticed > that both the in_array and array_search are using the linear search C > function php_search_array. The potential issue I see is that even though > the linear search algorithm does not drastically effect performance when > searching a relatively small array, it may slow performance down when > iterating through a very large array(couple of million indexes). > > To back this fact up I preformed some benchmarks using a PHP script that > measures the time it takes to preform a linear search and binary search. > The benchmark was in favor of the binary search as it found its target many > times quicker compared to the linear search. > > After researching more about different PHP search functions, I was not able > to find one that uses a binary search algorithm. Based on these factors I > believe that a binary search implementation into PHP will prove useful in > certain situations and would like to implement that feature. > > At the moment I am debating between adding a separate binary search based > function into array.c or modifying the current array_search_array function > to utilize a binary search algorithm. Since the requirement for a binary > search is that the array being searched is sorted, I thought is would be a > good idea to have a Boolean parameter in the binary search function to > specify if the input array should be sorted or not. something among the > lines of "function binary_search(string target, array arrayToSearch, bool > sort)". Doing this, the function can preform a sort if required to, thus > not needing to sort a array that is already sorted. > > Before I can begin the implementation and create a pull request, I would > like to know the PHP's community feedback on my potential contribution and > if this is a wise feature to change or implement. > If you need to find an element in an array you should make the lookup key based. I.e. you save elements as $array[$value] = true and then check isset($array[$value]). This operation is O(1), as opposed to O(log n) for binary search and O(n) for linear search. Using binary search in PHP only makes sense if the elements in the array cannot be used as keys - where the only reasonable use case I see are objects, which at least currently do not support comparison operator overloading, so a binary search implementation would have to accept a comparator function. Also the concept of adding a $sort parameter to a binsearch function doesn't make much sense to me. Sorting the array is more expensive than doing a linear search. Nikita --001a1135e4629546fb05127de256--