Newsgroups: php.internals
Xref: news.php.net php.internals:104841
Date: Thu, 21 Mar 2019 12:35:33 +0100
Subject: Re: [PHP-DEV] Offset-only results from preg_match
From: nikita.ppv@gmail.com (Nikita Popov)
To: "C. Scott Ananian"
Cc: PHP internals

On Wed, Mar 20, 2019 at 4:35 PM C. Scott Ananian wrote:

> On Tue, Mar 19, 2019 at 10:58 AM Nikita Popov wrote:
>
>> After thinking about this some more, while this may be a minor
>> performance improvement, it still does more work than necessary. In
>> particular the use of OFFSET_CAPTURE (which would be pretty much required
>> here) needs one new two-element array for each subpattern. If the captured
>> strings are short, this is where the main cost is going to be.
>
> The primary use of this feature is when the captured strings are *long*,
> as that's when we most want to avoid copying a substring.
>
>> I'm wondering if we shouldn't consider a new object oriented API for PCRE
>> which can return a match object where subpattern positions and contents can
>> be queried via method calls, so you only pay for the parts that you do
>> access.
>
> Seems like this is letting the perfect be the enemy of the good. The
> LENGTH_CAPTURE significantly reduces allocation for long match strings, and
> it allocates the same two-element arrays that OFFSET_CAPTURE would -- it
> just stores an integer where there would otherwise be an expensive
> substring. Furthermore, since the array structure is left mostly alone, it
> would be not-too-hard to support earlier-PHP versions, with something like:
>
> $hasLengthCapture = defined('PREG_LENGTH_CAPTURE') ? PREG_LENGTH_CAPTURE : 0;
> $r = preg_match($pat, $sub, $m, PREG_OFFSET_CAPTURE | $hasLengthCapture);
> $matchOneLength = $hasLengthCapture ? $m[1][0] : strlen($m[1][0]);
> $matchOneOffset = $m[1][1];
>
> If you introduce a whole new OO accessor object, it starts becoming very
> hard to write backward-compatible code.
> --scott

Fair enough. I've created https://github.com/php/php-src/pull/3971 to
implement this feature. It would be good to have some confirmation that this
is really a significant performance improvement before we land it though.

Nikita
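
For readers following the thread, the backward-compatible pattern quoted above could be generalised to every capture group roughly as follows. This is only a sketch under stated assumptions: PREG_LENGTH_CAPTURE is the flag proposed in this discussion and is not defined in released PHP versions, so the code falls back to strlen() on the captured substring; the helper name match_spans() and the sample pattern are hypothetical and not part of the proposal.

```php
<?php
// Assumption: PREG_LENGTH_CAPTURE is the flag proposed in this thread. Where it
// is not defined, use 0 so that only PREG_OFFSET_CAPTURE is passed.
$lengthFlag = defined('PREG_LENGTH_CAPTURE') ? PREG_LENGTH_CAPTURE : 0;

/**
 * Hypothetical helper: returns [offset, length] for each group of the first
 * match, null for a group that did not participate, or null if nothing matched.
 */
function match_spans(string $pattern, string $subject, int $lengthFlag): ?array
{
    // With PREG_OFFSET_CAPTURE each entry of $m is a two-element array:
    // [captured substring, byte offset].
    if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE | $lengthFlag) !== 1) {
        return null;
    }

    $spans = [];
    foreach ($m as $key => [$first, $offset]) {
        if ($offset === -1) {
            // preg_match reports offset -1 for an unmatched group.
            $spans[$key] = null;
            continue;
        }
        // Under the proposed flag the first element would already be the
        // length; without it, it is the substring and we measure it.
        $spans[$key] = [$offset, $lengthFlag ? $first : strlen($first)];
    }
    return $spans;
}

$spans = match_spans('/(\w+)@(\w+)/', 'mail: user@example', $lengthFlag);
```

Callers see the same [offset, length] pairs either way, which is the point made in the quoted reply: the array shape stays the same, so only the one line that derives the length needs to know whether the flag exists.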