Newsgroups: php.internals
Date: Wed, 20 Mar 2019 11:35:07 -0400
To: Nikita Popov
Cc: PHP internals
Subject: 
Re: [PHP-DEV] Offset-only results from preg_match
From: cananian@wikimedia.org ("C. Scott Ananian")

On Tue, Mar 19, 2019 at 10:58 AM Nikita Popov wrote:

> After thinking about this some more, while this may be a minor performance
> improvement, it still does more work than necessary. In particular the use
> of OFFSET_CAPTURE (which would be pretty much required here) needs one new
> two-element array for each subpattern. If the captured strings are short,
> this is where the main cost is going to be.

The primary use of this feature is when the captured strings are *long*, as
that's when we most want to avoid copying a substring.

> I'm wondering if we shouldn't consider a new object oriented API for PCRE
> which can return a match object where subpattern positions and contents can
> be queried via method calls, so you only pay for the parts that you do
> access.

Seems like this is letting the perfect be the enemy of the good. The
LENGTH_CAPTURE significantly reduces allocation for long match strings, and
it allocates the same two-element arrays that OFFSET_CAPTURE would -- it just
stores an integer where there would otherwise be an expensive substring.
Furthermore, since the array structure is left mostly alone, it would be
not-too-hard to support earlier-PHP versions, with something like:

    $hasLengthCapture = defined('PREG_LENGTH_CAPTURE') ? PREG_LENGTH_CAPTURE : 0;
    $r = preg_match($pat, $sub, $m, PREG_OFFSET_CAPTURE | $hasLengthCapture);
    $matchOneLength = $hasLengthCapture ? $m[1][0] : strlen($m[1][0]);
    $matchOneOffset = $m[1][1];

If you introduce a whole new OO accessor object, it starts becoming very hard
to write backward-compatible code.

--scott
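
As a rough sketch of how that shim might be wrapped up for reuse: the helper name match_spans below is hypothetical, and PREG_LENGTH_CAPTURE is only the flag proposed in this thread, so on a PHP build that does not define it the code falls back to strlen() on the captured substring. How the proposed flag would report unmatched groups is an assumption here (treated the same as OFFSET_CAPTURE's -1 offset).

```php
<?php
// Hypothetical helper: returns [offset, length] for each group, whether or not
// the proposed PREG_LENGTH_CAPTURE flag exists in the running PHP.
function match_spans(string $pattern, string $subject): ?array
{
    // constant() avoids a hard reference to a constant that may not be defined.
    $lengthFlag = defined('PREG_LENGTH_CAPTURE') ? constant('PREG_LENGTH_CAPTURE') : 0;

    if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE | $lengthFlag) !== 1) {
        return null; // no match, or a PCRE error
    }

    $spans = [];
    foreach ($m as $key => [$value, $offset]) {
        if ($offset < 0) {
            // Unmatched group (assumed to be reported with offset -1).
            $spans[$key] = null;
            continue;
        }
        // With the proposed flag, slot 0 already holds the length (assumption);
        // without it, slot 0 holds the substring and we measure it.
        $length = $lengthFlag ? $value : strlen($value);
        $spans[$key] = [$offset, $length];
    }
    return $spans;
}
```

A caller would then copy a substring only when it actually needs one, e.g. substr($subject, $spans[1][0], $spans[1][1]), and the same calling code works on both old and new PHP.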