Newsgroups: php.internals
Xref: news.php.net php.internals:104830
Date: Wed, 20 Mar 2019 11:26:12 -0400
From: cananian@wikimedia.org ("C. Scott Ananian")
To: Nikita Popov
Cc: PHP internals
Subject: Re: [PHP-DEV] PCRE partial matching

Looks nice to me. In connection with the PREG_LENGTH_CAPTURE option
floated in a previous post, this would easily let the wikimedia/remex-html
package parse HTML in a streaming fashion; it would fill up a buffer array
and then do an incremental parse, stopping as soon as a (hard) partial
match was found, then move the prefix returned to the start of the buffer,
wait for the buffer to fill more, then restart.
  --scott

On Wed, Mar 20, 2019 at 5:32 AM Nikita Popov wrote:

> Hi internals,
>
> PCRE has some very nice partial matching functionality described at
> https://www.pcre.org/current/doc/html/pcre2partial.html. This is useful
> for streaming processing, as it allows you to distinguish between "there's
> definitely no match here" and "this could match starting from position N,
> but we need more data to find out".
>
> Here is a PR to expose this functionality from PHP:
> https://github.com/php/php-src/pull/3969 The PR has a basic description of
> the API.
>
> What do you think?
>
> Nikita

--
(http://cscott.net)
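
The buffer-refill loop described in the reply above could look roughly like the sketch below. It is only an illustration under several assumptions: it is written in Python rather than PHP, Python's standard `re` module has no partial matching, so the "hard partial match" case is emulated by hand for a toy two-token grammar, and the names `TOKEN`, `scan` and `stream_tokens` are hypothetical. It is not the API from the PR and not how wikimedia/remex-html works.

```python
import re

# Toy grammar (an assumption for this sketch): a token is either a tag
# such as <p>, or a run of text containing no '<'.
TOKEN = re.compile(r"<[^>]*>|[^<]+")


def scan(buffer):
    """Return (tokens, keep_from).

    tokens    -- the complete tokens found at the front of the buffer
    keep_from -- index where a possibly unfinished token starts
                 (len(buffer) when nothing is pending)
    """
    tokens = []
    pos = 0
    while pos < len(buffer):
        m = TOKEN.match(buffer, pos)
        if m is None:
            # With this grammar only an unterminated '<...' fails to match:
            # it could still become a tag, so treat it as a partial match.
            break
        if m.end() == len(buffer) and not m.group().startswith("<"):
            # A text run that reaches the end of the buffer might continue
            # in the next chunk; emulate a hard partial match and stop.
            break
        tokens.append(m.group())
        pos = m.end()
    return tokens, pos


def stream_tokens(chunks):
    """Yield tokens from an iterable of string chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk                 # let the buffer fill some more
        tokens, keep_from = scan(buffer)
        yield from tokens
        buffer = buffer[keep_from:]     # move the pending part to the start
    if buffer:
        yield buffer                    # input ended; hand back the remainder


if __name__ == "__main__":
    for token in stream_tokens(["<p>Hel", "lo <b", ">world</b", "></p>"]):
        print(repr(token))
```

With real partial matching the regex engine itself would report where the partial match starts, so the two hand-written checks inside `scan` would presumably collapse into a single check of the match result.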