Newsgroups: php.internals
From: gwynne@darkrainfall.org (Gwynne Raskind)
To: Stefan Walk
Cc: PHP Internals List
Date: Thu, 30 Jul 2009 12:13:14 -0400
Subject: Re: [PHP-DEV] Alternative mbstring implementation using ICU
Message-ID: <20467B1A-8A7F-455E-B27D-FCAFD2E65C8A@darkrainfall.org>
In-Reply-To: <3B2C180E-2168-49D3-BE49-5B80346ED868@fuer-et.de>

On Jul 30, 2009, at 11:27 AM, Stefan Walk wrote:

>> and
>> that there's nothing you can do with Oniguruma that you can't
>> also practically do with PCRE (to the best of my knowledge),
> http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt - Paragraph 8,
> example 2 - specifying the nest level for subroutines/back
> references doesn't work with pcre. That's one example I know from
> the top of my head ...

Granted :). But it would be possible to rewrite that example into a PCRE regexp, just a bit less efficiently. Subroutine calls *are* one thing I like about Oniguruma, but not enough to counter all the other arguments in favor of PCRE.

Honestly, that sort of extension to the syntax verges on kicking regexp out of the domain it was intended for; function calls come close to making regexp trivially Turing-complete. If you need that level of matching, it's probably time to consider a more aggressive string parsing methodology, such as tokenization. Regexp is basically a VM (at least as most implementations handle it), and on nontrivial patterns it will often be slower than other methods.

Certainly that particular example, matching an XML tag, will never deal with all the esoteric cases. You can approach perfect matching only as pattern complexity approaches infinity (i.e. correctness -> 100% as complexity -> inf), so the pattern is only realistically useful in specific cases where you can be guaranteed a valid input stream. And supposing it doesn't match, there's no reasonable way to determine *why*.

Also, more to the point, Oniguruma hasn't been updated in two years and counting, which is enough to count it as unmaintained. PCRE is still seeing releases on a regular basis.

-- Gwynne
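
To make the tokenization point concrete, here is a minimal sketch of the idea, written in Python rather than PHP purely for brevity. Everything in it is an assumption for illustration: the names `TAG` and `check_nesting` are hypothetical, and the tag syntax is drastically simplified (no attributes, comments, or CDATA). It is not a rewrite of the Oniguruma example, only a demonstration that a trivial regexp used as a tokenizer plus an explicit stack can check nesting and, unlike one large pattern, say why the input was rejected.

```python
import re

# Hypothetical, deliberately simplified token pattern: it recognises only
# <name>, </name> and <name/>, with no attributes, comments, or CDATA.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*(/?)>")


def check_nesting(text):
    """Return (True, "ok") if tags nest properly, else (False, reason)."""
    stack = []  # (name, offset) for every tag that is currently open
    for m in TAG.finditer(text):
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        if closing and self_closing:
            return False, f"malformed tag {m.group(0)!r} at offset {m.start()}"
        if self_closing:
            continue  # <br/> opens and closes within a single token
        if not closing:
            stack.append((name, m.start()))
            continue
        # From here on the token is a closing tag.
        if not stack:
            return False, f"unexpected </{name}> at offset {m.start()}"
        open_name, open_pos = stack.pop()
        if open_name != name:
            return False, (
                f"</{name}> at offset {m.start()} closes <{open_name}> "
                f"opened at offset {open_pos}"
            )
    if stack:
        open_name, open_pos = stack[-1]
        return False, f"<{open_name}> opened at offset {open_pos} is never closed"
    return True, "ok"
```

Calling `check_nesting("<a><b></a></b>")` returns a failure whose reason names the mismatched tags and their offsets, which is exactly the diagnostic a single nested-backreference pattern cannot provide. A real parser would of course need a far richer tokenizer than this one.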