From: sklar@sklar.com ("David Sklar")
Newsgroups: php.internals
Subject: regex operators
Date: Tue, 14 Oct 2003 10:47:19 -0400

I was thinking about adding one or two regex-related features to the engine:

1. "preg_case": this would behave just like case, but instead of doing an
   equality comparison it would match against a regular expression, e.g.

    switch ($data) {
        preg_case '/^\d{5}(-\d{4})?$/':
            print "US Postal Code";
            break;
        preg_case '/^[a-z]\d[a-z] ?\d[a-z]\d$/i':
            print "Canadian Postal Code";
            break;
        default:
            print "something else!";
    }

   Where should any captured subpatterns go?

2. A regex match operator that returns an array containing subpatterns. If
   there is no match against the regex, then an empty array (or just false?)
   would be returned.

    if ($m = ($data =~ '/^(\d{5})(-\d{4})?$/')) {
        print "The whole postal code is $m[0].";
        print "The first five digits are $m[1].";
        if ($m[2]) {
            print "The ZIP+4 is $m[2].";
        }
    }

Some issues with adding these features:

- It creates an engine dependency on the PCRE library.
- There would have to be some new opcodes and parser tokens.
- Ideally the code that implements these operators could share as much as
  possible with what's already been done in the PCRE extension -- is that
  possible?

Comments?

Thanks,
David
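
For comparison, here is a rough sketch of how both behaviors can be approximated today with the existing preg_match() function, with no engine changes. The helper names classify_postal_code() and regex_match() are hypothetical and exist only for this illustration. The sketch also picks "empty array" for the no-match case, which is just one of the two options raised above.

```php
<?php
// Hypothetical helper: approximates the proposed preg_case by trying each
// pattern in order and returning the label of the first one that matches.
function classify_postal_code(string $data): string
{
    $cases = [
        '/^\d{5}(-\d{4})?$/'           => 'US Postal Code',
        '/^[a-z]\d[a-z] ?\d[a-z]\d$/i' => 'Canadian Postal Code',
    ];
    foreach ($cases as $pattern => $label) {
        if (preg_match($pattern, $data) === 1) {
            return $label;
        }
    }
    return 'something else!'; // plays the role of the default: branch
}

// Hypothetical helper: approximates the proposed =~ operator.
// Returns the array of subpatterns, or an empty array when nothing matches
// (an assumption; the proposal leaves "empty array or false" open).
function regex_match(string $pattern, string $data): array
{
    $matches = [];
    return preg_match($pattern, $data, $matches) === 1 ? $matches : [];
}

$m = regex_match('/^(\d{5})(-\d{4})?$/', '12345-6789');
if ($m) {
    print "The whole postal code is $m[0].\n";
    print "The first five digits are $m[1].\n";
    if (!empty($m[2])) {
        print "The ZIP+4 is $m[2].\n";
    }
}
```

Because an empty array is falsy in PHP, the result of regex_match() can be tested directly in an if, which is the same convenience the operator form is after. What the userland version cannot offer is the switch-style syntax, and that part would need the new parser tokens and opcodes mentioned above.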