Newsgroups: php.internals
Message-ID: <002c01c740b4$f926d0b0$0100a8c0@pc07653>
To: "Andrei Zmievski"
Cc: "Ilia Alshanetsky", "Pierre"
References: <0F741213-BCA4-4923-A83A-3E4E9C561DAE@prohost.org> <45B897E5.40007@zend.com> <41936.195.22.180.233.1169730121.squirrel@avilys.eik.lt> <45B8B2E5.4010204@zend.com> <40869.195.22.180.233.1169733866.squirrel@avilys.eik.lt>
 <3ED37F9A-9BC8-4BBA-BB85-77BB0B188074@prohost.org> <000b01c74090$60bce950$0100a8c0@pc07653> <017A7F13-255C-4C7E-B22F-7481CCE07BAB@prohost.org> <000a01c74093$b03dd180$0100a8c0@pc07653> <0EFF1969-038A-4F67-872C-674B99E75009@prohost.org> <6b4d01c77cd1c8ca09b68d822bcd1f15@gravitonic.com> <004e01c740a3$599a2670$0100a8c0@pc07653> <3d6fe7502ae17ed621fe251baeb4403b@gravitonic.com>
Date: Thu, 25 Jan 2007 19:14:00 -0000
Subject: Re: [PHP-DEV] Re: PHP 5.2.1RC3 Released
From: nlopess@php.net ("Nuno Lopes")

> I've been thinking about how to not force UTF-8 in PCRE for PHP 6, and
> it's not that simple. This is mainly due to preg_replace(), because it
> allows array() parameters that can contain mixed IS_UNICODE and IS_STRING
> values. I hope you realize though, that in UTF-8 mode PCRE does not care
> about POSIX locales, even in PHP 5.

I hadn't thought about that, but can't you simply reject mixing Unicode
and binary strings?

> By the way, I think ICU regexp extension, when implemented, will let you
> match Portuguese characters in UTF-8 strings.

I wasn't aware of that API. Anyway, it is probably slower than PCRE with
locales, because it uses Unicode property table lookups.

> Yes, UTF-8 covers many aspects but does it know about words, white
> spaces (not sure if ws are always the same) and other locale specific
> issues? generally, not only pcre. Maybe it is more something for ICU
> directly, as you said later in this thread.

That's not really a problem with PCRE, as it supports Unicode character
properties. It isn't documented in phpdoc (don't look at me :P), but it
looks like \pL, where L is one of (from http://pcre.org/pcre.txt):

  L    Letter
  Ll   Lower case letter
  N    Number
  Nd   Decimal number
  Nl   Letter number
  No   Other number
  P    Punctuation
  Zs   Space separator
  (...)

Nuno
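
For illustration only (this is not part of the original message): a minimal PHP sketch of the property escapes listed above. It assumes a PCRE build with Unicode property support and uses the /u modifier so that both pattern and subject are treated as UTF-8. The sample string and the variable names are made up. Two-letter codes such as Nd and Zs are written with braces, as in \p{Nd}.

```php
<?php
// Hypothetical sample text: "Olá ação 42", written with explicit UTF-8 bytes
// so the example does not depend on the encoding of the source file.
$text = "Ol\xC3\xA1 a\xC3\xA7\xC3\xA3o 42";

// \pL matches any Unicode letter, including accented Portuguese letters.
preg_match_all('/\pL+/u', $text, $words);

// \p{Nd} matches decimal digits.
preg_match_all('/\p{Nd}+/u', $text, $numbers);

// \p{Zs} matches space separators; split the text on runs of them.
$parts = preg_split('/\p{Zs}+/u', $text);

print_r($words[0]);   // the two words
print_r($numbers[0]); // the number
print_r($parts);      // all three space-separated pieces
```

Because matching is driven by Unicode properties rather than by the POSIX locale, the same patterns behave identically whatever setlocale() is set to.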