Newsgroups: php.internals
Date: Wed, 18 Jul 2007 02:42:55 -0700
Message-ID: <469DE09F.9080509@lerdorf.com>
To: Derick Rethans
CC: PHP Developers Mailing List
Subject: Re: [PHP-DEV] POSIX regex
From: rasmus@lerdorf.com (Rasmus Lerdorf)

Derick Rethans wrote:
> Regarding the unicode on/off modes, I don't think you put yourself in
> the developer's view at all. Users are not going to be better off having
> to deal with both modes.

Have you guys really thought this through? Let's look at this from two angles.

First, from our perspective as the people maintaining and developing PHP. Without the Unicode switch, and as has already been suggested, PHP 5 will never die. Anything new in PHP 6 that isn't related to Unicode will be backported to PHP 5. Or, in a slight variation of that, any developer with no interest in Unicode will only work on the PHP 5 branch and not bother worrying about whether it works in PHP 6, forcing others to do that work. I don't think we have the resources to do this. I think it is likely either to create two classes of developers and potentially diverging trees, or simply to kill off the Unicode effort altogether if not enough developers bother looking at PHP 6, since PHP 5 will live forever and is free of all this annoying Unicode stuff that is just too complicated to deal with.

Second, from the userspace PHP developers' perspective. There are two groups of those out there. There is the group that builds apps for controlled environments: Yahoo, Facebook, and the hundreds, if not thousands, of smaller companies out there that will define a certain PHP configuration and code against that. To them such a switch isn't a big deal except when it comes to re-using external code. Which brings us to the second group, the one that strives to build portable apps designed to run on as many unknown PHP configs as possible.
This is the group that will get hit by this, and here is where we need to figure out how to cause them the least amount of pain. They are going to feel some pain in order to get their heads around Unicode no matter how we handle this. For the portion of these folks who don't want to worry about Unicode at all and who actually have code that does stuff on binary strings that will break, their stuff just won't work no matter what we do. The difference comes down to whether it gets marked as PHP5-only or as non-Unicode-only. And the other camp, those who do want to make sure their stuff supports Unicode, will need to write the Unicode and non-Unicode versions and check whether the system they are running on supports Unicode. Whether they check the PHP version number, or the Unicode switch, or probe directly for the features they need, it ends up being about the same amount of pain.

What may be somewhat lost in all this, and I hope nobody here is forgetting it, is that smooth Unicode support is really important. Being able to work directly in your native charset with your native strings without having to deal with iconv and other crap is the goal here. And let's also not forget that a lot of code will actually work unchanged in PHP 6 Unicode-mode and suddenly be Unicode-capable where it wasn't before. I would love to see all this energy put toward making sure as much code as possible falls into this category instead of arguing about where to put the Unicode switch. It's still a switch whether you put it in the version number or in the .ini file. In the version number it is simply easier for people to ignore from all sides of the discussion here, but where does that leave us 4 years from now?

Perhaps the real argument here is whether we should be doing Unicode at all?

-Rasmus