Newsgroups: php.internals
Message-ID: <469DE3A0.1060902@pooteeweet.org>
In-Reply-To: <469DE09F.9080509@lerdorf.com>
Date: Wed, 18 Jul 2007 11:55:44 +0200
From: mls@pooteeweet.org (Lukas Kahwe Smith)
To: Rasmus Lerdorf
Cc: Derick Rethans, PHP Developers Mailing List
Subject: Re: [PHP-DEV] POSIX regex

Rasmus Lerdorf wrote:
> Derick Rethans wrote:
>> Regarding the unicode on/off modes, I don't think you put yourself in
>> the developer's view at all. Users are not going to be better off having
>> to deal with both modes.
>
> Have you guys really thought this through?
>
> Let's look at this from two angles.
>
> First, from our perspective maintaining and developing PHP. Without
> the Unicode switch, and as has already been suggested, PHP 5 will never
> die. Anything new in PHP 6 that isn't related to Unicode will be
> backported to PHP 5. Or, a slight variation of that, any developer with
> no interest in Unicode will only work on the PHP 5 branch and not bother
> worrying about whether it works in PHP 6 forcing others to do that work.
> I don't think we have the resources to do this, and I think it is
> likely to either create 2 classes of developers and potentially
> diverging trees, or it may simply kill off the Unicode effort altogether
> if not enough developers bother looking at PHP 6 since PHP 5 will live
> forever and is free of all this annoying Unicode stuff that is just too
> complicated to deal with.
>
> Second, from the user space PHP developers' perspective. There are two
> groups of those out there. There is the group that builds apps for
> controlled environments.
> Yahoo, Facebook, and the hundreds, if not
> thousands of smaller companies out there that will define a certain PHP
> configuration and code against that. To them such a switch isn't a big
> deal except when it comes to re-using external code. Which brings us to
> the second group which is the group that strives to build portable apps
> designed to run on as many unknown PHP configs as possible. This is the
> group that will get hit by this, and here is where we need to figure out
> how to cause them the least amount of pain. They are going to feel some
> pain in order to get their heads around Unicode no matter how we handle
> this. For the portion of these folks who don't want to worry about
> Unicode at all and they actually have code that does stuff on binary
> strings that will break, their stuff just won't work no matter what we
> do. The difference comes down to whether it gets marked as PHP5-only or
> it gets marked as non-Unicode-only. And the other camp who do want to
> make sure their stuff supports Unicode will need to write the Unicode
> and non-Unicode versions and check to see if the system they are running
> on supports Unicode or not. Whether they check the PHP version number,
> or the Unicode switch, or probe directly for the features they need, it
> ends up being about the same amount of pain.
>
> What may be somewhat lost in all this, that I hope nobody here is
> forgetting, is that smooth Unicode support is really important. Being
> able to work directly in your native charset with your native strings
> without having to deal with iconv and other crap is the goal here. And
> let's also not forget that a lot of code will actually work unchanged in
> PHP 6 Unicode-mode and suddenly be Unicode-capable where they weren't
> before. I would love to see all this energy put toward making sure as
> much code as possible falls into this category instead of arguing about
> where to put the Unicode switch.
> It's still a switch whether you put it
> in the version number or in the .ini file. In the version number it is
> simply easier for people to ignore from all sides of the discussion
> here, but where does that leave us 4 years from now?

I guess the question (which I am unable to answer) is whether it's
easier to maintain PHP6 with the switch, or to be forced to backport to
PHP5 without the switch in PHP6. If it does end up that a lot of devs
prefer to work on PHP5 and PHP6 is left dangling as a result, I wonder
whether the switch would make things any easier, since devs would then
work on and test only the non-Unicode side of things. I think this was
the key point that was brought up: that it would not be easier, and that
handling all the ifs in a single tree was deemed more error prone than
having a "clean" separation.

Also I wonder how a unicode on/off switch would be handled on the
documentation side. Having the switch would add more permutations to the
documentation. From my understanding, handling all the version-dependent
differences is already fairly non-trivial. Philipp, what's your take on
this?

regards,
Lukas