Newsgroups: php.internals
Message-ID: <468E1158.2030900@lerdorf.com>
Date: Fri, 06 Jul 2007 02:54:32 -0700
To: Derick Rethans
CC: Cristian Rodriguez, internals@lists.php.net
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?
From: rasmus@lerdorf.com (Rasmus Lerdorf)

Derick Rethans wrote:
> On Fri, 6 Jul 2007, Cristian Rodriguez wrote:
>
>> On 7/6/07, Johannes Schlüter wrote:
>>> which will just produce way more
>>> problems to hosters and developers of software for "PHP 6".
>>
>> yes :-( .. So if unicode.semantics cannot be set at runtime with
>> ini_set() or at least "per-dir", it is complete nonsense to have it,
>> as the vast majority of users will not be able to turn it on/off, and
>> it will certainly be off in most configurations, as otherwise it will
>> break too much code.
>>
>> I'm sorry but I don't see this ending as a good thing.. looks pretty
>> much like more of the same old mistakes (magic_quotes, safe_mode
>> anyone? this may be even worse..)
>
> This *is* worse because with magic_quotes you can at least work around
> it in user land. Not so much with this setting.

It comes down to whether we want a true Unicode mode for PHP. As far as I am concerned, anything short of that is rather half-assed and feels bolted on, as it does in other languages. The huge difficulty, and the reason Unicode is bolted on after the fact in most languages, is that it is extremely difficult to transition from non-Unicode to full Unicode without breaking everything.

The suggestion has been to just have a bunch of Unicode functions you can call, so you explicitly control when you are doing Unicode work and the rest of the time you are working in binary mode. That's exactly what we have with Unicode semantics turned off.
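To make the "binary mode" versus "Unicode semantics" distinction concrete, here is a minimal sketch in Python rather than PHP, used purely as an illustration; it is not PHP 6 code and does not model how unicode.semantics was implemented. Python's `bytes` type stands in for a string handled in binary mode, and `str` stands in for a string with real Unicode semantics. The sketch shows three things a byte-oriented string cannot do on its own: report its length in characters, treat canonically equivalent text as equal, and be sliced without risking a split multibyte sequence. The helper names (`byte_length`, `char_length`, `same_text`) are hypothetical.

```python
import unicodedata

# Illustration only: bytes plays the role of a "binary mode" string,
# str plays the role of a string with full Unicode semantics.

text = "na\u00efve caf\u00e9"      # "naïve café" using precomposed characters
raw = text.encode("utf-8")          # the same text as raw UTF-8 bytes


def byte_length(data: bytes) -> int:
    """Length as a byte-oriented string API would report it."""
    return len(data)


def char_length(s: str) -> int:
    """Length in Unicode code points."""
    return len(s)


def same_text(a: str, b: str) -> bool:
    """Compare after NFC normalization, so a precomposed 'é' (U+00E9)
    equals 'e' followed by a combining acute accent (U+0301)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(byte_length(raw))    # 'ï' and 'é' each take two bytes in UTF-8
    print(char_length(text))   # counts code points instead

    decomposed = "cafe\u0301"
    print("caf\u00e9" == decomposed)           # plain comparison: not equal
    print(same_text("caf\u00e9", decomposed))  # equal after normalization

    # Slicing raw bytes can cut a multibyte character in half.
    try:
        raw[:3].decode("utf-8")
    except UnicodeDecodeError as exc:
        print("byte slice is not valid UTF-8:", exc.reason)
```

NFC is only one reasonable normalization choice here; whether to normalize at all, and to which form, is a decision that a Unicode-aware string layer has to make explicitly rather than leave to each caller.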
The idea is for all the Unicode functionality to be available with unicode.semantics off, and, as has been stated many times, this is the mode most ISPs are going to run their shared servers in, so this is the mode a portable PHP script needs to be written for.

However, does this mean we shouldn't even attempt to get it right? Five years from now, are we still going to limp along having to call explicit functions to compare and iterate over Unicode strings? Or, heaven forbid, end up with a mess of various string classes? A string is just a string; it isn't a class and it shouldn't be complicated. It should have carried a charset with it from day one, but it didn't, so we are where we are.

So yes, the only real customers for the full Unicode mode in PHP 6 are going to be the folks who have full control over their servers and their software. That will likely limit it to hosted services and exclude large PHP software packages, which will necessarily need to be written to be portable. That of course creates a split right down the middle and makes code sharing harder. Maybe it won't work, but the hope is that we can minimize these issues enough that the amount of code that realistically needs to be written twice will be rather limited. If we can't get it down to a manageable set of known things that people need to watch out for, then our full Unicode attempt has failed and we need to stick with the half-assed approach. I'm not convinced we are there yet, and I'd hate to see us give up before we have taken a decent stab at it. We need to think big and long-term here, not small and short-term.

-Rasmus