Newsgroups: php.internals
Message-ID: <481EB410.1090804@lsces.co.uk>
Date: Mon, 05 May 2008 08:15:28 +0100
To: PHP internals
In-Reply-To: <60526.78.61.224.253.1209928511.nsm@avilys.eik.lt>
Subject: Re: [PHP-DEV] Removal of unicode_semantics
From: 
lester@lsces.co.uk (Lester Caine)

Tomas Kuliavas wrote:
>>>> We've discussed this a few times in the past and it's time to make a
>>>> final decision about its removal.
>>>>
>>>> I think most people have agreed that this is the way forward but no
>>>> one has produced a patch. I have a student working on unicode
>>>> conversion for the Google Summer of Code and this would help make it
>>>> simpler.
>>> unicode_semantics=on breaks backwards compatibility in scripts that have
>>> implemented multiple character set support in current PHP setups.
>> Why don't you go ahead and make a list of those exacty issues then? We
>> can then see how to fix those issues. That's much more useful then just
>> posting to the mailinglist when you don't agree with something. From
>> what I've seen with my code base, the changes that I have to do are
>> minimal once some (internal) functions are fixed up.
>
> If I remain silent, others will have arguments that "everybody agrees on
> removal of unicode_semantics".
> I can bypass it by adding one line to every script that operates with
> binary strings, but where are warranties that you won't dump declare()
> support just like you dump unicode_semantics. What happens to your new
> Unicode aware string functions, if I lie about strings' charset to PHP
> interpreter? mb_strlen can't calculate correct $string length even when I
> set correct charset in mb_strlen() arguments. If above code works as I
> want in PHP6 unicode_semantics=on, mb_strlen($string,'utf-8') returns 2
> and not 1.

That sounds like just the sort of edge case that Derick is suggesting
needs logging so it can be fixed. unicode_semantics=on is just another
bodge to make it happen rather than a solution. I think I understand your
description, and to my eyes it looks like a Unicode bug that needs
addressing?

We have been maintaining two code bases for a long time now - PHP4 and PHP5.
Now that PHP4 is finally being shelved, those of us who have had to
maintain compatibility with PHP4 can move on and address the problems of
PHP5/PHP6 compatibility. So from *MY* point of view, unicode_semantics=on
is creating a THIRD case to have to manage?

PLEASE can someone take charge and at least get PHP6 moving forward to a
stable alpha, so that we have something users can be happy to test against!

PHP5 = code sets
PHP6 = Unicode

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://home.lsces.co.uk/lsces/wiki/?page=contact
L.S.Caine Electronic Services - http://home.lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php