Newsgroups: php.internals
Xref: news.php.net php.internals:47215
Message-ID: <4B9B7260.1000205@lsces.co.uk>
Date: Sat, 13 Mar 2010 11:09:20 +0000
To: PHP Developers Mailing List
References: <4B9926E8.4080202@lerdorf.com> <7f3ed2c31003120958w7bd41059o88869669c6f5b0d9@mail.gmail.com> <661d85d51003130107o7cf19012m7ce93f0147c7585a@mail.gmail.com>
In-Reply-To: <661d85d51003130107o7cf19012m7ce93f0147c7585a@mail.gmail.com>
Subject: Re: [PHP-DEV] PHP 6
From: lester@lsces.co.uk (Lester Caine)

Chen Ze wrote:
> On Sat, Mar 13, 2010 at 2:34 AM, Derick Rethans wrote:
>> On Fri, 12 Mar 2010, Hannes Magnusson wrote:
>>
>>> On Fri, Mar 12, 2010 at 17:38, Moriyoshi Koizumi wrote:
>>>> I'd love to see my brand-new mbstring implementation in the release.
>>>> Dropping mbstring completely won't be any good because lots of
>>>> applications rely on it, but I don't really want to maintain the funky
>>>> library bundled with it.
>>>
>>> That's actually one of the ideas we had on IRC.
>>> That mbstring patch and more ext/intl features should be enough to
>>> solve "the unicode problem".
>>
>> Sorry, but that is not true. intl and mbstring can provide functionality
>> to deal with UTF-8 string manipulation functions, they can not provide
>> proper Unicode support. Proper Unicode support is *not* only just
>> dealing with UTF-8 strings. Proper Unicode support includes dealing with
>> file streams, with different encodings, with localization, with sorting,
>> with locales, with formatting numbers. Offloading this to extensions
>> makes Unicode support an add-on hack, and not a language feature. I am
>> not saying that intl and mbstring aren't *useful*, but they definitely
>> do not solve "the unicode problem".
>>
>
> I think unicode should only care for string handling. Formatting
> numbers should not be the thing that unicode cares about. Unicode is a
> standard for text, not for text or number formatting.
>
> Back in the days before we had unicode, number formatting
> already existed. It even existed before the computer was invented.
>
> The same goes for sorting.
>
> When we think about Unicode, we should think about the things really
> related to Unicode, like the file system. Number formatting and sorting are
> other things, which intl takes care of.
>
> For the unicode, I think we should implement something like:
>
> $chars=new mchar($bytes,$bytes_encoding);
> echo $chars;//output encoding
> foreach ($chars as $char) {
>     echo $char;//output single utf-16/utf-8 char (depends on default output encoding)
> }
> echo $chars->bytes('gbk');
>
> $chars->outputEncoding('gbk');
> echo $chars;
>
> ini_set('mchar_output_encoding','gbk');
> echo $chars;
>
> ini_set('mchar_filesystem_encoding','gbk');
> echo $chars->filepath();

I think this probably highlights the fundamental difference of opinions
on Unicode? Handling unicode CONTENT is not the problem here. People
nowadays expect to be able to use their own language to write code, and
to create functions using words that they recognize. In databases, table
and field names are now expected to support unicode, rather than just
handling unicode data pumped into ASCII-titled fields.

Personally I'm quite happy with just using ASCII names for things, but
more and more overseas customers provide contact details in 'strange'
character sets that only unicode can handle, and handling THAT in PHP5
is not a problem. The problem comes when people start building databases
with unicode metadata and expect the tools interfacing with them to
understand unicode as well.

It was my understanding that PHP6 was intended to provide international
users with something that they could use in their own native language?
Unicode-titled files with unicode-titled classes and functions.

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php