From: lester@lsces.co.uk (Lester Caine)
Date: Tue, 16 Mar 2010 19:03:38 +0000
Newsgroups: php.internals
To: PHP internals
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
Message-ID: <4B9FD60A.3050500@lsces.co.uk>
In-Reply-To: <4B9FCEA7.50108@lerdorf.com>

Rasmus Lerdorf wrote:
> On 03/16/2010 10:40 AM, dreamcat four wrote:
>> As for text files on disk, if they are unicode, they are most commonly
>> utf-8 too. So then, why use utf-16 as internal unicode representation
>> in Php? It doesn't really make a lot of sense for most regular people
>> who want to use Php for their web application. Unless they don't
>> really care how slow its gonna be converting everything, constantly...
>
> Well, the obvious original reason is that ICU uses UTF-16 internally and
> the logic was that we would be going in and out of ICU to do all the
> various Unicode operations many more times than we would be interfacing
> with external things like MySQL or files on disk. You generally only
> read or write a string once from an external source, but you may perform
> multiple Unicode operations on that same string so avoiding a conversion
> for each operation seems logical.

Which begs the question - is ICU actually the right base?

But I'd still like some feedback on my idea that until an operation needs
to be able to handle multi byte character string processing, why not
simply stay in UTF-8? No reason why a string variable can't be converted
only when needed, and then dropped back to UTF-8 if needed later? And if
the user is only using single byte characters then the multi byte stuff
never kicks in anyway? If you NEED raw speed use the basic character set.

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
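
To make the "stay in UTF-8 until something needs otherwise" idea a bit more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the class name LazyUnicodeString and all of its methods are hypothetical, and nothing here reflects how PHP or ICU actually store strings. The string is kept as UTF-8 bytes, pure single-byte (ASCII) data never triggers a conversion, and a UTF-16 copy is built and cached only when a multi-byte-aware operation asks for it.

```python
class LazyUnicodeString:
    """Hypothetical wrapper: holds UTF-8, builds UTF-16 only on demand."""

    def __init__(self, utf8_bytes: bytes):
        self._utf8 = utf8_bytes
        self._utf16 = None      # cached UTF-16 form, built lazily
        self.conversions = 0    # how many UTF-8 -> UTF-16 conversions ran

    def raw_bytes(self) -> bytes:
        # I/O path (files, database): hand back UTF-8 untouched.
        return self._utf8

    def _as_utf16(self) -> bytes:
        # Convert once and reuse the result for later operations.
        if self._utf16 is None:
            self._utf16 = self._utf8.decode("utf-8").encode("utf-16-le")
            self.conversions += 1
        return self._utf16

    def utf16_length(self) -> int:
        # Length in UTF-16 code units.
        if self._utf8.isascii():
            # Single-byte data: one byte per character, no conversion.
            return len(self._utf8)
        return len(self._as_utf16()) // 2

    def drop_cache(self) -> None:
        # "Drop back to UTF-8": discard the UTF-16 copy to save memory.
        self._utf16 = None


plain = LazyUnicodeString(b"hello")
accented = LazyUnicodeString("h\u00e9llo".encode("utf-8"))

print(plain.utf16_length(), plain.conversions)
print(accented.utf16_length(), accented.conversions)
```

The trade-off is the one Rasmus describes: the cache avoids a conversion per operation, but only for strings that actually need one, at the cost of briefly holding two copies of such a string.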