Newsgroups: php.internals
Message-ID: <4B9FCEA7.50108@lerdorf.com>
Date: Tue, 16 Mar 2010 11:32:07 -0700
From: rasmus@lerdorf.com (Rasmus Lerdorf)
To: dreamcat four
CC: Lester Caine, PHP internals
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

On 03/16/2010 10:40 AM, dreamcat four wrote:
> As for text files on disk, if they are unicode, they are most commonly
> utf-8 too. So then, why use utf-16 as internal unicode representation
> in Php? It doesn't really make a lot of sense for most regular people
> who want to use Php for their web application. Unless they don't
> really care how slow its gonna be converting everything, constantly...

Well, the obvious original reason is that ICU uses UTF-16 internally and
the logic was that we would be going in and out of ICU to do all the
various Unicode operations many more times than we would be interfacing
with external things like MySQL or files on disk. You generally only
read or write a string once from an external source, but you may perform
multiple Unicode operations on that same string so avoiding a conversion
for each operation seems logical.

-Rasmus
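The trade-off described above can be sketched as a simple count of transcoding steps. This is only an illustrative model in Python, not PHP or ICU code: the helper names (`to_utf16`, `to_utf8`, `utf16_op`, `OPS`) are hypothetical, and ordinary string methods stand in for library operations that are assumed to accept only UTF-16.

```python
# Illustrative model only: counts UTF-8 <-> UTF-16 conversions under two
# internal-representation strategies. No real ICU or PHP calls are used.

class Counter:
    """Tracks how many transcoding steps were performed."""
    def __init__(self):
        self.conversions = 0


def to_utf16(data: bytes, counter: Counter) -> bytes:
    """Transcode UTF-8 bytes to UTF-16 (little endian) and count the step."""
    counter.conversions += 1
    return data.decode("utf-8").encode("utf-16-le")


def to_utf8(data: bytes, counter: Counter) -> bytes:
    """Transcode UTF-16 (little endian) bytes to UTF-8 and count the step."""
    counter.conversions += 1
    return data.decode("utf-16-le").encode("utf-8")


def utf16_op(func):
    """Wrap a str function as a stand-in for a UTF-16-only library call."""
    def wrapper(utf16: bytes) -> bytes:
        return func(utf16.decode("utf-16-le")).encode("utf-16-le")
    return wrapper


# Hypothetical sequence of Unicode operations applied to one string.
OPS = [utf16_op(str.strip), utf16_op(str.upper), utf16_op(str.title)]


def run_utf8_internal(raw: bytes):
    """Keep UTF-8 internally: convert in and out around every operation."""
    counter = Counter()
    data = raw
    for op in OPS:
        data = to_utf8(op(to_utf16(data, counter)), counter)
    return data, counter.conversions


def run_utf16_internal(raw: bytes):
    """Keep UTF-16 internally: convert once on read and once on write."""
    counter = Counter()
    data = to_utf16(raw, counter)          # read from the external source
    for op in OPS:
        data = op(data)                    # no conversion per operation
    return to_utf8(data, counter), counter.conversions  # write back out


if __name__ == "__main__":
    raw = "  héllo wörld  ".encode("utf-8")
    print(run_utf8_internal(raw)[1], "conversions with UTF-8 internally")
    print(run_utf16_internal(raw)[1], "conversions with UTF-16 internally")
```

With three operations on one string, the UTF-8-internal path converts twice per operation, while the UTF-16-internal path converts only at the read and write boundaries. The model says nothing about the relative cost of a single conversion or about workloads that do little besides I/O, which is the case the quoted question is worried about.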