Newsgroups: php.internals
Date: Tue, 16 Mar 2010 19:05:39 +0000
To: Rasmus Lerdorf
Cc: Lester Caine, PHP internals
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
From: dreamcat4@gmail.com (dreamcat four)

On Tue, Mar 16, 2010 at 6:32 PM, Rasmus Lerdorf wrote:
> On 03/16/2010 10:40 AM, dreamcat four wrote:
>> As for text files on disk, if they are unicode, they are most commonly
>> utf-8 too. So then, why use utf-16 as internal unicode representation
>> in Php? It doesn't really make a lot of sense for most regular people
>> who want to use Php for their web application. Unless they don't
>> really care how slow it's gonna be converting everything, constantly...
>
> Well, the obvious original reason is that ICU uses UTF-16 internally and
> the logic was that we would be going in and out of ICU to do all the
> various Unicode operations many more times than we would be interfacing
> with external things like MySQL or files on disk. You generally only
> read or write a string once from an external source, but you may perform
> multiple Unicode operations on that same string so avoiding a conversion
> for each operation seems logical.
>
> -Rasmus

It's only logical if you've bothered to profile the conversion calls made for ICU against the non-ICU conversion calls. I'm guessing the way to do that is to have two versions of each conversion method: one used by ICU, and another used everywhere else.
The harder part is finding suitable, real-life PHP programs to test with.
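The trade-off Rasmus describes, and the two-versions-of-each-conversion profiling idea, can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not PHP's actual implementation (which would be C code around ICU): `ConversionProfiler`, `icu_upper` and `run` are hypothetical names, `icu_upper` is a stand-in built on Python's `str.upper()` rather than a real ICU binding, and the two versions of each conversion method are modelled as two counter buckets, "icu" and "external".

```python
import time
from collections import defaultdict


class ConversionProfiler:
    """Counts and times UTF-8 <-> UTF-16 conversions per call-site bucket.

    Hypothetical helper: "icu" covers conversions made only so that an ICU
    routine can be called; "external" covers reads/writes (files, MySQL, output).
    """

    def __init__(self):
        self.calls = defaultdict(int)      # bucket -> number of conversions
        self.seconds = defaultdict(float)  # bucket -> total time spent converting

    def _convert(self, bucket, data, src, dst):
        start = time.perf_counter()
        result = data.decode(src).encode(dst)
        self.seconds[bucket] += time.perf_counter() - start
        self.calls[bucket] += 1
        return result

    # Version 1: conversions done on behalf of ICU.
    def to_utf16_for_icu(self, data):
        return self._convert("icu", data, "utf-8", "utf-16-le")

    def to_utf8_for_icu(self, data):
        return self._convert("icu", data, "utf-16-le", "utf-8")

    # Version 2: conversions done at external boundaries (read/write).
    def to_utf16_external(self, data):
        return self._convert("external", data, "utf-8", "utf-16-le")

    def to_utf8_external(self, data):
        return self._convert("external", data, "utf-16-le", "utf-8")


def icu_upper(utf16):
    """Stand-in for an ICU call: takes and returns UTF-16LE bytes (not real ICU)."""
    return utf16.decode("utf-16-le").upper().encode("utf-16-le")


def run(internal, text, ops, prof):
    """Read UTF-8 `text` once, apply `ops` ICU operations, write it once as UTF-8."""
    if internal == "utf-16":
        s = prof.to_utf16_external(text)       # convert once on read
        for _ in range(ops):
            s = icu_upper(s)                   # already UTF-16: no conversion
        return prof.to_utf8_external(s)        # convert once on write
    if internal == "utf-8":
        s = text                               # stored exactly as read
        for _ in range(ops):
            # every ICU operation needs a round trip UTF-8 -> UTF-16 -> UTF-8
            s = prof.to_utf8_for_icu(icu_upper(prof.to_utf16_for_icu(s)))
        return s                               # written as-is
    raise ValueError("unknown internal encoding: " + internal)


text = "Grüße aus Köln".encode("utf-8")
for internal in ("utf-16", "utf-8"):
    prof = ConversionProfiler()
    run(internal, text, ops=5, prof=prof)
    print(internal, dict(prof.calls))
# utf-16 {'external': 2}
# utf-8 {'icu': 10}
```

The call counts show the shape of Rasmus's argument: with UTF-16 internally, conversions scale with I/O (two per string here), while with UTF-8 internally they scale with the number of ICU operations (two per operation). Whether that difference actually matters is what the per-bucket timings in `prof.seconds` would have to show on real programs, since the ratio of ICU operations to I/O depends entirely on the workload.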