Newsgroups: php.internals
From: dreamcat4@gmail.com (dreamcat four)
Date: Sun, 14 Mar 2010 14:43:00 +0000
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
To: PHP Development
Message-ID: <99cf22521003140743t29ce3ecxeaa0b5d4609ba7a2@mail.gmail.com>
In-Reply-To: <4bcbf4711003140723s712c2653xa61e8f6053983553@mail.gmail.com>
References: <4B9C9007.1080802@lsces.co.uk> <4B9C91D7.2050402@rowe-clan.net> <13008E62F851429F84B9FE2F3F230286@pc> <4bcbf4711003140723s712c2653xa61e8f6053983553@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi,

I used to work at a job where we used UTF-16 for embedded applications.
Our company chose UTF-16 over UTF-8 because it was aligned on 16-bit
code units and therefore faster / more efficient to process than UTF-8.
However, there's no reason why UTF-8 has to be drastically slower. The
truth is, even we could have used UTF-8 there.

And I don't buy the whole byte size / memory thing either. Even in our
restricted embedded environments, that was never a consideration
anyway. A well-written program won't bloat memory by holding too many
strings; that's what MySQL is for.

Apple uses UTF-16 for CFString and NSString data, but elsewhere (and on
the web!) pretty much everyone uses UTF-8.

You should implement UTF-8, with a view to still allowing UTF-16
support to be added later on. That is to say, the encoding should be
wrapped, and switchable underneath. Of course, all that is easier said
than done with PHP. But that's the right way to do it.

On Sun, Mar 14, 2010 at 2:23 PM, Jordi Boggiano wrote:
> On Sun, Mar 14, 2010 at 12:03 PM, Stan Vassilev wrote:
>> UTF8 also takes 4 bytes for representing characters in the higher bit
>> planes, as quite a lot of bits are lost for every char in order to describe
>> how long the code point is, and when it ends and so on. This means
>> memory-wise it may not be of big benefit to asian countries.
>
> I remember Brian Aker saying that they chose to work internally with
> UTF-8 for Drizzle. His explanation of it was that asian countries have
> so much english content mixed in that on average even for them UTF-8
> still had a lower footprint than UTF-16/32. I do not know where the
> stats came from, but if it holds any truth it is worth considering.
>
> Cheers,
> Jordi
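
The footprint question raised in this thread is easy to check by hand. The following is a minimal Python sketch that counts the bytes each encoding needs for a few strings. The helper name `encoded_sizes` and the sample strings are illustrative assumptions, not measurements from Drizzle or from anyone in the thread.

```python
def encoded_sizes(text):
    """Return the number of bytes needed to store `text` in each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The explicit little-endian codecs are used so that no
        # byte-order mark is added to the byte count.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# Hypothetical samples, chosen only to illustrate the different cases.
samples = {
    "ascii": "Hello, world",
    "cjk": "日本語のテキスト",
    "mixed markup": '<a href="/index.html">日本語</a>',
    "supplementary": "\U0001F600",  # a code point outside the BMP
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

Pure CJK text inside the Basic Multilingual Plane comes out smaller in UTF-16 (2 bytes per character against 3 in UTF-8), ASCII-heavy markup around the same text tips the balance toward UTF-8, and a character outside the BMP takes 4 bytes in both. Which case dominates depends entirely on the real content, so a sketch like this only shows the mechanism, not the average.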