Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:47268
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: error (pb1.pair.com: domain mozo.jp from 209.85.211.204 cause and error)
MIME-Version: 1.0
In-Reply-To: <4bcbf4711003140723s712c2653xa61e8f6053983553@mail.gmail.com>
References: <4B9C9007.1080802@lsces.co.uk> <4B9C91D7.2050402@rowe-clan.net> 
	<13008E62F851429F84B9FE2F3F230286@pc> <4bcbf4711003140723s712c2653xa61e8f6053983553@mail.gmail.com>
Date: Sun, 14 Mar 2010 23:34:24 +0900
Message-ID: <cd1fb7541003140734h2b95ffd4o80210400bf197958@mail.gmail.com>
To: Jordi Boggiano <j.boggiano@seld.be>
Cc: Stan Vassilev <sv_forums@fmethod.com>, "William A. Rowe Jr." <wrowe@rowe-clan.net>, internals@lists.php.net
Content-Type: text/plain; charset=ISO-8859-1
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
From: mozo@mozo.jp (Moriyoshi Koizumi)

On Sun, Mar 14, 2010 at 11:23 PM, Jordi Boggiano <j.boggiano@seld.be> wrote:
> On Sun, Mar 14, 2010 at 12:03 PM, Stan Vassilev <sv_forums@fmethod.com> wrote:
>> UTF-8 also takes 4 bytes for representing characters in the higher bit
>> planes, as quite a lot of bits are lost for every char in order to describe
>> how long the code point is, and when it ends and so on. This means
>> memory-wise it may not be of big benefit to Asian countries.
>
> I remember Brian Aker saying that they chose to work internally with
> UTF-8 for Drizzle. His explanation of it was that Asian countries have
> so much English content mixed in that on average even for them UTF-8
> still had a lower footprint than UTF-16/32. I do not know where the
> stats came from, but if it holds any truth it is worth considering.

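This is true, as most of the text data exchanged on the Internet is
presumably HTML, in which such characters alternate with ASCII tags
and attribute names. In UTF-8 the markup costs 1 byte per character
against 2 in UTF-16, which can outweigh the 3 bytes that most CJK
characters need.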
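
As a rough illustration of the point, here is a minimal Python sketch that
compares encoded sizes for a small HTML fragment. The sample string and the
helper name encoded_sizes are made up for this example; real pages will have
a different markup-to-text ratio, so treat the numbers as indicative only.

```python
# Hypothetical sample: ASCII markup wrapped around a short Japanese phrase.
SAMPLE_HTML = '<p class="note">日本語のテキスト</p>'


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32.

    The little-endian codec variants are used so that no byte order
    mark is added to the counts.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # Mixed markup and text: 20 ASCII characters plus 8 kana/kanji.
    for name, size in encoded_sizes(SAMPLE_HTML).items():
        print(name, size)

    # Text alone, without markup, for comparison.
    for name, size in encoded_sizes("日本語のテキスト").items():
        print(name, size)
```

For this fragment UTF-8 comes to 44 bytes against 56 for UTF-16, while the
Japanese phrase on its own is 24 bytes in UTF-8 against 16 in UTF-16. So the
advantage depends entirely on how much ASCII is mixed in.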

Moriyoshi