Newsgroups: php.internals
Date: Sun, 14 Mar 2010 15:33:19 +0100
From: pierre.php@gmail.com (Pierre Joye)
To: Jordi Boggiano
Cc: Stan Vassilev, "William A. Rowe Jr.", internals@lists.php.net
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

On Sun, Mar 14, 2010 at 3:23 PM, Jordi Boggiano wrote:
> On Sun, Mar 14, 2010 at 12:03 PM, Stan Vassilev wrote:
>> UTF8 also takes 4 bytes for representing characters in the higher bit
>> planes, as quite a lot of bits are lost for every char in order to describe
>> how long the code point is, and when it ends and so on. This means
>> memory-wise it may not be of big benefit to asian countries.
>
> I remember Brian Aker saying that they chose to work internally with
> UTF-8 for Drizzle. His explanation of it was that asian countries have
> so much english content mixed in that on average even for them UTF-8
> still had a lower footprint than UTF-16/32. I do not know where the
> stats came from, but if it holds any truth it is worth considering.

The idea behind his reasoning was to optimize for the 90% case while
being "fast enough" for the remaining 10% (the exact numbers may have
been different, but that's the idea). From what I remember of our
discussions, he also mentioned a fast UTF-8-capable string processing
implementation (as fast as a UTF-16 one could be).

I like this 90/10 approach, especially as it actually matches what we
have in PHP.

Cheers,
--
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
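The footprint argument in the quoted messages comes down to simple arithmetic. For a string with a ASCII characters and c CJK characters from the Basic Multilingual Plane, UTF-8 needs a + 3c bytes and UTF-16 needs 2a + 2c, so UTF-8 is smaller whenever the ASCII characters outnumber the CJK ones. Above U+FFFF both encodings use 4 bytes per character. The Python sketch below checks this with `str.encode`; the sample strings are made-up illustrations, not the statistics Brian Aker referred to.

```python
"""Compare the encoded size of sample strings in UTF-8, UTF-16 and UTF-32.

The sample strings are hypothetical illustrations, not measured corpus data.
"""


def encoded_sizes(text: str) -> dict[str, int]:
    """Byte length of `text` in each encoding; the -le variants omit the BOM."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def utf8_is_smaller(text: str) -> bool:
    """True when UTF-8 needs fewer bytes than UTF-16 for `text`."""
    sizes = encoded_sizes(text)
    return sizes["utf-8"] < sizes["utf-16"]


if __name__ == "__main__":
    samples = {
        "ascii": "Where are we on Unicode?",
        "cjk": "統一碼文字列処理",                # 8 BMP ideographs: 3 bytes each in UTF-8
        "markup": '<p class="title">統一碼</p>',  # mostly ASCII markup around CJK text
        "emoji": "\U0001F600",                    # outside the BMP: 4 bytes in UTF-8 and UTF-16
    }
    for name, text in samples.items():
        sizes = encoded_sizes(text)
        print(f"{name:7} chars={len(text):3} {sizes} utf8_smaller={utf8_is_smaller(text)}")
```

Pure CJK text is smaller in UTF-16, but once a page carries even modest ASCII markup around it, UTF-8 comes out ahead, which is the effect Jordi describes.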