Newsgroups: php.internals
Message-ID: <4B9F7652.7090000@hristov.com>
Date: Tue, 16 Mar 2010 13:15:14 +0100
To: dreamcat four
CC: Lester Caine, PHP internals
In-Reply-To: <99cf22521003160448k5028ae61y70e1e61428d13280@mail.gmail.com>
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
From: php@hristov.com (Andrey Hristov)

dreamcat four wrote:
> On Tue, Mar 16, 2010 at 8:30 AM, Lester Caine wrote:
>> '3' is not a very processor friendly number, so working with 4 even though
>> wasteful on memory, does make perfect sense. How long is it since we had a
>> 640k limit on working memory? SERVERS should have a good amount of memory
>> for caching information anyway. SO is UTF-16 the right approach for
>> processing wide strings? It needs special code to handle everything wider
>> than 16 bits, but at what gain really? If all core functionality is handled
>> as 32 bit characters is there that much of an overhead over the additional
>> processing to get around strings of dissimilar sizes in UTF-16 ?
>
> Just to re-enforce some of Lester's points above here.
>
> 4-byte per character is never slower that 2-bytes per character... its
> faster if anything. Bear in mind that 4-byte has been the defacto size
> for all modern cpu registers / 32-bit microarchitectures since....
> like... Forever. Give a c compiler 4bytes of data... it'll say: thank
> you very much, and more of the same please! It keeps em happy ;)
>
> Sure UTF-16 can make sense. But only if your external representations
> are also in UTF-16. So whats the default Unicode settings for MYSQL,
> POSTGRE, etc? Well, are they always set to UTF-8, or UTF-16?
>
> Just do the same as them.
>

All MySQL GA versions (not including the upcoming 5.5, which is not GA yet) can't eat UTF-16 queries but can return UTF-16 results. Strictly speaking, the GA releases that know about character sets (4.1, 5.0 and 5.1) know nothing about UTF-16, only UCS-2, which covers just the characters in the BMP. It is probable (I can't say definitely due to Oracle's recognition rules) that 5.5 will have proper UTF-16.

UTF-16 has its advantages.
If your Unicode data is mostly ASCII with a non-ASCII character here and there, then UTF-8 should be the choice: it uses less disk space, which means the HDD can read more data, which in turn means more table rows served per second. Converting in the client (PHP) is OK because it scales: just throw in some more web servers. Scaling an RDBMS is a completely different story.

Best,
Andrey
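
To make the size argument concrete, here is a minimal sketch in Python (not PHP, and not anything from MySQL itself) that counts how many bytes the same string takes in UTF-8, UTF-16 and UTF-32, and checks whether a string stays inside the BMP, which is all that UCS-2 can represent. The helper names encoded_sizes and fits_in_ucs2 and the sample strings are made up for illustration only.

```python
def encoded_sizes(text):
    """Return the number of bytes needed to store text in each encoding."""
    # The "-le" codec variants are used so that no byte order mark is counted.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def fits_in_ucs2(text):
    """True if every character lies in the BMP (code point <= 0xFFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    # Hypothetical sample data: mostly ASCII with two non-ASCII letters.
    mostly_ascii = "Sofia, M\u00fcnchen, K\u00f8benhavn"
    # A character outside the BMP (U+1D11E, musical symbol G clef).
    beyond_bmp = "\U0001D11E"

    for label, sample in (("mostly ASCII", mostly_ascii), ("beyond BMP", beyond_bmp)):
        print(label, encoded_sizes(sample), "fits in UCS-2:", fits_in_ucs2(sample))
```

For mostly-ASCII text the UTF-8 size stays close to the character count, while UTF-16 needs two bytes and UTF-32 four bytes for each of those characters. A character outside the BMP takes four bytes in all three encodings and has no UCS-2 representation at all.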