Newsgroups: php.internals
From: pierre.php@gmail.com (Pierre Joye)
To: Stan Vassilev
Cc: "William A. Rowe Jr.", internals@lists.php.net
Date: Sun, 14 Mar 2010 15:00:59 +0100
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?
In-Reply-To: <13008E62F851429F84B9FE2F3F230286@pc>
References: <4B9C9007.1080802@lsces.co.uk> <4B9C91D7.2050402@rowe-clan.net> <13008E62F851429F84B9FE2F3F230286@pc>

hi,

On Sun, Mar 14, 2010 at 12:03 PM, Stan Vassilev wrote:
> UTF8 is good for text that contains mostly ASCII chars and the occasional
> Unicode international chars. It's also generally ok for storing and passing
> strings between apps.

That's not completely correct. UTF-8 is also used out there in applications that are almost Unicode-only. I'd say it is a matter of what the projects are written for. See below.

> Still, having variable-width encoding UTF8 or UTF16 doesn't cut it for
> general use to me as in tests it shows drastic slowdown when the script
> needs to do heavy string processing. I'd rather have it take more RAM for
> Unicode strings while being fast, and use Latin-1 when what I need is
> Latin-1.

The problem I have with UTF-16 is that it does not fit well with how PHP is used. While you are right about the performance vs. memory usage trade-off, it is sadly only a small part of the problem. If you take a look at the current implementation (trunk, which uses UTF-16), we have to convert to UTF-8 almost everywhere we deal with external APIs (file systems or other libs). The win we may get from using UTF-16 is almost completely lost to the cost of these conversions. That obviously does not apply to scripts using only core PHP features (no file access, no extension usage, etc.), but such scripts are barely real-world use cases.
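As a rough illustration of the trade-off discussed above, here is a minimal sketch. It is written in Python rather than taken from PHP trunk, and the function names `encoded_sizes` and `to_external_api` are made up for the example. It only shows byte sizes and the transcoding step at an API boundary; it does not measure the actual conversion cost.

```python
# Illustration only: Python stands in for the engine; nothing here is PHP trunk code.

def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


def to_external_api(internal_utf16: bytes) -> bytes:
    """Hypothetical boundary: a string held internally as UTF-16 has to be
    transcoded before it reaches an external API that expects UTF-8
    (a file system call or another library, for example)."""
    return internal_utf16.decode("utf-16-le").encode("utf-8")


ascii_text = "hello world"        # 11 ASCII characters
cjk_text = "日本語のテキスト"       # 8 characters, all in the Basic Multilingual Plane

print(encoded_sizes(ascii_text))  # {'utf-8': 11, 'utf-16': 22}
print(encoded_sizes(cjk_text))    # {'utf-8': 24, 'utf-16': 16}

# Every call across the boundary pays for one transcoding pass over the string.
path_internal = "données.txt".encode("utf-16-le")
print(to_external_api(path_internal))
```

Mostly-ASCII text doubles in size under UTF-16, while text like the Japanese sample gets smaller, which is why the answer depends on what a project is written for. The transcoding step is the part that recurs on every file or extension call.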
Please note that I'm not voting against UTF-16 or for UTF-8, but I would like to have a real evaluation this time, unlike what was done for trunk a couple of years ago.

Cheers,
-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org