From: stas@zend.com (Stanislav Malyshev)
Date: Tue, 16 Mar 2010 12:05:47 -0700
To: Lester Caine
CC: PHP internals
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

Hi!

> On disk storage should probably be UTF-8 without any question? Windows
> use of widestrings for some files simple doubles up the on disk storage

As file content, it's OK (and it'd be easy to add an option to specify a
content transformation if we wanted), but prescribing filenames as UTF-8
would probably not be workable, since different systems (and maybe even
different filesystems inside the same OS?) can have different opinions on
that.

> '3' is not a very processor friendly number, so working with 4 even
> though wasteful on memory, does make perfect sense. How long is it since

I'm not sure it does. Most PHP strings are short, so the memory overhead
would be very significant. Also, take into account that CPU caches aren't
as big as main memory, and not fitting your data into the cache is
expensive.

> we had a 640k limit on working memory? SERVERS should have a good amount

It doesn't matter how much memory you have, in absolute numbers. Until we
find an unlimited source of computer memory left by the aliens in the
Himalayas, memory costs money. However many gigs you have, you'll be able
to run only a third as many PHP processes with the new version on the same
hardware as with the old version, which means the new PHP would cost you
more to run. "Memory is cheap" is a very misunderstood expression: it's
only cheap if you always have much more than you need.

> Probably 90% of the time a string will come in and go out without
> requiring any processing at all, so leave it as UTF-8 ? The only time we

It would be great if we could do that. The problem is that right now,
AFAIK, we don't have a good library for working with UTF-8 strings (please
correct me if I'm wrong here).

-- 
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
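
P.S. To make the memory point a bit more concrete, here is a rough sketch in Python (not PHP internals code) that compares how many bytes a few short strings occupy in UTF-8, UTF-16 and UTF-32. The sample strings and the helper names `encoded_sizes` and `total_sizes` are made up for illustration, and the sketch counts only raw character data, not any per-string bookkeeping an engine would add.

```python
# Illustrative sketch: raw character-data size of short strings in three
# Unicode encodings. Sample strings are hypothetical, chosen to resemble
# typical short identifiers plus two non-ASCII words.
SAMPLES = ["id", "user_name", "Content-Type", "na\u00efve", "Привет"]


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def total_sizes(samples):
    """Sum the per-encoding byte lengths over all sample strings."""
    totals = {"utf-8": 0, "utf-16": 0, "utf-32": 0}
    for sample in samples:
        for name, size in encoded_sizes(sample).items():
            totals[name] += size
    return totals


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), encoded_sizes(sample))
    print("total", total_sizes(SAMPLES))
```

For pure-ASCII strings the UTF-32 form is exactly four times the size of the UTF-8 form, which is the case that dominates when most strings are short identifiers; how much that costs per process depends on what share of memory is string data.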