Newsgroups: php.internals
Date: Tue, 16 Mar 2010 21:04:24 +0100
From: tyra3l@gmail.com (Ferenc Kovacs)
To: Stanislav Malyshev
Cc: Lester Caine, PHP internals
Subject: Re: [PHP-DEV] Where are we ACTUALLY on Unicode?

On Tue, Mar 16, 2010 at 8:05 PM, Stanislav Malyshev wrote:
> Hi!
>
>> On disk storage should probably be UTF-8 without any question? Windows
>> use of widestrings for some files simply doubles up the on disk storage
>
> As file content, it's OK (and it'd be easy to add an option to specify content
> transformation if we wanted), but prescribing filenames as UTF-8 would
> probably not be workable, since different systems (and maybe even different
> filesystems inside the same OS?) can have different opinions on that.
>
>> '3' is not a very processor friendly number, so working with 4 even
>> though wasteful on memory, does make perfect sense. How long is it since
>
> I'm not sure it does. Most PHP strings are short, so the memory loss would be
> very significant. Also, take into account that CPU caches aren't as big as
> the main memory, and not fitting your data into the cache is expensive.
>
>> we had a 640k limit on working memory? SERVERS should have a good amount
>
> It doesn't matter how much memory you have, in numbers. Until we find an
> unlimited source of computer memory left by the aliens in the Himalayas, memory
> costs money. It doesn't matter how much memory you have - however many
> gigs you have, you'll be able to run 3 times fewer PHP processes in the new
> version on the same hardware than in the old version, which means new PHP would
> cost you more to run.
> "Memory is cheap" is a very misunderstood expression -
> it's only cheap if you always have much more than you need.
>
>> Probably 90% of the time a string will come in and go out without
>> requiring any processing at all, so leave it as UTF-8? The only time we
>
> It might be great if we could do that. The problem might be that right now
> AFAIK we don't have a good library to work with UTF-8 strings (please
> correct me if I'm wrong here).

http://source.icu-project.org/repos/icu/icuhtml/trunk/design/strings/icu_utf8.html

from the ICU 3.6 changelog => The UTF-8 transformation functions and macros are faster.
from 4.2 => UTF-8 friendly internal data structure for Unicode data lookup

So it seems that the ICU developers are trying to close the gap between UTF-16
and UTF-8 performance; maybe it would be a good idea to check the current
situation.

Tyrael

> --
> Stanislav Malyshev, Zend Software Architect
> stas@zend.com   http://www.zend.com/
> (408)253-8829   MSN: stas@zend.com