From: gregory@luni.fr (Grégory Planchat)
To: internals@lists.php.net, Lester Caine
Date: Sun, 08 Mar 2015 14:53:41 +0100
Subject: Re: [PHP-DEV] Consistent function names
Newsgroups: php.internals

On 08/03/2015 12:17, Lester Caine wrote:
> On 08/03/15 10:03, Grégory Planchat wrote:
>> Then using multiple encodings in the same script, or using the same script
>> for multiple encodings, becomes straightforward and standard. Most PHP
>> developers don't even know what Unicode or a character encoding is;
>> they just see "odd characters that are removed with a header() call or
>> utf8_decode()". No teasing intended, they just don't want to have to
>> handle this. PHP should not leave this sort of consideration to the sole
>> awareness of user-space developers.
>
> Not part of THIS discussion exactly, but I have to take that in
> isolation. 'Most PHP developers' need to be very aware of Unicode these
> days. Simply pretending it does not exist is a dangerous exercise and
> my own code base has been UTF8 for several years now. Even though I
> don't speak anything but English, a large section of the material one
> has to handle has characters which get lost if one does not maintain
> UTF8 throughout the process. People are going on about 'data loss' when
> converting, and that applies equally to strings as numbers.
>
> The default encoding these days is UTF8 ...
>

This is not exactly what I meant, and your point describes the way things should be, of course.

What I meant is that a text search or fetching the length of a string *MUST* behave the same way whichever encoding you use, without having to know the actual encoding of the string at any time. Currently, strlen() on a UTF-8 string behaves more like a C "sizeof(str) - 1" (it counts bytes, not characters) as soon as you use characters outside the ASCII range.

The idea is really to make these statements work, whatever encoding you are using:

"Lorem ipsum dolor sit amet"->length();
"Lorem ipsum dolor sit amet"->search('lorem');
"Lorem ipsum dolor sit amet"->replace('lorem', 'Lorem');

Grégory Planchat
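
To illustrate the strlen() behaviour described above, here is a minimal sketch using today's functions. It assumes the mbstring extension is loaded; the sample string and variable names are only for illustration, and the "..."->length() syntax in the message remains a hypothetical API. The point is that the caller currently has to know the encoding to get a character count:

```php
<?php
// Assumption: ext/mbstring is available. The string is written with
// explicit byte escapes so the example does not depend on how this
// file itself is saved.
$utf8 = "Gr\xC3\xA9gory"; // "Grégory" encoded as UTF-8 ('é' takes 2 bytes)

// strlen() counts bytes, so the result changes with the encoding.
$utf8Bytes = strlen($utf8);              // 8

// mb_strlen() counts characters, but only when given the right encoding.
$utf8Chars = mb_strlen($utf8, 'UTF-8');  // 7

// The same text in ISO-8859-1 uses one byte per character.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$latin1Bytes = strlen($latin1);                   // 7
$latin1Chars = mb_strlen($latin1, 'ISO-8859-1');  // 7

printf("UTF-8:      %d bytes, %d characters\n", $utf8Bytes, $utf8Chars);
printf("ISO-8859-1: %d bytes, %d characters\n", $latin1Bytes, $latin1Chars);
```

Both strings hold the same seven characters, yet strlen() reports 8 for one and 7 for the other. That difference is what an encoding-aware length() would hide from the developer.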