Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:84433
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
To: internals@lists.php.net,Lester Caine <lester@lsces.co.uk>, internals@lists.php.net
Message-ID: <54FC5465.10208@luni.fr>
Date: Sun, 08 Mar 2015 14:53:41 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
References: <CAGa2bXYa5Lz0JqySVSQ+hGfCW-WxxKWvqFtWaZo=OVf+LLsV8A@mail.gmail.com> <5D8591E2-5AE6-4B4C-AAE0-3D15523410AC@gmail.com> <CAGa2bXa+RyQLuYe72f2m+j5c+YObfPtaOTeknMRU7p3s+96Seg@mail.gmail.com> <CEF2AD42-3CE1-489F-8192-D1DC3D8D8698@gmail.com> <CAGa2bXZDnuE3mKQYD+Sq2=kH=YDXuRQA3o+9wg4v81ZS+h3rLw@mail.gmail.com> <54F83C4D.1020206@gmail.com> <CAEZPtU6ni038E+b0ziAC1b0w=t3gsmBwMjAui4e0sXd8EgbyXQ@mail.gmail.com> <CAL0xaBF7u2h9A5UnVB+-z6SwDtLOVY_qL7B9UGj7w_Lecwct6A@mail.gmail.com> <CAGa2bXa5zER03VrMrtD9aUQ38LK9C_UWU-jbGjzEZUoxbUsSQQ@mail.gmail.com> <CAL0xaBFJtxd3gf9H3ToD0-6mugOBFWR50wB_MRnQ0UZsPWF0Fw@mail.gmail.com> <CAGa2bXaO=Spn5f6qTY8ZrPE8eJ-qwPMS-+2-FHKAVqAnSKsp+Q@mail.gmail.com> <CAEZPtU41SqAf3gV=BY8+g3UNO=k=SyuCER2ch4SNBZ0P4bTbuQ@mail.gmail.com> <CAGa2bXaotozdH6mHcXVDPrYDPjw4dxfrFxKkVWrqwPdQBD4tmA@mail.gmail.com> <54FB3175.3000308@luni.fr> <CAGa2bXZ5ez1Lu2_HRwm_PUQRrTYzO_0bsk3oGbQQ++b_wLzbww@mail.gmail.com> <54FC1E67.3070504@luni.fr> <54FC2FC1.9070008@lsces.co.uk>
In-Reply-To: <54FC2FC1.9070008@lsces.co.uk>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [PHP-DEV] Consistent function names
From: gregory@luni.fr (=?UTF-8?B?R3LDqWdvcnkgUGxhbmNoYXQ=?=)

On 08/03/2015 12:17, Lester Caine wrote:
> On 08/03/15 10:03, Grégory Planchat wrote:
>> Then using multiple encodings in the same script, or using the same
>> script for multiple encodings, becomes straightforward and standard.
>> Most PHP developers don't even know what Unicode or a character
>> encoding is; they just see "odd characters that are removed with a
>> header() call or utf8_decode()". No teasing intended, they just don't
>> want to have to handle this. PHP should not leave this sort of
>> consideration solely to the awareness of user-space developers.
>
> Not part of THIS discussion exactly, but I have to take that in
> isolation. 'Most PHP developers' need to be very aware of Unicode these
> days. Simply pretending it does not exist is a dangerous exercise and
> my own code base has been UTF8 for several years now. Even though I
> don't speak anything but English, a large section of the material one
> has to handle has characters which get lost if one does not maintain
> UTF8 throughout the process. People are going on about 'data loss' when
> converting, and that applies equally to strings as numbers.
>
> The default encoding these days is UTF8 ...
>

That is not exactly what I meant, though of course your point describes
the way things should be.

What I meant is that a text search, or fetching the length of a string,
*MUST* behave the same way whichever encoding you use, without your
having to know the actual encoding of the string at any time.

Currently, strlen() on a UTF-8 string behaves more like C's
"sizeof(str) - 1", that is, a byte count, as soon as you use characters
outside the ASCII range.
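
To make the difference concrete, here is a minimal sketch, assuming the
mbstring extension is loaded. The sample string is only an illustration;
the "é" is written as explicit bytes so the result does not depend on how
the file itself is saved.

```php
<?php
// Assumption: the mbstring extension is available.
// "Gr\xC3\xA9gory" is "Grégory" encoded as UTF-8 (é = 0xC3 0xA9).
$str = "Gr\xC3\xA9gory";

// strlen() counts bytes, so the two-byte "é" is counted twice.
$bytes = strlen($str);

// mb_strlen() counts characters, but only once it is told the encoding.
$chars = mb_strlen($str, 'UTF-8');

printf("bytes: %d, characters: %d\n", $bytes, $chars);
// prints "bytes: 8, characters: 7"
```

The caller still has to know, and pass along, the encoding, which is
exactly the burden I would like to remove.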

The idea is really to make these statements work, whichever encoding
you are using:

"Lorem ipsum dolor sit amet"->length();
"Lorem ipsum dolor sit amet"->search('lorem');
"Lorem ipsum dolor sit amet"->replace('lorem', 'Lorem');
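
As a rough userland approximation of that behaviour (only a sketch, not
the proposed implementation): the class name "Text" and its methods are
hypothetical, the mbstring extension is assumed, and needles are assumed
to be in the same encoding as the wrapped string.

```php
<?php
// Hypothetical wrapper for illustration only; "Text" is not an existing
// or proposed PHP API. Assumes the mbstring extension is available.
final class Text
{
    private $value;
    private $encoding;

    public function __construct($value, $encoding = 'UTF-8')
    {
        $this->value = $value;
        $this->encoding = $encoding;
    }

    // Number of characters, not bytes.
    public function length()
    {
        return mb_strlen($this->value, $this->encoding);
    }

    // Character offset of the first match, or false when not found.
    public function search($needle)
    {
        return mb_strpos($this->value, $needle, 0, $this->encoding);
    }

    // Replace via UTF-8: in well-formed UTF-8 a byte-wise match cannot
    // start or end in the middle of a character.
    public function replace($search, $replacement)
    {
        $utf8 = str_replace(
            mb_convert_encoding($search, 'UTF-8', $this->encoding),
            mb_convert_encoding($replacement, 'UTF-8', $this->encoding),
            mb_convert_encoding($this->value, 'UTF-8', $this->encoding)
        );

        return new self(
            mb_convert_encoding($utf8, $this->encoding, 'UTF-8'),
            $this->encoding
        );
    }

    public function __toString()
    {
        return $this->value;
    }
}

// The same text ("Lorem ipsum déjà vu") in two encodings.
$source = "Lorem ipsum d\xC3\xA9j\xC3\xA0 vu";
$utf8   = new Text($source, 'UTF-8');
$latin  = new Text(
    mb_convert_encoding($source, 'ISO-8859-1', 'UTF-8'),
    'ISO-8859-1'
);

// Both objects report the same length and the same match offset.
printf("%d %d\n", $utf8->length(), $latin->length());
printf("%d %d\n", $utf8->search('vu'), $latin->search('vu'));
```

The encoding is given once, when the string is created, and the calling
code never has to mention it again.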

Grégory Planchat