Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:106095
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8;
 format=flowed
Content-Transfer-Encoding: 8bit
Date: Fri, 28 Jun 2019 16:54:49 -0400
To: Ben Ramsey <ben@benramsey.com>
Cc: Nikita Popov <nikita.ppv@gmail.com>, PHP internals
 <internals@lists.php.net>
In-Reply-To: <2CF672F8-12F5-4D37-8B8C-591A6E695220@benramsey.com>
References: <8442f1fa5544b2ca03e7cebbc64e8e5c@wkhudgins.info>
 <EECABEB0-D1E5-4438-A061-477C9BEDC024@simon.geek.nz>
 <eab667bf34e12ee2ad8b6bedabf66cc2@wkhudgins.info>
 <B6702192-0F94-4676-9B44-0DADD0E65060@simon.geek.nz>
 <683c5da474e13283030cac3d0c0ec080@wkhudgins.info>
 <2c37999d1e5372ae6ab48bfce5420796@wkhudgins.info>
 <CAF+90c-YBYSQGkmwS0qa68fpAcQxef_7SGa6LRkavamiwLtpRA@mail.gmail.com>
 <2CF672F8-12F5-4D37-8B8C-591A6E695220@benramsey.com>
Message-ID: <78034520cdb610d923e25d47ed718938@wkhudgins.info>
User-Agent: Roundcube Webmail/1.3.3
Subject: Re: [PHP-DEV] [RFC] Desire to move RFC
 add_str_begin_and_end_functions to a vote
From: will@wkhudgins.info

These are good points. Originally my RFC called for less functions but 
based on feedback I added the others. My proposal: take the RFC as-is to 
a vote. If it fails, I will raise another RFC for a vote that will just 
contain the two basic functions: str_begins and str_ends.

Thanks,

Will

On 2019-06-22 15:56, Ben Ramsey wrote:
>> On Jun 22, 2019, at 10:32, Nikita Popov <nikita.ppv@gmail.com> wrote:
>> 
>>> On Thu, Jun 20, 2019 at 12:32 AM <will@wkhudgins.info> wrote:
>>> 
>>> I sent this earlier this week without [RFC] in the subject 
>>> line...since
>>> some people might have filters to check the subject line I wanted to
>>> send this again with the proper substring in the subject line–to make 
>>> it
>>> clear I intend to take this to a vote in two weeks. Apologies for the
>>> duplicate email.
>>> 
>>> -Will
>>> 
>>>> On 2019-06-18 14:45, will@wkhudgins.info wrote:
>>>> Hello all,
>>>> 
>>>> I submitted this RFC several years ago. I collected a lot of 
>>>> feedback
>>>> and I have updated the RFC and corresponding github patch. Please 
>>>> see
>>>> the RFC at https://wiki.php.net/rfc/add_str_begin_and_end_functions
>>>> and the github patch at https://github.com/php/php-src/pull/2049. I
>>>> have addressed many concerns
>>>> (order of arguments, name of functions, multibye support, etc). I 
>>>> plan
>>>> to move this RFC to a vote in the coming weeks.
>>>> 
>>>> Thanks,
>>>> 
>>>> Will
>>> 
>> 
>> Unfortunately, this looks like a case where the RFC feedback has made 
>> the
>> proposal worse, rather than better :(
>> 
>> I think it's easier to start with what I think this proposal should 
>> be:
>> There should be just two functions, str_starts_with() and 
>> str_ends_with()
>> -- and that's it.
>> 
>> The important realization to have here is that these functions are a 
>> bit of
>> sugar for an operation that is quite common, but can also be easily
>> implemented with existing functions (using strcmp, strpos or substr,
>> depending on what you like). There is no need for us to cover every
>> conceivable combination, just make the common case more convenient and
>> easier to read.
>> 
>> With that in mind:
>> * I believe the "starts with" and "ends with" naming is a lot more
>> canonical, used by Python, Ruby, Java, JavaScript and probably lots 
>> more.
>> * In my experience case-insensitive "i" variants of strings functions 
>> are
>> used much less, by an order of magnitude. With this being sugar in the
>> first place, I don't think there's a need to cover case-insensitive
>> variations (and from a quick look, these don't seem to be first class
>> methods in other languages either). If we do want to have them, I'd 
>> suggest
>> making the names str_starts_with_ci() and str_ends_with_ci(), which is 
>> more
>> obvious and harder to miss than str_istarts_with() etc.
>> * Having mb_* variants of these functions doesn't really make sense. I
>> realize that there's this knee-jerk reaction about how if it doesn't 
>> have
>> "mb" in the name it's not Unicode compatible, but in this case it's 
>> even
>> more wrong than usual. The normal str_starts_with() function is 
>> perfectly
>> safe to use on UTF-8 strings, the only difference between it and
>> mb_str_starts_with() is that it's going to be implemented a lot more
>> efficiently. The only case that *might* make some sense is the
>> case-insensitive variant here, because that has some genuine reliance 
>> on
>> the character encoding. But then again, this can be handled by 
>> case-folding
>> the strings first, something that mbstring is going to do internally 
>> anyway.
>> 
>> I would happily accept a proposal for str_starts_with() + 
>> str_ends_with(),
>> but I'm a lot more apprehensive about adding these 8 new functions.
>> 
>> Regards,
>> Nikita
> 
> 
> I like the idea of simplifying this to the two functions
> str_starts_with() and str_ends_with().
> 
> When I was looking through this the other day, I had trouble coming up
> with an example of a string with the mb_* versions would ever generate
> a different result from the non-multibyte versions, since the
> implementation only needs to count and analyze bytes for uniqueness.
> Perhaps it would only be an issue with the case-insensitive versions,
> as Nikita points out? If so, can someone provide some example strings
> where an mb_starts_with_ci() would return true, while
> str_starts_with_ci() would return false?
> 
> I think the case sensitivity versions would be common enough in use
> cases (i.e. looking to see if a path ends with .CSV vs. .csv, etc.),
> but maybe the signatures could be revised to pass a third parameter?
> 
> str_starts_with($haystack, $needle, $case_sensitive = true): bool
> 
> -Ben