Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:106110
Date: Sat, 29 Jun 2019 11:53:23 -0400
To: CHU Zhaowei <me@jhdxr.com>
Cc: 'Nikita Popov' <nikita.ppv@gmail.com>, 'Ben Ramsey' <ben@benramsey.com>,
 'PHP internals' <internals@lists.php.net>
In-Reply-To: <000a01d52e91$2a4b6f20$7ee24d60$@jhdxr.com>
References: <8442f1fa5544b2ca03e7cebbc64e8e5c@wkhudgins.info>
 <EECABEB0-D1E5-4438-A061-477C9BEDC024@simon.geek.nz>
 <eab667bf34e12ee2ad8b6bedabf66cc2@wkhudgins.info>
 <B6702192-0F94-4676-9B44-0DADD0E65060@simon.geek.nz>
 <683c5da474e13283030cac3d0c0ec080@wkhudgins.info>
 <2c37999d1e5372ae6ab48bfce5420796@wkhudgins.info>
 <CAF+90c-YBYSQGkmwS0qa68fpAcQxef_7SGa6LRkavamiwLtpRA@mail.gmail.com>
 <2CF672F8-12F5-4D37-8B8C-591A6E695220@benramsey.com>
 <78034520cdb610d923e25d47ed718938@wkhudgins.info>
 <CAF+90c_RkYUfvMtkwYhB2wy1uKfakAJxk4hjM2yLhNxt56L3UQ@mail.gmail.com>
 <000a01d52e91$2a4b6f20$7ee24d60$@jhdxr.com>
Message-ID: <93528f103c149b7c48e8f35914049d06@wkhudgins.info>
Subject: Re: [PHP-DEV] [RFC] Desire to move RFC
 add_str_begin_and_end_functions to a vote
From: will@wkhudgins.info

Nikita: I like the idea of splitting the mb_* versions from the main 
vote. I'll have to see how to do that in the DokuWiki GUI, but I like 
the idea!

CHU: I will add a note that some userland functions may not be 
compatible with this change, although I don't think that should be a 
showstopper; voters can decide as they see fit.
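
To make the concern concrete, here is a rough sketch of the kind of 
userland helper that could collide with the new functions. It is an 
assumption about what such code tends to look like, not something taken 
from the RFC or the patch: the str_starts_with/str_ends_with names follow 
Nikita's suggestion rather than the current RFC text, and the bodies are 
just one way to build the check from strncmp() and substr_compare().

```php
<?php
// Hypothetical userland helpers, for illustration only; not from the RFC.
// The function_exists() guards let core (or another library) win if it
// already defines a function with the same name.
if (!function_exists('str_starts_with')) {
    function str_starts_with(string $haystack, string $needle): bool
    {
        // Compare only the first strlen($needle) bytes of both strings.
        return strncmp($haystack, $needle, strlen($needle)) === 0;
    }
}

if (!function_exists('str_ends_with')) {
    function str_ends_with(string $haystack, string $needle): bool
    {
        $length = strlen($needle);
        if ($length === 0) {
            return true; // treat the empty string as a suffix of anything
        }
        if ($length > strlen($haystack)) {
            return false; // the needle cannot fit
        }
        // A negative offset counts back from the end of $haystack.
        return substr_compare($haystack, $needle, -$length) === 0;
    }
}
```

Declared in the global namespace without the guards, the same functions 
would fail with a "Cannot redeclare" fatal error once core defines them, 
which is the BC break in question.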

How do people tend to feel about the "str_startswith" vs 
"str_starts_with" naming convention? I've seen people propose both.

Thanks,

Will

On 2019-06-29 11:41, CHU Zhaowei wrote:
> Agreed. I'm wondering why the author chose to use begin(s)/end(s)
> when almost all other popular languages have clearer naming, e.g.
> starts_with or has_prefix.
> 
> In addition, like someone else pointed out two years ago, userland may
> already have functions with the same name, and this should be
> considered as a potential BC break, which is not reflected in the RFC
> yet.
> 
> Regards,
> CHU Zhaowei
> 
>> -----Original Message-----
>> From: Nikita Popov <nikita.ppv@gmail.com>
>> Sent: Saturday, June 29, 2019 6:07 AM
>> To: will@wkhudgins.info
>> Cc: Ben Ramsey <ben@benramsey.com>; PHP internals 
>> <internals@lists.php.net>
>> Subject: Re: [PHP-DEV] [RFC] Desire to move RFC
>> add_str_begin_and_end_functions to a vote
>> 
>> On Fri, Jun 28, 2019 at 10:54 PM <will@wkhudgins.info> wrote:
>> 
>> > These are good points. Originally my RFC called for fewer functions but
>> > based on feedback I added the others. My proposal: take the RFC as-is
>> > to a vote. If it fails, I will raise another RFC for a vote that will
>> > just contain the two basic functions: str_begins and str_ends.
>> >
>> 
>> To put my comments into more actionable form, here is what I would
>> recommend for this RFC:
>> 
>>  * Rename str_begins -> str_starts_with, str_ends -> str_ends_with, 
>> str_ibegins ->
>> str_starts_with_ci, str_iends -> str_ends_with_ci. As mentioned 
>> before, this is
>> standard terminology used by many, many programming languages and it 
>> would
>> be great if PHP did not deviate from convention without strong reason.
>>  * Have a separate vote (in the same RFC) for the addition of the 
>> corresponding
>> mb_* variants.
>> 
>> I believe doing those two changes will ensure that the core part of 
>> the RFC
>> passes. I personally would be voting yes on the first part and no on 
>> the second,
>> but others may decide as they see fit.
>> 
>> Nikita
>> 
>> 
>> > On 2019-06-22 15:56, Ben Ramsey wrote:
>> > >> On Jun 22, 2019, at 10:32, Nikita Popov <nikita.ppv@gmail.com> wrote:
>> > >>
>> > >>> On Thu, Jun 20, 2019 at 12:32 AM <will@wkhudgins.info> wrote:
>> > >>>
>> > >>> I sent this earlier this week without [RFC] in the subject
>> > >>> line...since some people might have filters to check the subject
>> > >>> line I wanted to send this again with the proper substring in the
>> > >>> subject line–to make it clear I intend to take this to a vote in
>> > >>> two weeks. Apologies for the duplicate email.
>> > >>>
>> > >>> -Will
>> > >>>
>> > >>>> On 2019-06-18 14:45, will@wkhudgins.info wrote:
>> > >>>> Hello all,
>> > >>>>
>> > >>>> I submitted this RFC several years ago. I collected a lot of
>> > >>>> feedback and I have updated the RFC and corresponding github
>> > >>>> patch. Please see the RFC at
>> > >>>> https://wiki.php.net/rfc/add_str_begin_and_end_functions
>> > >>>> and the github patch at https://github.com/php/php-src/pull/2049.
>> > >>>> I have addressed many concerns (order of arguments, name of
>> > >>>> functions, multibyte support, etc). I plan to move this RFC to a
>> > >>>> vote in the coming weeks.
>> > >>>>
>> > >>>> Thanks,
>> > >>>>
>> > >>>> Will
>> > >>>
>> > >>
>> > >> Unfortunately, this looks like a case where the RFC feedback has
>> > >> made the proposal worse, rather than better :(
>> > >>
>> > >> I think it's easier to start with what I think this proposal should
>> > >> be:
>> > >> There should be just two functions, str_starts_with() and
>> > >> str_ends_with()
>> > >> -- and that's it.
>> > >>
>> > >> The important realization to have here is that these functions are
>> > >> a bit of sugar for an operation that is quite common, but can also
>> > >> be easily implemented with existing functions (using strcmp, strpos
>> > >> or substr, depending on what you like). There is no need for us to
>> > >> cover every conceivable combination, just make the common case more
>> > >> convenient and easier to read.
>> > >>
>> > >> With that in mind:
>> > >> * I believe the "starts with" and "ends with" naming is a lot more
>> > >> canonical, used by Python, Ruby, Java, JavaScript and probably lots
>> > >> more.
>> > >> * In my experience case-insensitive "i" variants of string
>> > >> functions are used much less, by an order of magnitude. With this
>> > >> being sugar in the first place, I don't think there's a need to
>> > >> cover case-insensitive variations (and from a quick look, these
>> > >> don't seem to be first class methods in other languages either). If
>> > >> we do want to have them, I'd suggest making the names
>> > >> str_starts_with_ci() and str_ends_with_ci(), which is more obvious
>> > >> and harder to miss than str_istarts_with() etc.
>> > >> * Having mb_* variants of these functions doesn't really make
>> > >> sense. I realize that there's this knee-jerk reaction about how if
>> > >> it doesn't have "mb" in the name it's not Unicode compatible, but
>> > >> in this case it's even more wrong than usual. The normal
>> > >> str_starts_with() function is perfectly safe to use on UTF-8
>> > >> strings, the only difference between it and
>> > >> mb_str_starts_with() is that it's going to be implemented a lot
>> > >> more efficiently. The only case that *might* make some sense is the
>> > >> case-insensitive variant here, because that has some genuine
>> > >> reliance on the character encoding. But then again, this can be
>> > >> handled by case-folding the strings first, something that mbstring
>> > >> is going to do internally anyway.
>> > >>
>> > >> I would happily accept a proposal for str_starts_with() +
>> > >> str_ends_with(), but I'm a lot more apprehensive about adding these
>> > >> 8 new functions.
>> > >>
>> > >> Regards,
>> > >> Nikita
>> > >
>> > >
>> > > I like the idea of simplifying this to the two functions
>> > > str_starts_with() and str_ends_with().
>> > >
>> > > When I was looking through this the other day, I had trouble coming
>> > > up with an example of a string where the mb_* versions would ever
>> > > generate a different result from the non-multibyte versions, since
>> > > the implementation only needs to count and analyze bytes for uniqueness.
>> > > Perhaps it would only be an issue with the case-insensitive
>> > > versions, as Nikita points out? If so, can someone provide some
>> > > example strings where an mb_starts_with_ci() would return true,
>> > > while
>> > > str_starts_with_ci() would return false?
>> > >
>> > > I think the case sensitivity versions would be common enough in use
>> > > cases (i.e. looking to see if a path ends with .CSV vs. .csv, etc.),
>> > > but maybe the signatures could be revised to pass a third parameter?
>> > >
>> > > str_starts_with($haystack, $needle, $case_sensitive = true): bool
>> > >
>> > > -Ben
>> >