Newsgroups: php.internals
Date: Sat, 29 Jun 2019 11:53:23 -0400
To: CHU Zhaowei
Cc: 'Nikita Popov', 'Ben Ramsey', 'PHP internals'
In-Reply-To: <000a01d52e91$2a4b6f20$7ee24d60$@jhdxr.com>
Message-ID: <93528f103c149b7c48e8f35914049d06@wkhudgins.info>
Subject: Re: [PHP-DEV] [RFC] Desire to move RFC add_str_begin_and_end_functions to a vote
From: will@wkhudgins.info

Nikita: I like the idea of splitting the mb_* versions from the main
vote. I'll have to see how to do that in the DokuWiki GUI, but I like
the idea!

CHU: I will add a note that some userland functions may not be
compatible with this change, although I don't think that should be a
showstopper; voters can decide as they see fit.

How do people tend to feel about the "str_startswith" vs
"str_starts_with" naming convention? I've seen people propose both.

Thanks,

Will

On 2019-06-29 11:41, CHU Zhaowei wrote:
> Agreed. I'm wondering why the author chose to use begin(s)/end(s)
> while almost all other popular languages have clearer naming, e.g.
> starts_with or has_prefix.
>
> In addition, like someone else pointed out two years ago, userland may
> already have functions with the same name, and this should be
> considered as a potential BC break, which is not reflected in the RFC
> yet.
>
> Regards,
> CHU Zhaowei
>
>> -----Original Message-----
>> From: Nikita Popov
>> Sent: Saturday, June 29, 2019 6:07 AM
>> To: will@wkhudgins.info
>> Cc: Ben Ramsey; PHP internals
>> Subject: Re: [PHP-DEV] [RFC] Desire to move RFC
>> add_str_begin_and_end_functions to a vote
>>
>> On Fri, Jun 28, 2019 at 10:54 PM wrote:
>>
>> > These are good points. Originally my RFC called for fewer functions but
>> > based on feedback I added the others. My proposal: take the RFC as-is
>> > to a vote. If it fails, I will raise another RFC for a vote that will
>> > just contain the two basic functions: str_begins and str_ends.
>> >
>>
>> To put my comments into more actionable form, here is what I would
>> recommend for this RFC:
>>
>> * Rename str_begins -> str_starts_with, str_ends -> str_ends_with,
>>   str_ibegins -> str_starts_with_ci, str_iends -> str_ends_with_ci.
>>   As mentioned before, this is standard terminology used by many, many
>>   programming languages and it would be great if PHP did not deviate
>>   from convention without strong reason.
>> * Have a separate vote (in the same RFC) for the addition of the
>>   corresponding mb_* variants.
>>
>> I believe doing those two changes will ensure that the core part of
>> the RFC passes. I personally would be voting yes on the first part and
>> no on the second, but others may decide as they see fit.
>>
>> Nikita
>>
>>
>> > On 2019-06-22 15:56, Ben Ramsey wrote:
>> > >> On Jun 22, 2019, at 10:32, Nikita Popov wrote:
>> > >>
>> > >>> On Thu, Jun 20, 2019 at 12:32 AM wrote:
>> > >>>
>> > >>> I sent this earlier this week without [RFC] in the subject
>> > >>> line...since some people might have filters to check the subject
>> > >>> line I wanted to send this again with the proper substring in the
>> > >>> subject line, to make it clear I intend to take this to a vote in
>> > >>> two weeks. Apologies for the duplicate email.
>> > >>>
>> > >>> -Will
>> > >>>
>> > >>>> On 2019-06-18 14:45, will@wkhudgins.info wrote:
>> > >>>> Hello all,
>> > >>>>
>> > >>>> I submitted this RFC several years ago. I collected a lot of
>> > >>>> feedback and I have updated the RFC and corresponding github
>> > >>>> patch. Please see the RFC at
>> > >>>> https://wiki.php.net/rfc/add_str_begin_and_end_functions
>> > >>>> and the github patch at https://github.com/php/php-src/pull/2049.
>> > >>>> I have addressed many concerns (order of arguments, name of
>> > >>>> functions, multibyte support, etc). I plan to move this RFC to a
>> > >>>> vote in the coming weeks.
>> > >>>>
>> > >>>> Thanks,
>> > >>>>
>> > >>>> Will
>> > >>>
>> > >>
>> > >> Unfortunately, this looks like a case where the RFC feedback has
>> > >> made the proposal worse, rather than better :(
>> > >>
>> > >> I think it's easier to start with what I think this proposal should
>> > >> be: there should be just two functions, str_starts_with() and
>> > >> str_ends_with() -- and that's it.
>> > >>
>> > >> The important realization to have here is that these functions are
>> > >> a bit of sugar for an operation that is quite common, but can also
>> > >> be easily implemented with existing functions (using strcmp, strpos
>> > >> or substr, depending on what you like). There is no need for us to
>> > >> cover every conceivable combination, just make the common case more
>> > >> convenient and easier to read.
>> > >>
>> > >> With that in mind:
>> > >> * I believe the "starts with" and "ends with" naming is a lot more
>> > >>   canonical, used by Python, Ruby, Java, JavaScript and probably
>> > >>   lots more.
>> > >> * In my experience case-insensitive "i" variants of string
>> > >>   functions are used much less, by an order of magnitude. With this
>> > >>   being sugar in the first place, I don't think there's a need to
>> > >>   cover case-insensitive variations (and from a quick look, these
>> > >>   don't seem to be first class methods in other languages either).
>> > >>   If we do want to have them, I'd suggest making the names
>> > >>   str_starts_with_ci() and str_ends_with_ci(), which is more obvious
>> > >>   and harder to miss than str_istarts_with() etc.
>> > >> * Having mb_* variants of these functions doesn't really make
>> > >>   sense. I realize that there's this knee-jerk reaction about how if
>> > >>   it doesn't have "mb" in the name it's not Unicode compatible, but
>> > >>   in this case it's even more wrong than usual.
>> > >>   The normal str_starts_with() function is perfectly safe to use on
>> > >>   UTF-8 strings, the only difference between it and
>> > >>   mb_str_starts_with() is that it's going to be implemented a lot
>> > >>   more efficiently. The only case that *might* make some sense is
>> > >>   the case-insensitive variant here, because that has some genuine
>> > >>   reliance on the character encoding. But then again, this can be
>> > >>   handled by case-folding the strings first, something that mbstring
>> > >>   is going to do internally anyway.
>> > >>
>> > >> I would happily accept a proposal for str_starts_with() +
>> > >> str_ends_with(), but I'm a lot more apprehensive about adding these
>> > >> 8 new functions.
>> > >>
>> > >> Regards,
>> > >> Nikita
>> > >
>> > >
>> > > I like the idea of simplifying this to the two functions
>> > > str_starts_with() and str_ends_with().
>> > >
>> > > When I was looking through this the other day, I had trouble coming
>> > > up with an example of a string for which the mb_* versions would
>> > > ever generate a different result from the non-multibyte versions,
>> > > since the implementation only needs to count and analyze bytes for
>> > > uniqueness. Perhaps it would only be an issue with the
>> > > case-insensitive versions, as Nikita points out? If so, can someone
>> > > provide some example strings where an mb_starts_with_ci() would
>> > > return true, while str_starts_with_ci() would return false?
>> > >
>> > > I think the case sensitivity versions would be common enough in use
>> > > cases (i.e. looking to see if a path ends with .CSV vs. .csv, etc.),
>> > > but maybe the signatures could be revised to pass a third parameter?
>> > >
>> > > str_starts_with($haystack, $needle, $case_sensitive = true): bool
>> > >
>> > > -Ben
>> >
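
For reference, the points above about implementing this with existing
functions, and about Ben's optional third parameter, can be sketched in
userland roughly as follows. This is only an illustration under stated
assumptions, not the RFC's patch: the names prefix_matches() and
suffix_matches() are hypothetical (chosen so they cannot clash with
anything the RFC adds), substr_compare() is used here in place of the
strcmp/strpos/substr options Nikita mentions, and treating an empty
needle as always matching is a choice made for this sketch.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not from the RFC): byte-wise "starts with" check.
function prefix_matches(string $haystack, string $needle, bool $case_sensitive = true): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true;  // assumption of this sketch: empty needle always matches
    }
    if ($len > strlen($haystack)) {
        return false; // needle cannot fit
    }
    // Compare the first $len bytes; the last argument enables case-insensitivity.
    return substr_compare($haystack, $needle, 0, $len, !$case_sensitive) === 0;
}

// Hypothetical helper (not from the RFC): byte-wise "ends with" check.
function suffix_matches(string $haystack, string $needle, bool $case_sensitive = true): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true;  // assumption of this sketch: empty needle always matches
    }
    if ($len > strlen($haystack)) {
        return false; // needle cannot fit
    }
    // A negative offset counts from the end of the haystack.
    return substr_compare($haystack, $needle, -$len, $len, !$case_sensitive) === 0;
}

// Ben's path example: the extension only matches when case is ignored.
var_dump(suffix_matches('report.CSV', '.csv'));        // bool(false)
var_dump(suffix_matches('report.CSV', '.csv', false)); // bool(true)

// A case-sensitive check on a UTF-8 string works on raw bytes.
var_dump(prefix_matches('über', 'ü'));                 // bool(true)
```

The case-sensitive path never needs to know the encoding, which is the
reason given above for not needing mb_* variants. The case-insensitive
flag of substr_compare(), as far as I know, only folds single-byte
letters, so non-ASCII text would need to be case-folded first, as
Nikita suggests.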