Newsgroups: php.internals
Xref: news.php.net php.internals:106095
Date: Fri, 28 Jun 2019 16:54:49 -0400
To: Ben Ramsey
Cc: Nikita Popov, PHP internals
In-Reply-To: <2CF672F8-12F5-4D37-8B8C-591A6E695220@benramsey.com>
Message-ID: <78034520cdb610d923e25d47ed718938@wkhudgins.info>
Subject: Re: [PHP-DEV] [RFC] Desire to move RFC add_str_begin_and_end_functions to a vote
From: will@wkhudgins.info

These are good points. Originally my RFC called for fewer functions, but based on feedback I added the others. My proposal: take the RFC as-is to a vote. If it fails, I will raise another RFC for a vote that will contain just the two basic functions: str_begins and str_ends.

Thanks,

Will

On 2019-06-22 15:56, Ben Ramsey wrote:
>> On Jun 22, 2019, at 10:32, Nikita Popov wrote:
>>
>>> On Thu, Jun 20, 2019 at 12:32 AM wrote:
>>>
>>> I sent this earlier this week without [RFC] in the subject line. Since some people might have filters that check the subject line, I wanted to send this again with the proper substring in the subject line, to make it clear I intend to take this to a vote in two weeks. Apologies for the duplicate email.
>>>
>>> -Will
>>>
>>>> On 2019-06-18 14:45, will@wkhudgins.info wrote:
>>>> Hello all,
>>>>
>>>> I submitted this RFC several years ago. I collected a lot of feedback and I have updated the RFC and corresponding GitHub patch. Please see the RFC at https://wiki.php.net/rfc/add_str_begin_and_end_functions and the GitHub patch at https://github.com/php/php-src/pull/2049. I have addressed many concerns (order of arguments, names of functions, multibyte support, etc.). I plan to move this RFC to a vote in the coming weeks.
>>>>
>>>> Thanks,
>>>>
>>>> Will
>>>
>>
>> Unfortunately, this looks like a case where the RFC feedback has made the proposal worse, rather than better :(
>>
>> I think it's easier to start with what I think this proposal should be: there should be just two functions, str_starts_with() and str_ends_with(), and that's it.
>>
>> The important realization to have here is that these functions are a bit of sugar for an operation that is quite common, but can also be easily implemented with existing functions (using strcmp, strpos or substr, depending on what you like). There is no need for us to cover every conceivable combination, just make the common case more convenient and easier to read.
>>
>> With that in mind:
>> * I believe the "starts with" and "ends with" naming is a lot more canonical, used by Python, Ruby, Java, JavaScript and probably lots more.
>> * In my experience, case-insensitive "i" variants of string functions are used much less, by an order of magnitude. With this being sugar in the first place, I don't think there's a need to cover case-insensitive variations (and from a quick look, these don't seem to be first-class methods in other languages either). If we do want to have them, I'd suggest naming them str_starts_with_ci() and str_ends_with_ci(), which is more obvious and harder to miss than str_istarts_with() etc.
>> * Having mb_* variants of these functions doesn't really make sense. I realize that there's this knee-jerk reaction about how if it doesn't have "mb" in the name it's not Unicode compatible, but in this case it's even more wrong than usual. The normal str_starts_with() function is perfectly safe to use on UTF-8 strings; the only difference between it and mb_str_starts_with() is that it's going to be implemented a lot more efficiently. The only case that *might* make some sense is the case-insensitive variant here, because that has some genuine reliance on the character encoding. But then again, this can be handled by case-folding the strings first, something that mbstring is going to do internally anyway.
>>
>> I would happily accept a proposal for str_starts_with() + str_ends_with(), but I'm a lot more apprehensive about adding these 8 new functions.
>>
>> Regards,
>> Nikita
>
>
> I like the idea of simplifying this to the two functions str_starts_with() and str_ends_with().
>
> When I was looking through this the other day, I had trouble coming up with an example of a string where the mb_* versions would ever generate a different result from the non-multibyte versions, since the implementation only needs to count and compare bytes. Perhaps it would only be an issue with the case-insensitive versions, as Nikita points out? If so, can someone provide some example strings where an mb_starts_with_ci() would return true, while str_starts_with_ci() would return false?
>
> I think the case-insensitive versions would be common enough in use cases (e.g. checking whether a path ends with .CSV vs. .csv), but maybe the signatures could be revised to take a third parameter?
>
> str_starts_with($haystack, $needle, $case_sensitive = true): bool
>
> -Ben
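For illustration only, here is a minimal userland sketch of the two functions discussed in this thread, including the optional third parameter Ben suggests. It assumes byte-wise comparison (strncmp() and substr_compare()) and ASCII-only case folding on the case-insensitive path. The names starts_with_sketch() and ends_with_sketch() are hypothetical: they are not the RFC's patch, and they deliberately avoid clashing with the str_starts_with()/str_ends_with() built-ins that exist in PHP 8.0 and later. Unicode-aware case-insensitive matching, as Nikita notes, would instead require case-folding both strings first, e.g. with mbstring's mb_convert_case() and MB_CASE_FOLD.

```php
<?php
// Hypothetical sketches; not the RFC implementation and not PHP's built-ins.

/**
 * True if $haystack begins with $needle, comparing raw bytes.
 * Assumption: when $case_sensitive is false, only ASCII letters are folded.
 */
function starts_with_sketch(string $haystack, string $needle, bool $case_sensitive = true): bool
{
    $len = strlen($needle);
    // Compare at most $len bytes. A haystack shorter than the needle never
    // compares equal, and an empty needle ($len === 0) always matches.
    return $case_sensitive
        ? strncmp($haystack, $needle, $len) === 0
        : strncasecmp($haystack, $needle, $len) === 0;
}

/**
 * True if $haystack ends with $needle, comparing raw bytes.
 * Assumption: when $case_sensitive is false, only ASCII letters are folded.
 */
function ends_with_sketch(string $haystack, string $needle, bool $case_sensitive = true): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true;  // every string ends with the empty string
    }
    if ($len > strlen($haystack)) {
        return false; // a needle longer than the haystack cannot match
    }
    // Compare the last $len bytes of $haystack with $needle;
    // the fifth argument toggles case-insensitive comparison.
    return substr_compare($haystack, $needle, -$len, $len, !$case_sensitive) === 0;
}

// Byte-wise matching already works on valid UTF-8 input: a valid UTF-8
// needle can only line up with whole characters of a valid UTF-8 haystack.
var_dump(starts_with_sketch('über', 'ü'));               // bool(true)
var_dump(ends_with_sketch('naïve', 'ïve'));              // bool(true)
var_dump(ends_with_sketch('report.CSV', '.csv'));        // bool(false)
var_dump(ends_with_sketch('report.CSV', '.csv', false)); // bool(true)
```

The boolean flag keeps the function count at two, as Ben proposes; the trade-off is that a bare `false` argument at the call site is less self-describing than a separate str_ends_with_ci() name, which is the readability concern Nikita raises.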