Newsgroups: php.internals
From: me@jhdxr.com (CHU Zhaowei)
To: 'Nikita Popov'
Cc: 'Ben Ramsey', 'PHP internals'
Date: Sat, 29 Jun 2019 23:41:48 +0800
Subject: RE: [PHP-DEV] [RFC] Desire to move RFC add_str_begin_and_end_functions to a vote

Agreed. I'm wondering why the author chose to use begin(s)/end(s)
while almost all other popular languages have clearer naming, e.g.
starts_with or has_prefix.

In addition, as someone else pointed out two years ago, userland may
already have functions with the same names, and this should be considered
a potential BC break, which is not yet reflected in the RFC.

Regards,
CHU Zhaowei

> -----Original Message-----
> From: Nikita Popov
> Sent: Saturday, June 29, 2019 6:07 AM
> To: will@wkhudgins.info
> Cc: Ben Ramsey; PHP internals
> Subject: Re: [PHP-DEV] [RFC] Desire to move RFC
> add_str_begin_and_end_functions to a vote
>
> On Fri, Jun 28, 2019 at 10:54 PM wrote:
>
> > These are good points. Originally my RFC called for fewer functions but
> > based on feedback I added the others. My proposal: take the RFC as-is
> > to a vote. If it fails, I will raise another RFC for a vote that will
> > just contain the two basic functions: str_begins and str_ends.
> >
>
> To put my comments into more actionable form, here is what I would
> recommend for this RFC:
>
> * Rename str_begins -> str_starts_with, str_ends -> str_ends_with,
> str_ibegins -> str_starts_with_ci, str_iends -> str_ends_with_ci. As
> mentioned before, this is standard terminology used by many, many
> programming languages and it would be great if PHP did not deviate from
> convention without strong reason.
> * Have a separate vote (in the same RFC) for the addition of the
> corresponding mb_* variants.
>
> I believe doing those two changes will ensure that the core part of the
> RFC passes. I personally would be voting yes on the first part and no on
> the second, but others may decide as they see fit.
>
> Nikita
>
>
> > On 2019-06-22 15:56, Ben Ramsey wrote:
> > >> On Jun 22, 2019, at 10:32, Nikita Popov wrote:
> > >>
> > >>> On Thu, Jun 20, 2019 at 12:32 AM wrote:
> > >>>
> > >>> I sent this earlier this week without [RFC] in the subject
> > >>> line...since some people might have filters to check the subject
> > >>> line I wanted to send this again with the proper substring in the
> > >>> subject line–to make it clear I intend to take this to a vote in
> > >>> two weeks. Apologies for the duplicate email.
> > >>>
> > >>> -Will
> > >>>
> > >>>> On 2019-06-18 14:45, will@wkhudgins.info wrote:
> > >>>> Hello all,
> > >>>>
> > >>>> I submitted this RFC several years ago. I collected a lot of
> > >>>> feedback and I have updated the RFC and corresponding GitHub
> > >>>> patch. Please see the RFC at
> > >>>> https://wiki.php.net/rfc/add_str_begin_and_end_functions
> > >>>> and the GitHub patch at https://github.com/php/php-src/pull/2049.
> > >>>> I have addressed many concerns (order of arguments, name of
> > >>>> functions, multibyte support, etc). I plan to move this RFC to a
> > >>>> vote in the coming weeks.
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Will
> > >>>
> > >>
> > >> Unfortunately, this looks like a case where the RFC feedback has
> > >> made the proposal worse, rather than better :(
> > >>
> > >> I think it's easier to start with what I think this proposal should
> > >> be:
> > >> There should be just two functions, str_starts_with() and
> > >> str_ends_with()
> > >> -- and that's it.
> > >>
> > >> The important realization to have here is that these functions are
> > >> a bit of sugar for an operation that is quite common, but can also
> > >> be easily implemented with existing functions (using strcmp, strpos
> > >> or substr, depending on what you like). There is no need for us to
> > >> cover every conceivable combination, just make the common case more
> > >> convenient and easier to read.
> > >>
> > >> With that in mind:
> > >> * I believe the "starts with" and "ends with" naming is a lot more
> > >> canonical, used by Python, Ruby, Java, JavaScript and probably lots
> > >> more.
> > >> * In my experience case-insensitive "i" variants of string
> > >> functions are used much less, by an order of magnitude. With this
> > >> being sugar in the first place, I don't think there's a need to
> > >> cover case-insensitive variations (and from a quick look, these
> > >> don't seem to be first class methods in other languages either). If
> > >> we do want to have them, I'd suggest making the names
> > >> str_starts_with_ci() and str_ends_with_ci(), which is more obvious
> > >> and harder to miss than str_istarts_with() etc.
> > >> * Having mb_* variants of these functions doesn't really make
> > >> sense. I realize that there's this knee-jerk reaction about how if
> > >> it doesn't have "mb" in the name it's not Unicode compatible, but
> > >> in this case it's even more wrong than usual. The normal
> > >> str_starts_with() function is perfectly safe to use on UTF-8
> > >> strings, the only difference between it and
> > >> mb_str_starts_with() is that it's going to be implemented a lot
> > >> more efficiently. The only case that *might* make some sense is the
> > >> case-insensitive variant here, because that has some genuine
> > >> reliance on the character encoding. But then again, this can be
> > >> handled by case-folding the strings first, something that mbstring
> > >> is going to do internally anyway.
> > >>
> > >> I would happily accept a proposal for str_starts_with() +
> > >> str_ends_with(), but I'm a lot more apprehensive about adding these
> > >> 8 new functions.
> > >>
> > >> Regards,
> > >> Nikita
> > >
> > >
> > > I like the idea of simplifying this to the two functions
> > > str_starts_with() and str_ends_with().
> > >
> > > When I was looking through this the other day, I had trouble coming
> > > up with an example of a string where the mb_* versions would ever
> > > generate a different result from the non-multibyte versions, since
> > > the implementation only needs to count and analyze bytes for
> > > uniqueness. Perhaps it would only be an issue with the
> > > case-insensitive versions, as Nikita points out? If so, can someone
> > > provide some example strings where an mb_starts_with_ci() would
> > > return true, while str_starts_with_ci() would return false?
> > >
> > > I think the case-insensitive versions would be common enough in use
> > > cases (e.g. looking to see if a path ends with .CSV vs. .csv, etc.),
> > > but maybe the signatures could be revised to pass a third parameter?
> > >
> > > str_starts_with($haystack, $needle, $case_sensitive = true): bool
> > >
> > > -Ben
> >
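
For readers following the thread, here is a minimal userland sketch of
the byte-wise approach discussed above, using existing string functions
and the optional third parameter Ben floats. The names
sketch_starts_with() and sketch_ends_with() are hypothetical (chosen so
they cannot collide with userland or core functions), and treating an
empty needle as always matching is an assumption, not something the RFC
text quoted here specifies.

```php
<?php
// Hypothetical helpers for illustration only; not the RFC's implementation.

function sketch_starts_with(string $haystack, string $needle, bool $case_sensitive = true): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true; // assumption: the empty string is a prefix of everything
    }
    // Compare only the first $len bytes of both strings.
    return $case_sensitive
        ? strncmp($haystack, $needle, $len) === 0
        : strncasecmp($haystack, $needle, $len) === 0;
}

function sketch_ends_with(string $haystack, string $needle, bool $case_sensitive = true): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true; // assumption: the empty string is a suffix of everything
    }
    if ($len > strlen($haystack)) {
        return false; // a needle longer than the haystack cannot match
    }
    // Take the last $len bytes of the haystack and compare them.
    $tail = substr($haystack, -$len);
    return $case_sensitive
        ? $tail === $needle
        : strcasecmp($tail, $needle) === 0;
}

var_dump(sketch_starts_with('report.csv', 'report'));     // bool(true)
var_dump(sketch_ends_with('data.CSV', '.csv'));           // bool(false)
var_dump(sketch_ends_with('data.CSV', '.csv', false));    // bool(true)
// Byte-wise comparison also works on UTF-8 input ("\u{e9}" is "é").
var_dump(sketch_starts_with("h\u{e9}llo", "h\u{e9}"));    // bool(true)
```

The case-sensitive paths compare raw bytes, which is why they behave the
same on UTF-8 strings as a multibyte-aware version would. The
case-insensitive paths rely on strncasecmp()/strcasecmp(), which should
only be expected to fold ASCII letters; non-ASCII input would presumably
need to be case-folded first (for example via mbstring), as Nikita
suggests.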