Newsgroups: php.internals
From: nikita.ppv@gmail.com (Nikita Popov)
Date: Sat, 29 Jun 2019 00:07:24 +0200
Subject: Re: [PHP-DEV] [RFC] Desire to move RFC add_str_begin_and_end_functions to a vote
To: will@wkhudgins.info
Cc: Ben Ramsey, PHP internals

On Fri, Jun 28, 2019 at 10:54 PM wrote:

> These are good points. Originally my RFC called for less functions but
> based on feedback I added the others. My proposal: take the RFC as-is to
> a vote. If it fails, I will raise another RFC for a vote that will just
> contain the two basic functions: str_begins and str_ends.

To put my comments into more actionable form, here is what I would
recommend for this RFC:

* Rename str_begins -> str_starts_with, str_ends -> str_ends_with,
  str_ibegins -> str_starts_with_ci, str_iends -> str_ends_with_ci. As
  mentioned before, this is standard terminology used by many, many
  programming languages and it would be great if PHP did not deviate from
  convention without strong reason.
* Have a separate vote (in the same RFC) for the addition of the
  corresponding mb_* variants.

I believe doing those two changes will ensure that the core part of the RFC
passes. I personally would be voting yes on the first part and no on the
second, but others may decide as they see fit.

Nikita

> On 2019-06-22 15:56, Ben Ramsey wrote:
> >> On Jun 22, 2019, at 10:32, Nikita Popov wrote:
> >>
> >>> On Thu, Jun 20, 2019 at 12:32 AM wrote:
> >>>
> >>> I sent this earlier this week without [RFC] in the subject
> >>> line...since some people might have filters to check the subject
> >>> line I wanted to send this again with the proper substring in the
> >>> subject line -- to make it clear I intend to take this to a vote in
> >>> two weeks. Apologies for the duplicate email.
> >>>
> >>> -Will
> >>>
> >>>> On 2019-06-18 14:45, will@wkhudgins.info wrote:
> >>>> Hello all,
> >>>>
> >>>> I submitted this RFC several years ago. I collected a lot of
> >>>> feedback and I have updated the RFC and corresponding GitHub
> >>>> patch. Please see the RFC at
> >>>> https://wiki.php.net/rfc/add_str_begin_and_end_functions
> >>>> and the GitHub patch at https://github.com/php/php-src/pull/2049.
> >>>> I have addressed many concerns (order of arguments, name of
> >>>> functions, multibyte support, etc). I plan to move this RFC to a
> >>>> vote in the coming weeks.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Will
> >>>
> >>
> >> Unfortunately, this looks like a case where the RFC feedback has made
> >> the proposal worse, rather than better :(
> >>
> >> I think it's easier to start with what I think this proposal should
> >> be: There should be just two functions, str_starts_with() and
> >> str_ends_with() -- and that's it.
> >>
> >> The important realization to have here is that these functions are a
> >> bit of sugar for an operation that is quite common, but can also be
> >> easily implemented with existing functions (using strcmp, strpos or
> >> substr, depending on what you like). There is no need for us to cover
> >> every conceivable combination, just make the common case more
> >> convenient and easier to read.
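For reference, the "existing functions" route mentioned in the quoted paragraph might look roughly like the sketch below. The helper names starts_with_sketch() and ends_with_sketch() are hypothetical and are not the functions proposed in the RFC; treating an empty needle as always matching is an assumption as well.

```php
<?php
// Hypothetical helpers for illustration only; not the RFC's implementation.

function starts_with_sketch(string $haystack, string $needle): bool
{
    // strncmp() compares at most strlen($needle) bytes from the start.
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}

function ends_with_sketch(string $haystack, string $needle): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true;  // assumption: an empty needle always matches
    }
    if ($len > strlen($haystack)) {
        return false; // the needle cannot fit into the haystack
    }
    // A negative offset makes substr_compare() look at the tail of $haystack.
    return substr_compare($haystack, $needle, -$len) === 0;
}

var_dump(starts_with_sketch("internals@lists.php.net", "internals"));
var_dump(ends_with_sketch("internals@lists.php.net", ".net"));
```

Both helpers work, but the length bookkeeping is easy to get wrong, which is roughly the readability argument made above.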
> >>
> >> With that in mind:
> >> * I believe the "starts with" and "ends with" naming is a lot more
> >>   canonical, used by Python, Ruby, Java, JavaScript and probably lots
> >>   more.
> >> * In my experience case-insensitive "i" variants of string functions
> >>   are used much less, by an order of magnitude. With this being sugar
> >>   in the first place, I don't think there's a need to cover
> >>   case-insensitive variations (and from a quick look, these don't
> >>   seem to be first class methods in other languages either). If we do
> >>   want to have them, I'd suggest making the names
> >>   str_starts_with_ci() and str_ends_with_ci(), which is more obvious
> >>   and harder to miss than str_istarts_with() etc.
> >> * Having mb_* variants of these functions doesn't really make sense.
> >>   I realize that there's this knee-jerk reaction about how if it
> >>   doesn't have "mb" in the name it's not Unicode compatible, but in
> >>   this case it's even more wrong than usual. The normal
> >>   str_starts_with() function is perfectly safe to use on UTF-8
> >>   strings, the only difference between it and mb_str_starts_with() is
> >>   that it's going to be implemented a lot more efficiently. The only
> >>   case that *might* make some sense is the case-insensitive variant
> >>   here, because that has some genuine reliance on the character
> >>   encoding. But then again, this can be handled by case-folding the
> >>   strings first, something that mbstring is going to do internally
> >>   anyway.
> >>
> >> I would happily accept a proposal for str_starts_with() +
> >> str_ends_with(), but I'm a lot more apprehensive about adding these 8
> >> new functions.
> >>
> >> Regards,
> >> Nikita
> >
> >
> > I like the idea of simplifying this to the two functions
> > str_starts_with() and str_ends_with().
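The claim quoted above, that a byte-wise prefix check is safe on UTF-8 strings, can be spot-checked with a small sketch like the following. It compares a byte-wise check against a code-point-wise check built on preg_split() with the /u modifier. All function names here are hypothetical, the inputs are assumed to be valid UTF-8, and a handful of samples is of course a sanity check rather than a proof.

```php
<?php
// Hypothetical helpers for illustration only; inputs are assumed to be valid UTF-8.

function bytes_start_with(string $haystack, string $needle): bool
{
    // Plain byte comparison, unaware of any character encoding.
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}

function code_points(string $s): array
{
    // The /u modifier makes PCRE split the subject into UTF-8 code points.
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

function code_points_start_with(string $haystack, string $needle): bool
{
    $n = code_points($needle);
    return array_slice(code_points($haystack), 0, count($n)) === $n;
}

$haystack = "caf\u{00e9} au lait";
foreach (["caf\u{00e9}", "cafe", "caf", "\u{00e9}"] as $needle) {
    // Dumps whether the byte-wise and code-point-wise checks agree for this needle.
    var_dump(bytes_start_with($haystack, $needle) === code_points_start_with($haystack, $needle));
}
```

The two checks agree on these samples because a well-formed UTF-8 needle can only match a well-formed UTF-8 haystack on code point boundaries.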
> >
> > When I was looking through this the other day, I had trouble coming
> > up with an example of a string for which the mb_* versions would ever
> > generate a different result from the non-multibyte versions, since the
> > implementation only needs to count and analyze bytes for uniqueness.
> > Perhaps it would only be an issue with the case-insensitive versions,
> > as Nikita points out? If so, can someone provide some example strings
> > where an mb_starts_with_ci() would return true, while
> > str_starts_with_ci() would return false?
> >
> > I think the case-insensitive versions would be common enough in use
> > cases (i.e. looking to see if a path ends with .CSV vs. .csv, etc.),
> > but maybe the signatures could be revised to pass a third parameter?
> >
> > str_starts_with($haystack, $needle, $case_sensitive = true): bool
> >
> > -Ben
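A rough sketch of the third-parameter idea from the quoted message, using only existing functions, could look like this. The names starts_with_opt() and ends_with_opt() are hypothetical, and the case-insensitive branches rely on strncasecmp() and substr_compare(), which compare bytes and should not be expected to fold non-ASCII letters in UTF-8 text.

```php
<?php
// Hypothetical helpers modeled on the suggested signature; not part of the RFC.

function starts_with_opt(string $haystack, string $needle, bool $case_sensitive = true): bool
{
    $len = strlen($needle);
    return $case_sensitive
        ? strncmp($haystack, $needle, $len) === 0
        : strncasecmp($haystack, $needle, $len) === 0;
}

function ends_with_opt(string $haystack, string $needle, bool $case_sensitive = true): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true;  // assumption: an empty needle always matches
    }
    if ($len > strlen($haystack)) {
        return false;
    }
    // The fifth argument of substr_compare() switches on case-insensitive comparison.
    return substr_compare($haystack, $needle, -$len, $len, !$case_sensitive) === 0;
}

// The .CSV vs. .csv case from the message above.
var_dump(ends_with_opt("report.CSV", ".csv"));
var_dump(ends_with_opt("report.CSV", ".csv", false));
```

A boolean flag keeps the function count down, at the cost of call sites such as ends_with_opt($path, ".csv", false) that do not say what the flag means, which is one argument for the explicit _ci suffix discussed earlier in the thread.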