Newsgroups: php.internals
Date: Sat, 22 Jun 2019 14:56:24 -0500
From: ben@benramsey.com (Ben Ramsey)
To: Nikita Popov
Cc: will@wkhudgins.info, PHP internals
Subject: Re: [PHP-DEV] [RFC] Desire to move RFC add_str_begin_and_end_functions to a vote
Message-ID: <2CF672F8-12F5-4D37-8B8C-591A6E695220@benramsey.com>

> On Jun 22, 2019, at 10:32, Nikita Popov wrote:
>
>> On Thu, Jun 20, 2019 at 12:32 AM wrote:
>>
>> I sent this earlier this week without [RFC] in the subject line...since
>> some people might have filters to check the subject line I wanted to
>> send this again with the proper substring in the subject line–to make it
>> clear I intend to take this to a vote in two weeks. Apologies for the
>> duplicate email.
>>
>> -Will
>>
>>> On 2019-06-18 14:45, will@wkhudgins.info wrote:
>>> Hello all,
>>>
>>> I submitted this RFC several years ago. I collected a lot of feedback
>>> and I have updated the RFC and corresponding github patch.
>>> Please see the RFC at
>>> https://wiki.php.net/rfc/add_str_begin_and_end_functions
>>> and the github patch at https://github.com/php/php-src/pull/2049. I
>>> have addressed many concerns
>>> (order of arguments, name of functions, multibyte support, etc). I plan
>>> to move this RFC to a vote in the coming weeks.
>>>
>>> Thanks,
>>>
>>> Will
>>
>
> Unfortunately, this looks like a case where the RFC feedback has made the
> proposal worse, rather than better :(
>
> I think it's easier to start with what I think this proposal should be:
> There should be just two functions, str_starts_with() and str_ends_with()
> -- and that's it.
>
> The important realization to have here is that these functions are a bit of
> sugar for an operation that is quite common, but can also be easily
> implemented with existing functions (using strcmp, strpos or substr,
> depending on what you like). There is no need for us to cover every
> conceivable combination, just make the common case more convenient and
> easier to read.
>
> With that in mind:
> * I believe the "starts with" and "ends with" naming is a lot more
> canonical, used by Python, Ruby, Java, JavaScript and probably lots more.
> * In my experience case-insensitive "i" variants of string functions are
> used much less, by an order of magnitude. With this being sugar in the
> first place, I don't think there's a need to cover case-insensitive
> variations (and from a quick look, these don't seem to be first class
> methods in other languages either). If we do want to have them, I'd suggest
> making the names str_starts_with_ci() and str_ends_with_ci(), which is more
> obvious and harder to miss than str_istarts_with() etc.
> * Having mb_* variants of these functions doesn't really make sense. I
> realize that there's this knee-jerk reaction about how if it doesn't have
> "mb" in the name it's not Unicode compatible, but in this case it's even
> more wrong than usual.
> The normal str_starts_with() function is perfectly
> safe to use on UTF-8 strings, the only difference between it and
> mb_str_starts_with() is that it's going to be implemented a lot more
> efficiently. The only case that *might* make some sense is the
> case-insensitive variant here, because that has some genuine reliance on
> the character encoding. But then again, this can be handled by case-folding
> the strings first, something that mbstring is going to do internally anyway.
>
> I would happily accept a proposal for str_starts_with() + str_ends_with(),
> but I'm a lot more apprehensive about adding these 8 new functions.
>
> Regards,
> Nikita

I like the idea of simplifying this to the two functions
str_starts_with() and str_ends_with().

When I was looking through this the other day, I had trouble coming up
with an example of a string for which the mb_* versions would ever
generate a different result from the non-multibyte versions, since the
implementation only needs to compare bytes for equality. Perhaps it
would only be an issue with the case-insensitive versions, as Nikita
points out? If so, can someone provide some example strings where an
mb_starts_with_ci() would return true, while str_starts_with_ci() would
return false?

I think the case-insensitive versions would be common enough in
practice (e.g., checking whether a path ends with .CSV vs. .csv), but
maybe the signatures could be revised to take a third parameter?

str_starts_with($haystack, $needle, $case_sensitive = true): bool

-Ben
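
For illustration only, here is a minimal userland sketch of the
three-parameter signature suggested above, built on existing functions
(strncmp, strncasecmp, substr_compare). The sketch_* names and the
empty-needle behaviour are assumptions made for this example; they are
not taken from the RFC or its patch.

```php
<?php
// Hypothetical helpers; the "sketch_" prefix avoids clashing with built-ins.

function sketch_str_starts_with(string $haystack, string $needle, bool $case_sensitive = true): bool
{
    $len = strlen($needle); // length in bytes, not characters
    if ($len === 0) {
        return true; // assumption: every string starts with the empty string
    }
    // Compare only the first $len bytes of both strings.
    return $case_sensitive
        ? strncmp($haystack, $needle, $len) === 0
        : strncasecmp($haystack, $needle, $len) === 0;
}

function sketch_str_ends_with(string $haystack, string $needle, bool $case_sensitive = true): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true; // assumption: every string ends with the empty string
    }
    if ($len > strlen($haystack)) {
        return false; // needle cannot fit
    }
    // A negative offset makes substr_compare() look at the last $len bytes;
    // its fifth argument switches on case-insensitive comparison.
    return substr_compare($haystack, $needle, -$len, $len, !$case_sensitive) === 0;
}

var_dump(sketch_str_ends_with('report.CSV', '.csv'));        // bool(false)
var_dump(sketch_str_ends_with('report.CSV', '.csv', false)); // bool(true)
// Byte-wise comparison works for UTF-8 input when both strings use the
// same encoding: "\u{C4}" is "Ä".
var_dump(sketch_str_starts_with("\u{C4}rger", "\u{C4}r"));   // bool(true)
```

The case-sensitive path never needs to know the encoding, which matches
the point about UTF-8 above. The case-insensitive path relies on
strncasecmp() and substr_compare(), which, as far as I know, fold only
ASCII letters on current PHP versions, so a non-ASCII pair such as "Ä"
and "ä" would probably not match there without case-folding both
strings first (for example with mbstring).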