RFC Posted for str_begins and str_ends functions

9 years ago by will@wkhudgins.info

Hello,

I recently emailed the group about submitting an RFC for str_begins()
and str_ends() functions. The RFC has now been officially submitted and
is viewable at:

https://wiki.php.net/rfc/add_str_begin_and_end_functions

The github PR may be found at:

https://github.com/php/php-src/pull/2049

Hope to be hearing about this,

Will

9 years ago by Sara Golemon

I recently emailed the group about submitting an RFC for str_begins() and
str_ends() functions. The RFC has now been officially submitted and is
viewable at:

https://wiki.php.net/rfc/add_str_begin_and_end_functions

Feeling "meh" on it (neither for nor against), but I would consider
consistency with other str*() functions by making case-insensitivity
live in separate functions rather than as a parameter. e.g.
str_begins(), str_ibegins(), str_ends(), str_iends()

-Sara

9 years ago by Yasuo Ohgaki

I recently emailed the group about submitting an RFC for str_begins() and
str_ends() functions. The RFC has now been officially submitted and is
viewable at:

https://wiki.php.net/rfc/add_str_begin_and_end_functions

Feeling "meh" on it (neither for nor against), but I would consider
consistency with other str*() functions by making case-insensitivity
live in separate functions rather than as a parameter. e.g.
str_begins(), str_ibegins(), str_ends(), str_iends()

+1 for having functions for case insensitivity.
I'm not sure if we should have "s". i.e. str_begin"s".

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

9 years ago by David Rodrigues

Sara Golemon wrote:

Feeling "meh" on it (neither for nor against), but I would consider
consistency with other str*() functions by making case-insensitivity
live in separate functions rather than as a parameter. e.g.
str_begins(), str_ibegins(), str_ends(), str_iends()

I guess that "i" isn't appliable when it have slashes.
In this case, functions should be: strbegins, stribegins, strends, striends.
In all case, I think that is better a third parameter and keep underlined.

Yasuo Ohgaki wrote:

+1 for having functions for case insensitivity.
I'm not sure if we should have "s". i.e. str_begin"s".

I think that "s" is good here.
Sounds better for me, but I don't know if it is right in english.
In JS, for instance, we have startsWith. It have a "s" too.

--
David Rodrigues

9 years ago by Yasuo Ohgaki

Hi David,

Sara Golemon wrote:

Feeling "meh" on it (neither for nor against), but I would consider
consistency with other str*() functions by making case-insensitivity
live in separate functions rather than as a parameter. e.g.
str_begins(), str_ibegins(), str_ends(), str_iends()

I guess that "i" isn't appliable when it have slashes.
In this case, functions should be: strbegins, stribegins, strends, striends.
In all case, I think that is better a third parameter and keep underlined.

This is a difficult issue.
String function names are currently inconsistent.
It is better to stick to the CODING_STANDARDS naming convention for new
function names. Therefore, new string functions are better named
str_*() unless they are too strange.

e.g.
http://php.net/manual/en/function.str-replace.php
http://php.net/manual/en/function.str-ireplace.php

I would like to fix function name inconsistencies by having aliases in
the near future.
https://wiki.php.net/rfc/consistent_function_names

It might be okay to have "s" in function names, but if we want to be
consistent,

str_replace -> str_replaces
str_ireplace -> str_ireplaces

IMO, following names are better for consistency.

str_begin
str_ibegin
str_end
str_iend

In addition, str_replace() has search_value first, so the signature might be

boolean str_begin(string $search_value, string $str [, boolean $case_sensitive = true])
boolean str_end(string $search_value, string $str [, boolean $case_sensitive = true])

However, strstr() (and other str functions without "_". e.g.
strpos/stripos/strrpos/strripos) has search_value as the 2nd
parameter. If we follow this format, current signature is fine.

It may be better to sort out and fix consistency issues first, then add
new functions. Otherwise, we may introduce more consistency issues.

Regards,

BTW, having "i" is more readable.

str_ibegin("searchthis", $str);
is more readable than
str_begin("searchthis", $str, TRUE);
as programmer does not have to know that's the TRUE means.
It's small thing, but small things add up.

--
Yasuo Ohgaki
yohgaki@ohgaki.net

9 years ago by Rowan Collins

It might be okay to have "s" in function names, but if we want to be
consistent,

str_replace -> str_replaces
str_ireplace -> str_ireplaces

IMO, following names are better for consistency.

str_begin
str_ibegin
str_end
str_iend

I think those names mean something different: "str_begin" sounds like an
imperative "make this string begin with X"; "str_begins" is more of an
assertion "the string begins with X". Ruby would spell it with a ? at
the end. It's also the same form, grammatically, as the common "isFoo".

Note that this logic holds for "str_replace", which is an imperative -
you are not saying "tell me if X replaces Y", you are saying "please
replace X with Y".

Regards,

--
Rowan Collins
[IMSoP]

9 years ago by will@wkhudgins.info

Everyone has raised important considerations. For me, the most important
thing is maintaining consistency with the existing PHP string library. I
do not want these functions to feel "tacked" on, as if they were
haphazardly added to PHP. If these functions are added to the language,
it should feel as if they have always been a part of the language (even
if they haven't been). This consistency is important in order to ensure
these functions ADD to PHP instead of just cluttering it up.

Having separate functions for case sensitivity makes sense; that is much
more consistent with the existing string library. I think the proposal
should be amended to separate those two functionalities. I think
having an "s" at the end of the function names reads better, but
omitting the "s" fits better with the existing function names and does
not read badly. Therefore, I am in favor of dropping the "s".

As far as str_begin vs strbegin, I think str_begin is more readable.
Therefore, I think it would be better to implement:

boolean str_begin(string $search_value, string $str)
boolean str_ibegin(string $search_value, string $str)
boolean str_end(string $search_value, string $str)
boolean str_iend(string $search_value, string $str)

This is much more consistent with the existing string library.

Regards,

Will
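
For illustration, the four signatures listed above could be approximated in userland roughly as follows. This is only a sketch under the (needle, haystack) order proposed in this message, not the RFC's actual C implementation; the `_sketch` suffix is added here to mark the names as hypothetical, and the case-insensitive variants only fold ASCII letters.

```php
<?php
// Userland approximation of the proposed API. The _sketch names are
// illustrative assumptions, not functions that exist in PHP.

function str_begin_sketch(string $search_value, string $str): bool
{
    // Compare only the first strlen($search_value) bytes of $str.
    return strncmp($str, $search_value, strlen($search_value)) === 0;
}

function str_ibegin_sketch(string $search_value, string $str): bool
{
    // Same as above, but ASCII letters compare case-insensitively.
    return strncasecmp($str, $search_value, strlen($search_value)) === 0;
}

function str_end_sketch(string $search_value, string $str): bool
{
    $len = strlen($search_value);
    if ($len === 0) {
        return true;            // assumption: every string ends with ""
    }
    if ($len > strlen($str)) {
        return false;           // needle cannot fit in the haystack
    }
    // Compare the last $len bytes of $str against the needle.
    return substr_compare($str, $search_value, -$len, $len) === 0;
}

function str_iend_sketch(string $search_value, string $str): bool
{
    $len = strlen($search_value);
    if ($len === 0) {
        return true;
    }
    if ($len > strlen($str)) {
        return false;
    }
    // Fifth argument true requests a case-insensitive comparison.
    return substr_compare($str, $search_value, -$len, $len, true) === 0;
}

var_dump(str_begin_sketch('http://', 'http://example.com'));
var_dump(str_iend_sketch('.CSV', 'report.csv'));
```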

9 years ago by Yasuo Ohgaki

as programmer does not have to know that's the TRUE means.

s/that's/what's/

I shouldn't write mails while writing code :(

--
Yasuo Ohgaki
yohgaki@ohgaki.net

9 years ago by Stanislav Malyshev

Hi!

I guess that "i" isn't appliable when it have slashes.
In this case, functions should be: strbegins, stribegins, strends, striends.
In all case, I think that is better a third parameter and keep underlined.

Please, not stribegins. We have enough functions with weird names :)
I am ambivalent on the question of whether to have an additional argument or
two functions, I guess with a slight preference for the argument.

--
Stas Malyshev
smalyshev@gmail.com

8 years ago by will@wkhudgins.info

I've updated the RFC to reflect the discussion here and on github. You
may see it at
https://wiki.php.net/rfc/add_str_begin_and_end_functions . You can see
the github PR at https://github.com/php/php-src/pull/2049 .

The motivation for these changes was to maximize consistency between the
proposed functions and existing PHP string functions. The goal is to
make these functions feel natural and add functionality to the language
without cluttering it up.

Thanks,

Will

8 years ago by Bishop Bettini

I've updated the RFC to reflect the discussion here and on github. You may
see it at
https://wiki.php.net/rfc/add_str_begin_and_end_functions . You can see
the github PR at https://github.com/php/php-src/pull/2049 .

The motivation for these changes was to maximize consistency between the
proposed functions and existing PHP string functions. The goal is to make
these functions feel natural and add functionality to the language without
cluttering it up.

Generally, +1. A few thoughts.

First, the RFC refers to these working on "characters". I assume you mean
ASCII characters and these actually work strictly on bytes. Working on
"characters" would be more in-line for a multi-byte extension. Would you
please clarify this point?

Second, and related to the multi-byte issue: do the case insensitive
versions honor case-folding in a multi-byte fashion? Either way, it's
probably a good idea to separate the vote between the sensitive and
insensitive versions because this is fundamentally a different, and perhaps
more contentious, question.

Third, perhaps these functions could provide more information than just
yes/no. Return boolean TRUE if and only if the needle completely
begins/ends the haystack, otherwise return INT representing the length in
common. Yes, that'll probably be a trap for new developers who don't honor
===, but that could be illuminated in docs. Formally:

boolean|int str_begin(string $needle, string $haystack)
boolean|int str_end(string $needle, string $haystack)

For example:

str_begin('http://', 'http://example.com') === true
str_begin('http://', 'https://example.com') === 4

Finally, since the RFC will fuel the final documentation, it might be a
good idea to use needle/haystack terminology in the function signatures for
some kind of consistency.
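
As a rough illustration of the third suggestion above, a userland function with that return convention might look like the sketch below. The name `str_begin_len_sketch` is hypothetical, the comparison is byte-wise, and treating an empty needle as a full match is an assumption rather than something the message specifies.

```php
<?php
// Returns true when $needle is a complete prefix of $haystack; otherwise
// returns the number of leading bytes the two strings have in common.
function str_begin_len_sketch(string $needle, string $haystack)
{
    $max = min(strlen($needle), strlen($haystack));
    $i = 0;
    // Walk both strings while the bytes agree.
    while ($i < $max && $needle[$i] === $haystack[$i]) {
        $i++;
    }
    return $i === strlen($needle) ? true : $i;
}

var_dump(str_begin_len_sketch('http://', 'http://example.com'));  // bool(true)
var_dump(str_begin_len_sketch('http://', 'https://example.com')); // int(4)
var_dump(str_begin_len_sketch('ftp://', 'http://example.com'));   // int(0)
```

The last call shows the trap mentioned in the message: a result of 0 and a result of true are only distinguishable with ===.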

8 years ago by Lester Caine

I guess that "i" isn't appliable when it have slashes.

In this case, functions should be: strbegins, stribegins, strends, striends.
In all case, I think that is better a third parameter and keep underlined.

Please, not stribegins. We have enough functions with weird names :)
I am ambivalent of the question whether to have additional argument or
two functions, I guess with a slight preference for argument.

The bulk of the time I'm applying this to the SQL query that is going to
return a set of results rather than direct to a string. In that case
it's STARTING 'xYZ'. Because the need has not arisen I've only just
noticed - after 20 odd years - there is no matching ENDING. Although
normally one needs to build a phantom field to index the data, so I do
have ONE case of reversed_field STARTING 'ZYX'.

Is STARTING just a Firebird SQL thing or is it more generally available?
I did a few Google searches, but as usual when searching for things like
'starting' one gets hundreds of pages on 'running' the software and its
other connotations.

I suspect like PHP the other methods of doing things take the strain, so
certainly LIKE 'XYZ%' and '%XYZ' are probably the 'generic' solution but
suffer from slower search times, especially when looking for the ENDING
string.

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

8 years ago by Rowan Collins

Is starting just a Firebird SQL thing or is it more generally available.
I do a few google searches but as usual when searching for things like
'starting' one gets hundreds of pages on 'running' the software and it's
other connotations.

I've never come across it in Postgres, MS SQL Server, or MySQL.
Generally LIKE 'abc%' is the recommended approach (and will I think hit
the index in many cases, because the DBMS can optimize the case of a
prefix match if it knows at planning time). A "starting" keyword would
certainly be useful if it was there. :)

It doesn't quite fill the same need as a PHP function, of course,
because you might be checking user input, or API results, or all sorts
of things that won't, or haven't yet, hit the database. Currently the
common idiom for that is the ugly strpos($string, 'abc') === 0

Regards,

Rowan Collins
[IMSoP]
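
For readers unfamiliar with the strpos idiom mentioned above, here is a minimal sketch of it and of why the strict comparison matters. The helper name `has_prefix_sketch` is hypothetical; strncmp is shown as one alternative that does not scan the whole string on a miss.

```php
<?php
$string = 'abcdef';

// strpos() returns int 0 for a match at the start and false for no match.
// Loose comparison treats false as equal to 0, so === is required.
var_dump(strpos($string, 'abc') === 0); // bool(true)
var_dump(strpos($string, 'xyz') === 0); // bool(false)
var_dump(strpos($string, 'xyz') == 0);  // bool(true), the classic bug

// Alternative: compare only the first strlen($prefix) bytes.
function has_prefix_sketch(string $string, string $prefix): bool
{
    return strncmp($string, $prefix, strlen($prefix)) === 0;
}

var_dump(has_prefix_sketch($string, 'abc')); // bool(true)
var_dump(has_prefix_sketch($string, 'bcd')); // bool(false)
```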

8 years ago by Lester Caine

Is starting just a Firebird SQL thing or is it more generally available.
I do a few google searches but as usual when searching for things like
'starting' one gets hundreds of pages on 'running' the software and it's
other connotations.

I've never come across it in Postgres, MS SQL Server, or MySQL.
Generally LIKE 'abc%' is the recommended approach (and will I think hit
the index in many cases, because the DBMS can optimize the case of a
prefix match if it knows at planning time). A "starting" keyword would
certainly be useful if it was there. :)

It doesn't quite fill the same need as a PHP function, of course,
because you might be checking user input, or API results, or all sorts
of things that won't, or haven't yet, hit the database. Currently the
common idiom for that is the ugly strpos($string, 'abc') === 0

PHP is never going to be loading millions of records into memory and
searching them. That is the job of a database. While LIKE 'abc%' can
be optimised to use an index and speed up results, if the 'abc%' is
supplied as a parameter it is not generally possible to prepare the
query using an index, whereas STARTING always knows the matching string is
the first characters of the index. While PHP and SQL share a number of
alternatives, the SQL versions will pay a premium on search time if an
index can't be used.

I was just wondering if str_starting and str_ending matched better with
other string handling options.

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

9 years ago by lauri.kentta@gmail.com

Hello,

I only saw you mention strpos, preg_match and substr as (slower)
alternatives. However, there's already a function called substr_compare
which is meant for just this kind of comparisons but which is more
general than your RFC.

function str_begins($a, $b) {
    return substr_compare($a, $b, 0, strlen($b)) === 0;
}
function str_ends($a, $b) {
    return substr_compare($a, $b, -strlen($b)) === 0;
}

--
Lauri Kenttä
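
One edge case in the helpers above may be worth checking: with an empty $b, -strlen($b) evaluates to 0, so the second helper compares the whole of $a against the empty string rather than a zero-length suffix. A guarded variant, as a sketch with hypothetical names and the assumption that an empty needle should count as a match, could look like this:

```php
<?php
// $a is the haystack and $b the needle, as in the message above.
function str_begins_guarded(string $a, string $b): bool
{
    $len = strlen($b);
    if ($len === 0) {
        return true;                 // assumption: "" is a prefix of anything
    }
    if ($len > strlen($a)) {
        return false;                // needle longer than haystack
    }
    return substr_compare($a, $b, 0, $len) === 0;
}

function str_ends_guarded(string $a, string $b): bool
{
    $len = strlen($b);
    if ($len === 0) {
        return true;                 // assumption: "" is a suffix of anything
    }
    if ($len > strlen($a)) {
        return false;
    }
    // Negative offset counts from the end; $len bounds the comparison.
    return substr_compare($a, $b, -$len, $len) === 0;
}

var_dump(str_begins_guarded('haystack', 'hay'));  // bool(true)
var_dump(str_ends_guarded('haystack', 'stack'));  // bool(true)
var_dump(str_ends_guarded('haystack', ''));       // bool(true)
```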

9 years ago by Christoph Becker

I only saw you mention strpos, preg_match and substr as (slower)
alternatives. However, there's already a function called substr_compare
which is meant for just this kind of comparisons but which is more
general than your RFC.

Thanks for pointing out substr_compare(), which I was not even
aware of. And indeed, substr_compare() is perfectly suitable to verify
whether a string starts or ends with a certain substring, so, in my
opinion, there is no need for the other functions to be added to
ext/standard.

--
Christoph M. Becker

8 years ago by Simon Welsh

Firstly, the argument ordering is the wrong way round for a string function. String functions — especially search-related ones — are haystack, needle (see strpos, strstr, strcspn, strpbrk, etc).

Secondly, I feel like this RFC does need to include that it’s a BC break by introducing new global functions. A quick search shows that SugarCRM[1] already implements str_begin and str_end functions and there’s likely to be other projects that do too.

[1]: https://github.com/sugarcrm/sugarcrm_dev/blob/ae189cfa4ed4edd6a4e1e0d9d1d5ec66f46a0b74/include/utils.php#L2082-L2090

Simon Welsh

8 years ago by will@wkhudgins.info

You are correct, functions like strpos and strstr do follow (haystack,
needle) but functions like str_replace follow the format (needle,
haystack). Because I did these functions with the underscore, it made
sense to make the functions follow the format found in other str_*
functions. If the functions were changed to be strbegin, stribegin,
strend, and striend, then it would make sense to follow the (haystack,
needle) format. However, I think adding the underscore greatly improves
the readability of these functions. And if the functions are named with
an underscore, I think it should follow the format found in the other
underscore functions.

Good call on the BC break, I had not thought about it breaking userland
functions with the same name.

-Will

8 years ago by Simon Welsh

You are correct, functions like strpos and strstr do follow (haystack, needle) but functions like str_replace follow the format (needle, haystack). Because I did these functions with the underscore, it made sense to make the functions follow the format found in other str_* functions. If the functions were changed to be strbegin, stribegin, strend, and striend, then it would make sense to follow the (haystack, needle) format. However, I think adding the underscore greatly improves the readability of these functions. And if the functions are named with an underscore, I think it should follow the format found in the other underscore functions.

str_replace and str_ireplace are the only str_* functions that don’t take the full string (haystack) as the first argument. str_pad, str_repeat, str_split and str_word_count all take the full string first even if there are other compulsory arguments.

Also, these functions are replacements for current usage of strpos/strrpos/substr_compare, so I feel like the argument ordering should match those rather than another function that isn’t closely related in functionality.

Good call on the BC break, I had not thought about it breaking userland functions with the same name.

-Will

--
Simon Welsh
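
To make the argument-order point concrete, the short script below calls a few of the existing functions named in this exchange. It only demonstrates where each built-in takes the full string; it takes no position on which order the new functions should use.

```php
<?php
$haystack = 'hello world';

// Search-style functions take the full string (haystack) first.
var_dump(strpos($haystack, 'world'));             // int(6)
var_dump(strstr($haystack, 'wor'));               // string(5) "world"
var_dump(substr_compare($haystack, 'world', -5)); // int(0)

// str_pad() also takes the full string first.
var_dump(str_pad('7', 3, '0', STR_PAD_LEFT));     // string(3) "007"

// str_replace() is the exception: search and replacement come first,
// and the full string (subject) comes last.
var_dump(str_replace('world', 'PHP', $haystack)); // string(9) "hello PHP"
```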

6 years ago by will@wkhudgins.info

Hello all,

I submitted this RFC several years ago. I collected a lot of feedback
and I have updated the RFC and corresponding github patch. Please see
the RFC at https://wiki.php.net/rfc/add_str_begin_and_end_functions and
the github patch at https://github.com/php/php-src/pull/2049. I have
addressed many concerns (order of arguments, name of functions,
multibyte support, etc). I plan to move this RFC to a vote in the
coming weeks.

Thanks,

Will

6 years ago by will@wkhudgins.info

I sent this earlier this week without [RFC] in the subject line...since
some people might have filters to check the subject line I wanted to
send this again with the proper substring in the subject line–to make it
clear I intend to take this to a vote in two weeks. Apologies for the
duplicate email.

-Will

6 years ago by Nikita Popov

Unfortunately, this looks like a case where the RFC feedback has made the
proposal worse, rather than better :(

I think it's easier to start with what I think this proposal should be:
There should be just two functions, str_starts_with() and str_ends_with()
-- and that's it.

The important realization to have here is that these functions are a bit of
sugar for an operation that is quite common, but can also be easily
implemented with existing functions (using strcmp, strpos or substr,
depending on what you like). There is no need for us to cover every
conceivable combination, just make the common case more convenient and
easier to read.

With that in mind:

I believe the "starts with" and "ends with" naming is a lot more
canonical, used by Python, Ruby, Java, JavaScript and probably lots more.

In my experience case-insensitive "i" variants of strings functions are
used much less, by an order of magnitude. With this being sugar in the
first place, I don't think there's a need to cover case-insensitive
variations (and from a quick look, these don't seem to be first class
methods in other languages either). If we do want to have them, I'd suggest
making the names str_starts_with_ci() and str_ends_with_ci(), which is more
obvious and harder to miss than str_istarts_with() etc.

Having mb_* variants of these functions doesn't really make sense. I
realize that there's this knee-jerk reaction about how if it doesn't have
"mb" in the name it's not Unicode compatible, but in this case it's even
more wrong than usual. The normal str_starts_with() function is perfectly
safe to use on UTF-8 strings, the only difference between it and
mb_str_starts_with() is that it's going to be implemented a lot more
efficiently. The only case that might make some sense is the
case-insensitive variant here, because that has some genuine reliance on
the character encoding. But then again, this can be handled by case-folding
the strings first, something that mbstring is going to do internally anyway.

I would happily accept a proposal for str_starts_with() + str_ends_with(),
but I'm a lot more apprehensive about adding these 8 new functions.

Regards,
Nikita
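
A minimal userland sketch of the two functions suggested above, using the existing functions the message names, might look like this. The `_sketch` names are hypothetical (chosen to avoid clashing with any built-in), and the empty-needle behaviour is an assumption. The UTF-8 calls illustrate the byte-safety argument: a valid UTF-8 needle can only match at a character boundary.

```php
<?php
function starts_with_sketch(string $haystack, string $needle): bool
{
    // Compare the first strlen($needle) bytes only.
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}

function ends_with_sketch(string $haystack, string $needle): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true;   // assumption: the empty string matches everywhere
    }
    // Take the last $len bytes and compare them exactly.
    return substr($haystack, -$len) === $needle;
}

// Plain ASCII.
var_dump(starts_with_sketch('https://php.net', 'https://')); // bool(true)
var_dump(ends_with_sketch('archive.tar.gz', '.gz'));         // bool(true)

// UTF-8 input, compared purely as bytes.
var_dump(starts_with_sketch('Zürich', 'Zü'));    // bool(true)
var_dump(starts_with_sketch('Zürich', 'Zu'));    // bool(false)
var_dump(ends_with_sketch('Zürich', 'ürich'));   // bool(true)
```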

6 years ago by Ben Ramsey

I like the idea of simplifying this to the two functions str_starts_with() and str_ends_with().

When I was looking through this the other day, I had trouble coming up with an example of a string where the mb_* versions would ever generate a different result from the non-multibyte versions, since the implementation only needs to count and analyze bytes for uniqueness. Perhaps it would only be an issue with the case-insensitive versions, as Nikita points out? If so, can someone provide some example strings where an mb_starts_with_ci() would return true, while str_starts_with_ci() would return false?

I think the case-insensitive versions would be common enough in use cases (i.e. looking to see if a path ends with .CSV vs. .csv, etc.), but maybe the signatures could be revised to pass a third parameter?

str_starts_with($haystack, $needle, $case_sensitive = true): bool

-Ben
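
As a sketch of the third-parameter idea above, applied to the .CSV example, something like the following could work in userland. The function name is hypothetical, and the case-insensitive branch relies on substr_compare(), which only folds ASCII letters.

```php
<?php
function ends_with_flag_sketch(
    string $haystack,
    string $needle,
    bool $case_sensitive = true
): bool {
    $len = strlen($needle);
    if ($len === 0) {
        return true;                  // assumption: empty needle matches
    }
    if ($len > strlen($haystack)) {
        return false;
    }
    // The fifth argument of substr_compare() is "case insensitive",
    // so the flag is inverted here.
    return substr_compare($haystack, $needle, -$len, $len, !$case_sensitive) === 0;
}

var_dump(ends_with_flag_sketch('report.CSV', '.csv'));        // bool(false)
var_dump(ends_with_flag_sketch('report.CSV', '.csv', false)); // bool(true)
```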

6 years ago by Bruce Weirdan

The normal str_starts_with() function is perfectly safe to use on UTF-8 strings,

Only if you assume strings to be normalized to the same form. Checking if NFC
string starts with NFD substring by checking them bit by bit is going to yield
false negatives [1]

[1] https://3v4l.org/4HgUL

--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com
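
The normalization issue can be reproduced without the linked snippet. The sketch below is not the code behind that link; it is an independent example that assumes PHP 7 or later for the \u{...} escapes and uses the intl extension's Normalizer class only if it is available.

```php
<?php
$nfc = "\u{00E9}cole"; // "école" with é as one precomposed code point (NFC)
$nfd = "e\u{0301}";    // "é" as e followed by a combining acute accent (NFD)

// Byte-wise prefix check: the two spellings of "é" have different bytes.
$bytewise = strncmp($nfc, $nfd, strlen($nfd)) === 0;
var_dump($bytewise); // bool(false)

// Normalizing both sides to the same form first makes the check succeed.
if (class_exists('Normalizer')) {
    $a = Normalizer::normalize($nfc, Normalizer::FORM_C);
    $b = Normalizer::normalize($nfd, Normalizer::FORM_C);
    var_dump(strncmp($a, $b, strlen($b)) === 0); // bool(true)
}
```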

6 years ago by Nikita Popov

The normal str_starts_with() function is perfectly safe to use on UTF-8
strings,

Only if you assume strings to be normalized to the same form. Checking if NFC
string starts with NFD substring by checking them bit by bit is going to yield
false negatives [1]

[1] https://3v4l.org/4HgUL

That's correct, but not really relevant in the context of the discussion,
as mbstring does not perform Unicode normalization, so mb_* functions
wouldn't change anything about this. (Not that basic string operations
should be performing implicit Unicode normalization...)

Nikita

6 years ago by Rowan Collins

Perhaps it would only be an issue with the case-insensitive versions,
as Nikita points out? If so, can someone provide some example strings
where an mb_starts_with_ci() would return true, while
str_starts_with_ci() would return false?

That's easy: any character that has a lower- and uppercase form, and is not represented as one byte in the target encoding. For that matter, any such character in the non-ASCII section of a single-byte encoding, since a non-mbstring case insensitive flag would presumably leave everything other than ASCII letters untouched.

So, any non-Latin script, like Greek or Cyrillic; any accented characters, unless you're lucky and they're represented by ASCII-letter plus combining modifier; the Turkish "i", which if I remember rightly has three forms not two; and so on.

Regards,

--
Rowan Collins
[IMSoP]

6 years ago by Ben Ramsey

Perhaps it would only be an issue with the case-insensitive versions,
as Nikita points out? If so, can someone provide some example strings
where an mb_starts_with_ci() would return true, while
str_starts_with_ci() would return false?

That's easy: any character that has a lower- and uppercase form, and is not represented as one byte in the target encoding. For that matter, any such character in the non-ASCII section of a single-byte encoding, since a non-mbstring case insensitive flag would presumably leave everything other than ASCII letters untouched.

So, any non-Latin script, like Greek or Cyrillic; any accented characters, unless you're lucky and they're represented by ASCII-letter plus combining modifier; the Turkish "i", which if I remember rightly has three forms not two; and so on.

According to Google, “İyi akşamlar” is the Turkish phrase for “Good evening” (Turkish speakers, please correct me if this is wrong). However, using the existing mb_* functions, I can’t get mb_stripos() to return 0 when trying to see if the string “İYI AKŞAMLAR” begins with “i̇yi.”

I’m just using UTF-8, so maybe there’s an encoding issue here?

$string = 'İyi akşamlar';
$upper = mb_strtoupper($string);
$lowerChars = mb_strtolower(mb_substr($string, 0, 3));

var_dump($string, $upper, $lowerChars);
var_dump(mb_stripos($upper, $lowerChars));

6 years ago by Nikita Popov

On Jun 23, 2019, at 05:35, Rowan Collins rowan.collins@gmail.com
wrote:

Perhaps it would only be an issue with the case-insensitive versions,
as Nikita points out? If so, can someone provide some example strings
where an mb_starts_with_ci() would return true, while
str_starts_with_ci() would return false?

That's easy: any character that has a lower- and uppercase form, and is
not represented as one byte in the target encoding. For that matter, any
such character in the non-ASCII section of a single-byte encoding, since a
non-mbstring case insensitive flag would presumably leave everything other
than ASCII letters untouched.

So, any non-Latin script, like Greek or Cyrillic; any accented
characters, unless you're lucky and they're represented by ASCII-letter
plus combining modifier; the Turkish "i", which if I remember rightly has
three forms not two; and so on.

According to Google, "İyi akşamlar” is the Turkish phrase for “Good
evening” (Turkish speakers, please correct me, if this wrong). However,
using the existing mb_* functions, I can’t get mb_stripos() to return 0
when trying to see if the string “İYI AKŞAMLAR” begins with “i̇yi.”

I’m just using UTF-8, so maybe there’s an encoding issue here?

$string = 'İyi akşamlar';
$upper = mb_strtoupper($string);
$lowerChars = mb_strtolower(mb_substr($string, 0, 3));

var_dump($string, $upper, $lowerChars);
var_dump(mb_stripos($upper, $lowerChars));

The reason why this doesn't work is that mb_stripos internally performs a
simple case fold, while a full case fold would be needed in this case
(Turkish i is hard). It's a bit tricky due to the need to remap character
offsets.

Nikita

6 years ago by Rowan Collins

According to Google, “İyi akşamlar” is the Turkish phrase for “Good evening” (Turkish speakers, please correct me if this is wrong). However, using the existing mb_* functions, I can’t get mb_stripos() to return 0 when trying to see if the string “İYI AKŞAMLAR” begins with “i̇yi.”

Probably mbstring is not using the right case-folding routines; as
mentioned in another thread, ext/mbstring wasn't written for Unicode,
but for older multibyte encodings, particularly those used for Japanese
text. grapheme_stripos (from ext/intl) apparently gets it right as of
PHP 7.3: https://3v4l.org/0431j

A much simpler example, though, is using just the second word of that
string: the accented "s" confuses plain stripos but not mb_stripos.

Regards,

--
Rowan Collins
[IMSoP]
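
The simpler example mentioned above can be sketched like this. It assumes ext/mbstring is loaded, UTF-8 input, and the default "C" locale (or PHP 8.2+, where stripos() folds ASCII only); the non-ASCII letters are written as escapes so the snippet does not depend on the file's encoding.

```php
<?php
// Requires ext/mbstring; both strings are UTF-8.
$haystack = "AK\u{015E}AMLAR"; // "AKŞAMLAR"
$needle   = "ak\u{015F}amlar"; // "akşamlar"

// stripos() folds single bytes only, so "Ş" (C5 9E) never matches "ş" (C5 9F).
var_dump(stripos($haystack, $needle));                // bool(false)

// mb_stripos() folds by code point and finds the needle at offset 0.
var_dump(mb_stripos($haystack, $needle, 0, 'UTF-8')); // int(0)
```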

6 years ago by Nikita Popov

On Jun 23, 2019, at 05:35, Rowan Collins rowan.collins@gmail.com
wrote:

Perhaps it would only be an issue with the case-insensitive versions,
as Nikita points out? If so, can someone provide some example strings
where an mb_starts_with_ci() would return true, while
str_starts_with_ci() would return false?

That's easy: any character that has a lower- and uppercase form, and is
not represented as one byte in the target encoding. For that matter, any
such character in the non-ASCII section of a single-byte encoding, since a
non-mbstring case insensitive flag would presumably leave everything other
than ASCII letters untouched.

So, any non-Latin script, like Greek or Cyrillic; any accented
characters, unless you're lucky and they're represented by ASCII-letter
plus combining modifier; the Turkish "i", which if I remember rightly has
three forms not two; and so on.

According to Google, "İyi akşamlar” is the Turkish phrase for “Good
evening” (Turkish speakers, please correct me, if this wrong). However,
using the existing mb_* functions, I can’t get mb_stripos() to return 0
when trying to see if the string “İYI AKŞAMLAR” begins with “i̇yi.”

I’m just using UTF-8, so maybe there’s an encoding issue here?

$string = 'İyi akşamlar';
$upper = mb_strtoupper($string);
$lowerChars = mb_strtolower(mb_substr($string, 0, 3));

var_dump($string, $upper, $lowerChars);
var_dump(mb_stripos($upper, $lowerChars));

The reason why this doesn't work is that mb_stripos internally performs a
simple case fold, while a full case fold would be needed in this case
(Turkish i is hard). It's a bit tricky due to the need to remap character
offsets.

I've implemented use of full case folding in
https://github.com/php/php-src/pull/4303. While doing that I kind of
convinced myself that we probably shouldn't actually do this, because it
breaks simple mb_stripos loops in a subtle way. It probably makes more
sense for people to explicitly call mb_convert_case($string, MB_CASE_FOLD)
and then operate on the resulting strings. That is both much more efficient
and avoids offset remapping issues.

Nikita
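
A rough sketch of the "fold first, then compare" approach described above, assuming ext/mbstring on PHP 7.3 or later (where MB_CASE_FOLD is available) and UTF-8 input. The helper name folded_starts_with() is hypothetical and exists only for this illustration.

```php
<?php
// Requires ext/mbstring on PHP 7.3+.
// folded_starts_with() is a hypothetical helper name, not an existing function.
function folded_starts_with(string $haystack, string $needle): bool
{
    $haystack = mb_convert_case($haystack, MB_CASE_FOLD, 'UTF-8');
    $needle   = mb_convert_case($needle, MB_CASE_FOLD, 'UTF-8');

    // After folding, a plain byte-wise prefix comparison is enough.
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}

// Full folding maps "ß" to "ss", so the two spellings compare equal.
var_dump(folded_starts_with('STRASSE', "stra\u{00DF}")); // bool(true)
var_dump(folded_starts_with('Strasse', 'strand'));       // bool(false)
```

Because the helper only returns a boolean, the fact that folding can change string lengths does not matter here; it would matter for anything that reports an offset into the original string.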

6 years ago by AllenJB

The reason why this doesn't work is that mb_stripos internally performs a
simple case fold, while a full case fold would be needed in this case
(Turkish i is hard). It's a bit tricky due to the need to remap character
offsets.

I've implemented use of full case folding in
https://github.com/php/php-src/pull/4303. While doing that I kind of
convinced myself that we probably shouldn't actually do this, because it
breaks simple mb_stripos loops in a subtle way. It probably makes more
sense for people to explicitly call mb_convert_case($string, MB_CASE_FOLD)
and then operate on the resulting strings. Both much more efficient, and
avoids offset remapping issues.

Nikita

If these functions (mb_stripos and any others affected by the same
issue) are not the recommended way, and may not act as users expect,
should they be deprecated?

Or at least notes added to the manual pages regarding this behavior /
the differences between the different methods?

AllenJB

6 years ago by will@wkhudgins.info

These are good points. Originally my RFC called for fewer functions but
based on feedback I added the others. My proposal: take the RFC as-is to
a vote. If it fails, I will raise another RFC for a vote that will just
contain the two basic functions: str_begins and str_ends.

Thanks,

Will

6 years ago by Nikita Popov

These are good points. Originally my RFC called for less functions but
based on feedback I added the others. My proposal: take the RFC as-is to
a vote. If it fails, I will raise another RFC for a vote that will just
contain the two basic functions: str_begins and str_ends.

To put my comments into more actionable form, here is what I would
recommend for this RFC:

Rename str_begins -> str_starts_with, str_ends -> str_ends_with,
str_ibegins -> str_starts_with_ci, str_iends -> str_ends_with_ci. As
mentioned before, this is standard terminology used by many, many
programming languages and it would be great if PHP did not deviate from
convention without strong reason.
Have a separate vote (in the same RFC) for the addition of the
corresponding mb_* variants.

I believe doing those two changes will ensure that the core part of the RFC
passes. I personally would be voting yes on the first part and no on the
second, but others may decide as they see fit.

Nikita

6 years ago by me@jhdxr.com

Agreed. I'm wondering why the author chose to use begin(s)/end(s) while almost all other popular languages have clearer naming, e.g. starts_with or has_prefix.

In addition, like someone else pointed out two years ago, userland may already have functions with the same name, and this should be considered as a potential BC break, which is not reflected in the RFC yet.

Regards,
CHU Zhaowei

6 years ago by will@wkhudgins.info

Nikita: I like the idea of splitting the mb_* versions from the main
vote...I'll have to see how to do that in the DokuWiki GUI but I like
the idea!

CHU: I will add a note that some userland functions may not be
compatible with this change although I don't think that should be a
showstopper, voters can decide as they see fit.

How do people tend to feel about the "str_startswith" vs
"str_starts_with" naming convention? I've seen people propose both.

Thanks,

Will

6 years ago by Dik Takken

How do people tend to feel about the "str_startswith" vs
"str_starts_with" naming convention? I've seen people propose both.

For best readability one should write '_' between separate words, just
like a space would be used in regular text, IMHO. The PHP standard
library already has a number of methods that are named that way, like
str_word_count(). So, I would favor to have str_starts_with().

Regards,
Dik Takken

6 years ago by will@wkhudgins.info

I have updated the RFC here
https://wiki.php.net/rfc/add_str_begin_and_end_functions to reflect
changes from the mailing list discussions. I will promptly open voting
on this RFC.

-Will

6 years ago by will@wkhudgins.info

Hello all,

After 15 days of discussion I have opened up voting on the following RFC
(https://wiki.php.net/rfc/add_str_begin_and_end_functions).

You can access the voting page here:
https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote

I have never set up a vote on doku-wiki so please let me know if I made
the vote incorrectly!

Thanks,

Will

6 years ago by will@wkhudgins.info

Following up on this, I plan to leave voting open for a full 15 days,
until July 20, 2019 Anywhere-on-Earth (AOE) time. If there are issues
with this time, let me know.

Thanks,

Will

6 years ago by Theodore Brown

After 15 days of discussion I have opened up voting on the following RFC
(https://wiki.php.net/rfc/add_str_begin_and_end_functions) .

You can access the voting page here:
https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote

I have never set up a vote on doku-wiki so please let me know if I made
the vote incorrectly!

It seems really unusual for voting to be on a separate page than the
RFC. Can you move the doodle voting macro to a "Vote" section on the
main RFC page?

Thanks,
Theodore

6 years ago by Peter Cowburn

After 15 days of discussion I have opened up voting on the following RFC
(https://wiki.php.net/rfc/add_str_begin_and_end_functions) .

You can access the voting page here:
https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote

I have never set up a vote on doku-wiki so please let me know if I made
the vote incorrectly!

It seems really unusual for voting to be on a separate page than the
RFC. Can you move the doodle voting macro to a "Vote" section on the
main RFC page?

Further to this, please follow the instructions at
https://wiki.php.net/rfc/howto. It has simple to follow steps detailing
exactly what to do.
Also, this RFC is still showing as "inactive" on the RFC list (
https://wiki.php.net/rfc) - anyone watching that page won't even know it
was back under discussion never mind in voting.

Thanks,
Theodore

6 years ago by Nikita Popov

On Fri, Jul 5, 2019 at 6:17 AM Theodore Brown theodorejb@outlook.com
wrote:

After 15 days of discussion I have opened up voting on the following RFC
(https://wiki.php.net/rfc/add_str_begin_and_end_functions) .

You can access the voting page here:
https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote

I have never set up a vote on doku-wiki so please let me know if I made
the vote incorrectly!

It seems really unusual for voting to be on a separate page than the
RFC. Can you move the doodle voting macro to a "Vote" section on the
main RFC page?

Thanks,
Theodore

I've taken the liberty of moving the voting widgets onto the main RFC page;
the new voting link is:
https://wiki.php.net/rfc/add_str_begin_and_end_functions#vote

I've also moved the RFC into the voting section on the RFC overview page.

Nikita

6 years ago by Theodore Brown — view source

Hello all,

After 15 days of discussion I have opened up voting on the following
RFC (https://wiki.php.net/rfc/add_str_begin_and_end_functions).

Thank you for your work on this. I'm surprised that so far the vote
is so controversial, with 8 votes in favor and 8 opposed.

For those voting against adding these functions, can you clarify why?
Do you dislike how they are named, or do you not see the need for the
case insensitive versions, or is there an issue with the implementation?

Personally I'd find the basic str_starts_with and str_ends_with
functions very valuable. Currently I either have to implement functions
like this myself in almost every script, or else write repetitious
code like the following:

$needle = "foobar";

if (substr($haystack, 0, strlen($needle)) === $needle) {
    // starts with "foobar"
}

To avoid repetition, many developers use the following pattern instead:

if (strpos($haystack, "foobar") === 0) {
    // starts with "foobar"
}

However, with longer strings this becomes far less efficient, since PHP
may have to search through the entire haystack when it does not start
with the needle.

If this RFC is accepted, these awkward and inefficient approaches
could be replaced with straightforward and fast code like this:

if (str_starts_with($haystack, "foobar")) {
    // ...
}
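
For illustration, here is a minimal userland sketch of what such helpers might look like. The names starts_with_sketch and ends_with_sketch are hypothetical, and this is not the RFC's implementation; it only shows that a prefix or suffix check needs to look at strlen($needle) bytes rather than scanning the whole haystack. Treating the empty needle as always matching is an assumption made here.

```php
<?php
// Hypothetical helper names; not part of PHP and not the RFC patch.

function starts_with_sketch(string $haystack, string $needle): bool
{
    // strncmp() compares at most strlen($needle) bytes from the start,
    // so the remainder of the haystack is never examined.
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}

function ends_with_sketch(string $haystack, string $needle): bool
{
    $length = strlen($needle);
    if ($length === 0) {
        return true; // assumption: the empty string is a suffix of every string
    }
    if ($length > strlen($haystack)) {
        return false; // a needle longer than the haystack cannot match
    }
    // A negative offset makes substr_compare() look only at the tail.
    return substr_compare($haystack, $needle, -$length) === 0;
}

var_dump(starts_with_sketch("foobar baz", "foobar"));
var_dump(ends_with_sketch("foobar baz", "baz"));
```

Both checks are byte-based and case-sensitive, matching the substr() and strpos() patterns shown above.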

Please vote on the RFC if you haven't already. Clarification would be
appreciated if you don't feel that these functions would be a good addition.

Best regards,
Theodore

6 years ago by Sara Golemon — view source

On Sun, Jul 7, 2019 at 3:45 PM Theodore Brown theodorejb@outlook.com
wrote:

For those voting against adding these functions, can you clarify why?

Explaining my non-vote. I'm explicitly abstaining as I don't see the value
in these functions (I'd rather see a community driven library which does
the same thing in a more agile way), but neither do I see much intrinsic
harm in allowing these functions in.

I did vote against the mb* variants as I'd like to see those die in favor
of ext/intl in all ways and every way.

-Sara

6 years ago by Theodore Brown — view source

For those voting against adding these functions, can you clarify why?

Explaining my non-vote. I'm explicitly abstaining as I don't see the
value in these functions (I'd rather see a community driven library
which does the same thing in a more agile way), but neither do I see
much intrinsic harm in allowing these functions in.

Thanks Sara. I understand your perspective of not wanting to add more
functions to PHP core which can be easily implemented in userland.

However, when it comes to basic string functions which are needed in
almost every script, I don't think it makes sense to ask users to
depend on an extra library for this. Almost every other language
has built-in functions for simply checking if a string starts or ends
with another string.

Best regards,
Theodore

6 years ago by Ben Ramsey — view source

Having this _ci postfix is a new way of indicating case insensitivity.
I think that it might add to negative votes. Personally I think it's a
good idea to mimic existing ways, even if they are a bit awkward.

How about using a flag or following "tradition", like stri_starts_with
& stri_ends_with or str_istarts_with & str_iends_with? That would
follow strstr / stristr and str_replace / str_ireplace.

I have no voting rights though.

I made this recommendation earlier in the other thread (https://externals.io/message/94787#106035), but it didn’t get any traction or response:

maybe the signatures could be revised to pass a third parameter?

str_starts_with($haystack, $needle, $case_sensitive = true): bool

Since voting has already begun, is this something that could still be considered?

-Ben
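
As a rough sketch of the third-parameter idea above, a userland version could switch between strncmp() and strncasecmp(). The function name starts_with_flag_sketch is hypothetical; only the $case_sensitive parameter comes from the suggested signature. Both comparisons are byte-based, so this does not cover multibyte case folding.

```php
<?php
// Hypothetical name; illustrates the suggested $case_sensitive flag only.
function starts_with_flag_sketch(
    string $haystack,
    string $needle,
    bool $case_sensitive = true
): bool {
    $length = strlen($needle);

    // Compare only the first $length bytes in either mode.
    return $case_sensitive
        ? strncmp($haystack, $needle, $length) === 0
        : strncasecmp($haystack, $needle, $length) === 0;
}

var_dump(starts_with_flag_sketch("FooBar", "foo"));        // case-sensitive by default
var_dump(starts_with_flag_sketch("FooBar", "foo", false)); // case-insensitive
```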

6 years ago by will@wkhudgins.info — view source

On Jul 8, 2019, at 13:09, Björn Larsson bjorn.x.larsson@telia.com
wrote:

Having this _ci postfix is a new way of indicating case
insensitivity.
I think that it might add to negative votes. Personally I think it's a
good idea to mimic existing ways, even if they are a bit awkward.

How about using a flag or following "tradition", like stri_starts_with
& stri_ends_with or str_istarts_with & str_iends_with? That would
follow strstr / stristr and str_replace / str_ireplace.

I have no voting rights though.

I made this recommendation earlier in the other thread
(https://externals.io/message/94787#106035), but it didn’t get any
traction or response:

maybe the signatures could be revised to pass a third parameter?

str_starts_with($haystack, $needle, $case_sensitive = true): bool

Since voting has already begun, is this something that could still be
considered?

-Ben

Thanks for the interest everyone! I've been following the email thread
and have a few thoughts.

At one point I had it set to take case sensitivity as a parameter
(https://github.com/php/php-src/pull/2049/commits/f89d8edc5f32d8a4b702699209e72d864e2ca440).
That isn't a bad idea IMO. I changed it to have split functions to match
str_ireplace, stripos, etc.
I agree the *_ci naming convention is different than most of the
existing codebase, but a lot of discussion during the process led to the
*_ci naming convention. And while the *_ci naming convention isn't
traditional, it does seem more intuitive. The i is easier to read in
something short like "str_ireplace" but kind of gets lost in something
long like "str_istarts_with".
I'd considered splitting the vote into 3 parts:
1) str_starts_with and str_ends_with
2) str_starts_with_ci and str_ends_with_ci
3) The mb_* functions.

But I decided against that as I felt that might be overly splitting
up the proposal.

If the main issue is naming and not functionality, I am happy to rework
the RFC (if it fails) to be more palatable. I primarily would like to
add this functionality to PHP, regardless of the naming. In my opinion
one of the nice things about PHP is that it comes with so many things
under the hood. As a user of the language that is something I
appreciate. A lot of powerful functionality is baked into the language
and that functionality is available on almost every web host. A language
like Python or Java just can't compare in that respect. Even NodeJS
requires an extensive amount of packages to accomplish even simple
tasks. PHP is nice because it ships with "batteries" included. Sure,
that brings some issues along with it, but that is as much a strength of
the language as it is a challenge. Adding a common task like starts_with
and ends_with seems like a reasonable thing to do.

Thanks,

Will

6 years ago by Peter Bowyer — view source

On Mon, 8 Jul 2019 at 19:09, Björn Larsson bjorn.x.larsson@telia.com
wrote:

Having this _ci postfix is a new way of indicating case insensitivity.
I think that it might add to negative votes. Personally I think it's a
good idea to mimic existing ways, even if they are a bit awkward.

How about using a flag or following "tradition", like stri_starts_with
& stri_ends_with or str_istarts_with & str_iends_with? That would
follow strstr / stristr and str_replace / str_ireplace.

I would vote yes with that naming. It's a damn silly tradition, but it's
what PHP uses for other functions, and keeping consistency is better than
improving individual functions.

Peter

6 years ago by Claude Pache — view source

Le 9 juil. 2019 à 09:40, Peter Bowyer phpmailinglists@gmail.com a écrit :

On Mon, 8 Jul 2019 at 19:09, Björn Larsson bjorn.x.larsson@telia.com
wrote:

Having this _ci postfix is a new way of indicating case insensitivity.
I think that it might add to negative votes. Personally I think it's a
good idea to mimic existing ways, even if they are a bit awkward.

How about using a flag or following "tradition", like stri_starts_with
& stri_ends_with or str_istarts_with & str_iends_with? That would
follow strstr / stristr and str_replace / str_ireplace.

I would vote yes with that naming. It's a damn silly tradition, but it's
what PHP uses for other functions, and keeping consistency is better than
improving individual functions.

Peter

There are currently (at least) two ways for marking case insensitivity in the name: the character “i” as in stripos() and the substring “case” as in: strcasecmp().

Adding a third way, namely the “ci” suffix (or, even worse, a flag) is absolutely in the silly tradition of inconsistent naming of PHP functions (although admittedly not one we should strive to maintain)... except that “ci” is maybe more meaningful than “i” and “case”.
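
To make the two existing conventions concrete, the following short example calls standard PHP functions from each family. The variable names are only for illustration.

```php
<?php
// The "i" convention: stripos(), str_ireplace().
$position = stripos("Hello World", "WORLD");              // 6
$replaced = str_ireplace("WORLD", "PHP", "Hello World");  // "Hello PHP"

// The "case" convention: strcasecmp(), strncasecmp().
$whole  = strcasecmp("Hello", "HELLO");           // 0 means equal
$prefix = strncasecmp("Hello World", "HELLO", 5); // 0 means the first 5 bytes match

var_dump($position, $replaced, $whole, $prefix);
```

A "_ci" suffix or a boolean flag would be a third spelling alongside these two.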

—Claude

6 years ago by Nikita Popov — view source

Hello all,

After 15 days of discussion I have opened up voting on the following RFC
(https://wiki.php.net/rfc/add_str_begin_and_end_functions) .

You can access the voting page here:
https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote

I have never set up a vote on doku-wiki so please let me know if I made
the vote incorrectly!

Thanks,

Will

As we're already two days past the announced end, I've closed the RFC vote.
The final outcome is 26 in favor vs 20 against for str_starts_with and
friends, and 4 in favor to 36 against for mb_starts_with and friends.
Because a 2/3 majority is required, both parts of the proposal are declined.
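
The arithmetic behind that outcome can be checked with a small sketch. The helper name reaches_two_thirds is hypothetical, and it assumes that "2/3 majority" means the yes votes must be at least two thirds of all yes and no votes cast. With 26 of 46 votes (about 56.5%) and 4 of 40 votes (10%), neither part reaches that threshold; under the same assumption, 31 of 46 votes would have been needed for the first part.

```php
<?php
// Hypothetical helper; assumes the threshold is yes / (yes + no) >= 2/3.
function reaches_two_thirds(int $yes, int $no): bool
{
    // Cross-multiplied integer form avoids floating-point rounding.
    return $yes * 3 >= ($yes + $no) * 2;
}

var_dump(reaches_two_thirds(26, 20)); // str_starts_with and friends
var_dump(reaches_two_thirds(4, 36));  // mb_starts_with and friends
```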

Based on the discussion during voting, I think that trying this again with
just str_starts_with+str_ends_with without the case-insensitive variants
might pass, as that's where the main controversy seems to be -- though some
people also expressed the view that these functions are too trivial to add
to the standard library.

In any case, thanks for driving this through the RFC process!

Nikita

5 years ago by Guilliam Xavier — view source

Hello all,

After 15 days of discussion I have opened up voting on the following RFC
(https://wiki.php.net/rfc/add_str_begin_and_end_functions) .

You can access the voting page here:
https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote

I have never set up a vote on doku-wiki so please let me know if I made
the vote incorrectly!

Thanks,

Will

As we're already two days past the announced end, I've closed the RFC vote.
The final outcome is 26 in favor vs 20 against for str_starts_with and
friends, and 4 in favor to 36 against for mb_starts_with and friends.
Because a 2/3 majority is required, both parts of the proposal are declined.

Based on the discussion during voting, I think that trying this again with
just str_starts_with+str_ends_with without the case-insensitive variants
might pass, as that's where the main controversy seems to be -- though some
people also expressed the view that these functions are too trivial to add
to the standard library.

In any case, thanks for driving this through the RFC process!

Nikita

Hello Will,

More than 6 months have passed, and in the meantime the related
str_contains RFC has been accepted for the next PHP 8.0
(https://externals.io/message/109050). Would you be willing to
reboot your RFC with just str_starts_with and str_ends_with (and a
stronger case of how people keep implementing them using the
inefficient and/or error-prone currently available alternatives)?
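
Two commonly cited pitfalls with the existing alternatives can be sketched as follows. These examples are illustrative only and are not taken from the RFC; the variable names are hypothetical.

```php
<?php
// Pitfall 1: strpos() returns false when the needle is absent, and
// false == 0 under loose comparison, so "bar" appears to start with "foo".
$loose_check  = (strpos("bar", "foo") == 0);
$strict_check = (strpos("bar", "foo") === 0);

// Pitfall 2: an ends-with check via substr() with an empty needle.
// -strlen("") is 0, so substr() returns the whole string instead of "".
$needle = "";
$empty_suffix_check = (substr("abc", -strlen($needle)) === $needle);

var_dump($loose_check, $strict_check, $empty_suffix_check);
```

The strict comparison avoids the first problem, but the check still scans the haystack for the needle when it is not at the start.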

Best regards,

--
Guilliam Xavier

5 years ago by will@wkhudgins.info — view source

On Mon, Jul 22, 2019 at 10:54 AM Nikita Popov nikita.ppv@gmail.com
wrote:

Hello all,

After 15 days of discussion I have opened up voting on the following RFC
(https://wiki.php.net/rfc/add_str_begin_and_end_functions) .

You can access the voting page here:
https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote

I have never set up a vote on doku-wiki so please let me know if I made
the vote incorrectly!

Thanks,

Will

As we're already two days past the announced end, I've closed the RFC vote.
The final outcome is 26 in favor vs 20 against for str_starts_with and
friends, and 4 in favor to 36 against for mb_starts_with and friends.
Because a 2/3 majority is required, both parts of the proposal are declined.

Based on the discussion during voting, I think that trying this again with
just str_starts_with+str_ends_with without the case-insensitive variants
might pass, as that's where the main controversy seems to be -- though some
people also expressed the view that these functions are too trivial to add
to the standard library.

In any case, thanks for driving this through the RFC process!

Nikita

Hello Will,

More than 6 months have passed, and in the meantime the related
str_contains RFC has been accepted for the next PHP 8.0
(https://externals.io/message/109050). Would you be willing to
reboot your RFC with just str_starts_with and str_ends_with (and a
stronger case of how people keep implementing them using the
inefficient and/or error-prone currently available alternatives)?

Best regards,

Yes, I'll start working on that again.

Thanks,

Will
