Date: Tue, 1 Aug 2017 09:35:38 +0200
To: Michał Brzuchalski
Cc: PHP internals
Subject: Re: [PHP-DEV] New functions: string_starts_with(), string_ends_with()
From: andreas@dqxtech.net (Andreas Hennings)

Thanks!
I did not find those, maybe the emails need to be enriched with keywords.
Like SEO-aware email authoring.

Ok. I am looking at the RFC and the old discussions at
https://marc.info/?l=php-internals&m=147017797404339&w=2
I don't know how to follow up on old threads that I don't have in my
email inbox.
So here is my feedback.

The RFC seems mostly fine as it is.
It does not contain anything like the string_clip_suffix() /
string_clip_prefix(), but I think these should be discussed separately.

About the naming:
The "i" in str_ibegin() and str_iend() seems ok to me.
I also strongly support separate functions instead of a parameter for
case sensitivity.
I also support the underscore. str_begin() is better than strbegin().

------------------------

Whether to have an "s" at the end:

https://marc.info/?l=php-internals&m=147017797404339&w=2
(Yasuo Ohgaki)
> It might be okay to have "s" in function names, but if we want to be
> consistent,
> str_replace -> str_replaces
> str_ireplace -> str_ireplaces

I disagree with this analogy.
The "s" in str_begins() would be for "haystack beginS with needle".
An "s" in str_replaces() would stand for what?

Both "begin" and "replace" are verbs, but they have a different role
in the function name.
"begin" describes a state or condition we want to verify, whereas
"replace" is a command we give to the machine.

So to me it would make sense to have str_begins() and str_ends()
instead of str_begin() and str_end().
To me, str_end() means either "End the string!" (command) or "Give me
the end of the string!" (noun).

In fact Rowan Collins made the same argument here,
https://marc.info/?l=php-internals&m=147017844704431&w=2
> I think those names mean something different: "str_begin" sounds like an
> imperative "make this string begin with X"; "str_begins" is more of an
> assertion "the string begins with X". Ruby would spell it with a ? at
> the end. It's also the same form, grammatically, as the common "isFoo".
>
> Note that this logic holds for "str_replace", which *is* an imperative -
> you are not saying "tell me if X replaces Y", you are saying "please
> replace X with Y".

But then Will talks about consistency again.
https://marc.info/?l=php-internals&m=147018700406320&w=2
> I think like
> having an "s" at the end of the function names reads better, but
> omitting the "s" fits better with the existing function names and does
> not read bad. Therefore, I am in favor of dropping the "s".

Honestly, looking at the existing string functions at
http://php.net/manual/en/ref.strings.php
I don't see a lot of consistency here. Just a long list of garbled
abbreviations.
I also don't see any existing function where the verb has a similar
role as the "begin" in str_begin().
For all the existing string functions, the verb is a command.

I think a better comparison would be
file_exists()
function_exists()
class_exists()
is_subclass_of()
extension_loaded()
ncurses_has_colors()
ncurses_can_change_color()

What these functions have in common:
- The return value is boolean.
- The verb is not a command, but it describes a state or condition.

The verb is not always at the end of the function name, and it does
not always end with -s.
But the form and ending of the verb follows its grammatical role in
the sentence.
I think this is a much better guideline than following a wrong idea of
consistency.

-------------------------

Finally, I don't know why everything needs to be abbreviated.
Having str_* instead of string_* seems ok to me, and is consistent
with existing string functions.
But my first idea would have been more complete phrases like
str_ends_with, str_has_ending(), str_has_suffix().
Instead of just str_end(), or str_ends().

On the other hand, shorter function names have their benefits.
So.. no strong opinion here.

--------------

-- Andreas

On Tue, Aug 1, 2017 at 8:29 AM, Michał Brzuchalski wrote:
> Hi Andreas,
>
> 2017-08-01 6:57 GMT+02:00 Andreas Hennings:
>>
>> Hello list,
>> a quite common use case is that one needs to find out if a string
>> $haystack begins or ends with another string $needle.
>> Or in other words, if $needle is a prefix or a suffix of $haystack.
>>
>> One prominent example would be in PSR-4 or PSR-0 class loaders.
>> Maybe the use case also occurs when writing parsers..
>> In each of these two examples (parsers, class loaders), we care about
>> performance.
>>
>> (forgive me if this was discussed before, I did not find it anywhere
>> in the archives)
>>
>> --------------------------
>>
>> Existing solutions to this problem feel non-trivial, and/or are
>> suboptimal in performance.
>>
>> https://stackoverflow.com/questions/2790899/how-to-check-if-a-string-starts-with-a-specified-string
>>
>> https://stackoverflow.com/questions/834303/startswith-and-endswith-functions-in-php
>> This answer compares different solutions,
>> https://stackoverflow.com/a/7168986/246724
>>
>> Existing solutions:
>> (Let's focus on string_starts_with(), the other case is mostly
>> equivalent / symmetric)
>>
>> if (0 === strpos($haystack, $needle)) {..}
>> I have often seen this presented as the preferable solution.
>> Unfortunately, this searches the entire string, not just the
>> beginning. Especially if $haystack is really long, this can be a
>> waste.
>> E.g. if (0 === strpos(file_get_contents('some_source_file.php'), [...]
>>
>> if ($needle === substr($haystack, 0, strlen($needle))) {..}
>> This reserves new memory for the substring, which later needs to be
>> garbage-collected.
>> Also, this requires an additional function call to strlen() - which
>> adds even more clutter if $needle is an expression, not just a
>> variable.
>>
>> if (0 === strncmp($haystack, $needle, strlen($needle))) {..}
>> Needs the additional call to strlen().
>> Otherwise, this seems like a really good solution.
>>
>> if ('' === $needle || false !== strrpos($haystack, $needle,
>> -strlen($haystack))) {..}
>> This is the funky solution from
>> https://stackoverflow.com/a/10473026/246724
>> The author says that it will be outperformed by strncmp() - so..
>>
>> if (preg_match('/^' . preg_quote($needle, '/') . '/', $haystack)) {..}
>> Clearly gonna be slower than other options.
>>
>> As said, all these solutions do work, but they are either suboptimal,
>> or they add clutter and overhead, or feel a bit like mind acrobatics.
>>
>> -----------------
>>
>> So, I wonder if it would be worthwhile to add new functions
>> string_starts_with() / string_has_prefix(), and string_ends_with() /
>> string_has_suffix().
>>
>> (Or maybe change strncmp(), so that the 3rd parameter $len is
>> optional. If $len is NULL / not provided, it would use the length of
>> the second (or first?) string.
>> (idea was that second parameter = needle).)
>>
>> For me personally, I am sure that I would use a new
>> string_starts_with() a lot more often than a lot of the other existing
>> string functions.
>> I don't think it is an exotic or niche use case.
>>
>> --------------
>>
>> Spinning this further:
>> A lot of times if I want to check if $haystack begins with $needle, I
>> will then need the rest of the string after $needle.
>> So
>> if (string_starts_with($haystack, $needle)) {
>>   $suffix = substr($haystack, strlen($needle));
>> }
>> or
>> if (string_ends_with($filename, '.php')) {
>>   $basename = substr($filename, 0, -4);
>> }
>>
>> I wonder if this could be somehow combined.
>> E.g.
>> if (FALSE !== $basename = string_clip_suffix($filename, '.php')) {
>>   // Do something with $basename.
>> }
>>
>> ------------------
>>
>> One flaw of these new functions would be that they are less versatile
>> than other string functions.
>> They solve this problem, and nothing else.
>> On the other hand, this is the point, to avoid unnecessary overhead.
>>
>> The other problem would be, of course, "feature creep" aka "we have so
>> many string functions already".
>> This is a matter of opinion.
>> I would imagine the "cost" of new native functions is:
>> - global namespace pollution
>> - increased mental load to learn and remember all of them
>> - higher memory footprint of php engine?
>> - more C code to maintain
>> - a new doc page.
>> Did I miss something?
>>
>> ------------------
>>
>> -- Andreas
>>
>
> This idea was discussed 11 months ago https://externals.io/message/94787
> There is also a proper RFC
> https://wiki.php.net/rfc/add_str_begin_and_end_functions
> You might wanna contact with Will to get feedback from the idea.
>
> --
> regards / pozdrawiam,
> --
> Michał Brzuchalski
> about.me/brzuchal
> brzuchalski.com
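
For reference, the prefix / suffix checks discussed in this thread could
look roughly like this in userland. This is only a sketch under some
assumptions: the names string_starts_with(), string_ends_with() and
string_clip_suffix() are taken from the proposal and do not exist in PHP,
and treating an empty $needle as always matching is my own choice, not
something the RFC or the thread settles. It builds on strncmp() and
substr_compare(), so it compares only the bytes that matter and only
allocates a substring when clipping.

```php
<?php
declare(strict_types=1);

// Hypothetical userland sketches. The function names follow the proposal
// in this thread; they are not existing PHP functions.

function string_starts_with(string $haystack, string $needle): bool
{
    // strncmp() compares at most strlen($needle) bytes, so it neither
    // scans the rest of $haystack nor allocates a substring.
    return 0 === strncmp($haystack, $needle, strlen($needle));
}

function string_ends_with(string $haystack, string $needle): bool
{
    $length = strlen($needle);
    if (0 === $length) {
        // Assumption: the empty string is a suffix of every string.
        return true;
    }
    if ($length > strlen($haystack)) {
        return false;
    }
    // A negative offset makes substr_compare() start $length bytes
    // before the end of $haystack.
    return 0 === substr_compare($haystack, $needle, -$length);
}

/**
 * Returns $haystack without $suffix, or false if it does not end with it.
 *
 * @return string|false
 */
function string_clip_suffix(string $haystack, string $suffix)
{
    if (!string_ends_with($haystack, $suffix)) {
        return false;
    }
    return substr($haystack, 0, strlen($haystack) - strlen($suffix));
}
```

With these definitions, string_clip_suffix('index.php', '.php') returns
'index' and string_clip_suffix('index.html', '.php') returns false, which
matches the "FALSE !== $basename = ..." pattern from the original mail.
A native implementation could avoid the extra strlen() calls, which is
part of the argument for adding it to the engine.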