Newsgroups: php.internals
From: michal@brzuchalski.com (Michał Brzuchalski)
To: Andreas Hennings
Cc: PHP internals
Date: Tue, 1 Aug 2017 08:29:31 +0200
Subject: Re: [PHP-DEV] New functions: string_starts_with(), string_ends_with()

Hi Andreas,

2017-08-01 6:57 GMT+02:00 Andreas Hennings:

> Hello list,
> a quite common use case is that one needs to find out if a string
> $haystack begins or ends with another string $needle.
> Or in other words, if $needle is a prefix or a suffix of $haystack.
>
> One prominent example would be in PSR-4 or PSR-0 class loaders.
> Maybe the use case also occurs when writing parsers.
> In each of these two examples (parsers, class loaders), we care about
> performance.
>
> (forgive me if this was discussed before, I did not find it anywhere
> in the archives)
>
> --------------------------
>
> Existing solutions to this problem feel non-trivial, and/or are
> suboptimal in performance.
> https://stackoverflow.com/questions/2790899/how-to-check-if-a-string-starts-with-a-specified-string
> https://stackoverflow.com/questions/834303/startswith-and-endswith-functions-in-php
> This answer compares different solutions,
> https://stackoverflow.com/a/7168986/246724
>
> Existing solutions:
> (Let's focus on string_starts_with(), the other case is mostly
> equivalent / symmetric)
>
> if (0 === strpos($haystack, $needle)) {..}
> I have often seen this presented as the preferable solution.
> Unfortunately, this searches the entire string, not just the
> beginning. Especially if $haystack is really long, this can be a
> waste.
> E.g.
> if (0 === strpos(file_get_contents('some_source_file.php'),
> ' '
>
> if ($needle === substr($haystack, 0, strlen($needle))) {..}
> This reserves new memory for the substring, which later needs to be
> garbage-collected.
> Also, this requires an additional function call to strlen() - which
> adds even more clutter if $needle is an expression, not just a
> variable.
>
> if (0 === strncmp($haystack, $needle, strlen($needle))) {..}
> Needs the additional call to strlen().
> Otherwise, this seems like a really good solution.
>
> if ('' === $needle || false !== strrpos($haystack, $needle,
> -strlen($haystack))) {..}
> This is the funky solution from
> https://stackoverflow.com/a/10473026/246724
> The author says that it will be outperformed by strncmp() - so..
>
> if (preg_match('/^' . preg_quote($needle, '/') . '/', $haystack)) {..}
> Clearly gonna be slower than other options.
>
> As said, all these solutions do work, but they are either suboptimal,
> or they add clutter and overhead, or feel a bit like mind acrobatics.
>
> -----------------
>
> So, I wonder if it would be worthwhile to add new functions
> string_starts_with() / string_has_prefix(), and string_ends_with() /
> string_has_suffix().
>
> (Or maybe change strncmp(), so that the 3rd parameter $len is
> optional. If $len is NULL / not provided, it would use the length of
> the second (or first?) string.
> (idea was that second parameter = needle).)
>
> For me personally, I am sure that I would use a new
> string_starts_with() a lot more often than a lot of the other existing
> string functions.
> I don't think it is an exotic or niche use case.
>
> --------------
>
> Spinning this further:
> A lot of times if I want to check if $haystack begins with $needle, I
> will then need the rest of the string after $needle.
> So
> if (string_starts_with($haystack, $needle)) {
>   $suffix = substr($haystack, strlen($needle));
> }
> or
> if (string_ends_with($filename, '.php')) {
>   $basename = substr($filename, 0, -4);
> }
>
> I wonder if this could be somehow combined.
> E.g.
> if (FALSE !== $basename = string_clip_suffix($filename, '.php')) {
>   // Do something with $basename.
> }
>
> ------------------
>
> One flaw of these new functions would be that they are less versatile
> than other string functions.
> They solve this problem, and nothing else.
> On the other hand, this is the point, to avoid unnecessary overhead.
>
> The other problem would be, of course, "feature creep" aka "we have so
> many string functions already".
> This is a matter of opinion.
> I would imagine the "cost" of new native functions is:
> - global namespace pollution
> - increased mental load to learn and remember all of them
> - higher memory footprint of php engine?
> - more C code to maintain
> - a new doc page.
> Did I miss something?
>
> ------------------
>
> -- Andreas

This idea was discussed 11 months ago: https://externals.io/message/94787
There is also a proper RFC: https://wiki.php.net/rfc/add_str_begin_and_end_functions
You might want to contact Will to get feedback on the idea.

--
regards,
--
Michał Brzuchalski
about.me/brzuchal
brzuchalski.com