Newsgroups: php.internals
From: adam@adsar.co.uk ("Adam Cable")
To: "'Nikita Popov'"
Cc: "'PHP internals'"
Date: Mon, 18 Jan 2021 11:11:02 -0000
Message-ID: <025001d6ed8a$9c059690$d410c3b0$@adsar.co.uk>
References: <046401d6e999$f4de5d50$de9b17f0$@adsar.co.uk>
Subject: RE: [PHP-DEV] Addition of substring and subistring functions.

From: Nikita Popov
Sent: 18 January 2021 10:32
To: Adam Cable
Cc: PHP internals
Subject: Re: [PHP-DEV] Addition of substring and subistring functions.

On Wed, Jan 13, 2021 at 11:51 AM Adam Cable wrote:

Hi internals.

I've been coding in PHP for 15 years now, and spend most days using it to transform content into meaningful data.

Most of this is pulling out prices and attributes of certain products from HTML, for example, grabbing the price from content such as "Total price including delivery: £15.00".

To grab the "15.00" from the string can take quite a few lines of PHP and can be pretty cumbersome.

I've built some helper functions - substring, and its case-insensitive variant subistring - to help.

Functions take in the string plus a to and from string, and return a trim'd string found between the two.

So substring("Total price including delivery: £15.00", "£", "") would return "15.00".

From and to strings are optional and therefore return from the beginning or to the end.

In the past I hadn't thought about adding this to PHP core, but with the introduction of str_starts/ends_with functions in PHP 8.0 I thought it may be useful to include the sub(i)string building blocks too.

Implementation and tests can be found @ https://github.com/php/php-src/pull/6602

I'm sure the C implementation can be made a lot better, but it seems to work OK at present.

This is my first e-mail to internals, so please excuse my naivety with things, but hope this is useful.

Thanks,
Adam

Hi Adam,

A few thoughts:

1. The name of the function is not clear. It's not obvious what the difference between substr() and substring() is.
To make it worse, JavaScript has both substr() and substring(), but the meaning of substring() there is a different one from what you propose. I think a better name would be something like str_between().

2. Why does this perform an implicit trim() call? I understand that this may be useful in some cases, but it will also limit applicability of the function. It's easy to write trim(substring(...)), but if the trim() is part of the call, there's no way to avoid it.

3. More generally, I feel that this API is a bit too specific for inclusion in the standard library. It can certainly be useful, but I don't think it's anywhere near as ubiquitous as operations like str_ends_with(). For complex string matching tasks, I would probably pick preg_match() over a combination of str* functions anyway.

Regards,
Nikita

Hi Nikita,

Thanks for the reply, appreciated. Answers below:

1 - str_between() makes much more sense to align with similar PHP function names

2 - We have never had a time when excess whitespace was wanted - but it doesn't need to trim

3 - That's fine, just thought I'd propose it as I use this nearly every day, and helps to keep things simple when regex might be a bit overkill

Happy to leave this for now - but again, appreciate the reply.

Best,
Adam
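
For anyone reading the thread later, here is a minimal userland sketch of the helper being discussed. It is not the C code from the pull request. The str_between() name comes from Nikita's suggestion; the signature, the $ignoreCase flag (standing in for the subistring variant) and the null return when a delimiter is missing are all assumptions made for illustration. It leaves out the implicit trim(), so callers add it themselves when wanted. The file is assumed to be saved as UTF-8.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch, not the implementation from the PR.
 *
 * Returns the part of $haystack after the first $from and before the
 * next $to. An empty $from means "from the beginning", an empty $to
 * means "to the end". Returns null if a requested delimiter is absent.
 */
function str_between(string $haystack, string $from = '', string $to = '', bool $ignoreCase = false): ?string
{
    $start = 0;
    if ($from !== '') {
        $pos = $ignoreCase ? stripos($haystack, $from) : strpos($haystack, $from);
        if ($pos === false) {
            return null;
        }
        // Offsets are in bytes, so a multibyte delimiter such as "£" is skipped whole.
        $start = $pos + strlen($from);
    }

    $end = strlen($haystack);
    if ($to !== '') {
        // Search for the closing delimiter only after the opening one.
        $pos = $ignoreCase ? stripos($haystack, $to, $start) : strpos($haystack, $to, $start);
        if ($pos === false) {
            return null;
        }
        $end = $pos;
    }

    return substr($haystack, $start, $end - $start);
}

$text = 'Total price including delivery: £15.00';

$price = str_between($text, '£');                      // "15.00"
$label = trim(str_between($text, 'Total', ':') ?? ''); // "price including delivery"

// The preg_match() route mentioned in point 3, for comparison.
preg_match('/£(\d+\.\d{2})/u', $text, $m);             // $m[1] is "15.00"
```

Keeping trim() outside the function follows point 2 above: trim(str_between(...)) is easy to write, while an implicit trim could not be switched off.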