Newsgroups: php.internals
From: nikita.ppv@gmail.com (Nikita Popov)
To: Andrea Faulds
Cc: PHP internals
Date: Mon, 2 Mar 2020 21:00:32 +0100
Subject: Re: [PHP-DEV] Proposal for a new basic function: str_contains

On Mon, Mar 2, 2020 at 8:10 PM Andrea Faulds wrote:

> Hi,
>
> Philipp Tanlak wrote:
> > I like to elaborate on Nikitas response: I don't think a mb_str_contains is
> > necessary, because the proposed function does not behave differently, if
> > the input strings are multibyte strings.
>
> This is not true for all character encodings. For UTF-8 it is correct,
> but consider for example the Japanese encoding Shift_JIS, where the
> second byte of a multi-byte character can be a valid first byte of a
> single-byte character. str_contains() would have incorrect behaviour for
> this case.
>

That's of course true, but I consider it ultimately unimportant. Accepting non-UTF-8 encodings for anything other than mb_convert_encoding() is just another failure of the mbstring API. The mb_strpos() function literally works by converting the given string to UTF-8 and then calling the normal strpos() on it, after sprinkling in some nice O(n) character offset to byte offset (and reverse) conversions.

You are generally much better off canonicalizing everything to UTF-8 for internal processing and using normal str* functions.

But well, that's a different discussion...

Regards,
Nikita
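
A minimal sketch of the Shift_JIS case and of the canonicalize-to-UTF-8 approach discussed above. It is written in Python rather than PHP only to keep it self-contained; the helper names `byte_contains` and `contains_via_utf8` are hypothetical, and the plain byte-wise search merely stands in for an encoding-unaware function such as the proposed str_contains(). It does not reproduce how mbstring is implemented.

```python
def byte_contains(haystack: bytes, needle: bytes) -> bool:
    """Encoding-unaware substring test: compares raw bytes only."""
    return needle in haystack


def contains_via_utf8(haystack: bytes, needle: bytes, encoding: str) -> bool:
    """Canonicalize both inputs to UTF-8 first, then do the same byte search."""
    haystack_utf8 = haystack.decode(encoding).encode("utf-8")
    needle_utf8 = needle.decode(encoding).encode("utf-8")
    return needle_utf8 in haystack_utf8


# KATAKANA LETTER A (U+30A2) is the two bytes 0x83 0x41 in Shift_JIS.
# Its second byte, 0x41, is also the single-byte character "A".
sjis_haystack = "\u30a2".encode("shift_jis")
sjis_needle = "A".encode("shift_jis")

# True: the byte search matches the trail byte of the katakana character,
# although the text does not contain the letter "A".
false_positive = byte_contains(sjis_haystack, sjis_needle)

# False: in UTF-8 no byte of a multi-byte sequence is an ASCII byte,
# so the same byte search gives the character-level answer.
canonical = contains_via_utf8(sjis_haystack, sjis_needle, "shift_jis")

print(false_positive, canonical)
```

The design choice here mirrors the advice in the message: convert once at the boundary, then rely on ordinary byte-oriented string functions for all internal processing.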