Newsgroups: php.internals
Xref: news.php.net php.internals:108821
Date: Tue, 3 Mar 2020 14:29:07 +0100
From: nicolas.grekas+php@gmail.com (Nicolas Grekas)
To: PHP internals
Subject: Re: [PHP-DEV] Proposal for a new basic function: str_contains
References: <704fa268-6194-0ec2-d6b0-8f5efdf1009f@heigl.org>

On Tue, 3 Mar 2020 at 11:04, Rowan Tommins wrote:

> On Tue, 3 Mar 2020 at 08:46, Andreas Heigl wrote:
>
> > While it is mainly aimed at being a mere convenience-function that could
> > also be easily implemented in userland it misses one main thing IMO when
> > handling unicode-strings: Normalization.
>
> While I would love to see more functionality for handling Unicode which
> didn't treat it as just another character set, I don't think sprinkling it
> into the main string functions of the language would be the right approach.
> Even if we changed all the existing functions to be "Unicode-aware", as was
> planned for PHP 6, the resulting API would not handle all cases correctly.
>
> In this case, a Unicode-based string API ought to provide at least two
> variants of "contains", as options or separate functions:
>
> - a version which matches on code point, for answering queries like "does
>   this string contain right-to-left override characters?"
> - at least one form of normalization, but probably several
>
> If there was serious work on a new string API in progress, a freeze on
> additions to the current API would make sense; but right now, the
> byte-based string API is what we have, and I think this function is a
> sensible addition to it.

FYI, I wrote a String handling lib, shipped as Symfony String:
- doc: https://symfony.com/doc/current/components/string.html
- src: https://github.com/symfony/string

TL;DR, it provides 3 classes of value objects, dealing with bytes, code
points and grapheme clusters (~= normalized Unicode).

It makes no sense to have `str_contains()` or any global function able to
deal with Unicode normalization *unless* the PHP string values embed their
unit system (one of: bytes, codepoints or graphemes).

With this rationale, I agree with Rowan: PHP's native string functions deal
with bytes. So should str_contains(). Other unit systems can be implemented
in userland (until PHP implements something similar to Symfony String in
core - but that's another topic.)

Nicolas
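
To make the point about unit systems concrete, here is a minimal sketch. It is written in Python rather than PHP, it is not the Symfony String API, and the helper names `contains_bytes` and `contains_normalized` are made up for this illustration. It only assumes the standard `unicodedata` module, and it covers bytes and normalization but not grapheme clusters.

```python
import unicodedata


def contains_bytes(haystack: str, needle: str) -> bool:
    """Byte-wise search on the UTF-8 encodings (what a byte-based API does)."""
    return needle.encode("utf-8") in haystack.encode("utf-8")


def contains_normalized(haystack: str, needle: str, form: str = "NFC") -> bool:
    """Search after bringing both strings to the same normalization form."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)


# Two spellings of the same visible word.
composed = "caf\u00e9"     # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# A byte-wise search only finds the spelling it was given.
print(contains_bytes(composed, "\u00e9"))       # True
print(contains_bytes(decomposed, "\u00e9"))     # False

# Normalizing both sides makes the two spellings match.
print(contains_normalized(decomposed, "\u00e9"))  # True

# The answer depends on the chosen form: does "café" contain "e"?
print(contains_normalized(composed, "e", "NFC"))  # False
print(contains_normalized(composed, "e", "NFD"))  # True

# A code point query needs no normalization at all.
print("\u202e" in "abc\u202edef")                 # True (right-to-left override)
```

The NFC/NFD disagreement on the last pair is the reason a single global function cannot pick the right behaviour by itself: the caller has to say which unit system the strings are in.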