From: sandfox@sandfox.me (Anton Smirnov)
Date: Mon, 12 Aug 2024 06:30:14 +0300
Subject: Re: [PHP-DEV][Discussion] Should All String Functions Become Multi-Byte Safe?
To: internals@lists.php.net

On 12/08/2024 00:36, Nick Lockheart wrote:
> So what I would propose is:
>
> (1) All string functions should state in the official man page if they
> are safe for UTF-8 or not.

Reasonable, but see below.

> (2) Functions intended for working with text should be made UTF-8 safe.

Define precisely UTF-8 safe. Also, what about BC breaks here?

> (3) Functions intended for processing binary should be added if
> necessary, and should be named something like "binary" or "byte".

That would require renaming and deprecating most of the standard string library; I guess no one would agree to that. But generally they are already named differently: str* functions are binary, while mb_* and grapheme_* are text-oriented.
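
A small sketch of that naming split, measuring the same string three ways. It is only an illustration: it assumes the mbstring and intl extensions may or may not be loaded (hence the function_exists() guards), and the variable names are made up for the example.

```php
<?php
// "e" followed by U+0301 COMBINING ACUTE ACCENT, encoded as UTF-8.
// That is 3 bytes, 2 code points, and 1 user-perceived character.
$s = "e\u{0301}";

// str* functions work on bytes.
$bytes = strlen($s);

// mb_* functions work on code points (requires ext/mbstring).
$codePoints = function_exists('mb_strlen') ? mb_strlen($s, 'UTF-8') : null;

// grapheme_* functions work on grapheme clusters (requires ext/intl).
$graphemes = function_exists('grapheme_strlen') ? grapheme_strlen($s) : null;

printf(
    "bytes=%d code points=%s graphemes=%s\n",
    $bytes,
    $codePoints ?? 'n/a',
    $graphemes ?? 'n/a'
);
```

Three different answers for one string is part of why "UTF-8 safe" needs a precise definition before any str* function changes behaviour.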