Newsgroups: php.internals
From: imsop.php@rwec.co.uk ("Rowan Tommins [IMSoP]")
To: internals@lists.php.net
Date: Mon, 12 Aug 2024 07:53:56 +0100
Subject: Re: [PHP-DEV][Discussion] Should All String Functions Become Multi-Byte Safe?
In-Reply-To: <8a60a5d76bf3bbdda821160c6141b45914a33b98.camel@ageofdream.com>
Message-ID: <47D63911-3C48-4514-9296-F1CAAC9597B9@rwec.co.uk>

On 11 August 2024 16:50:52 BST, Nick Lockheart wrote:
> It seems that if everything on the Internet is multi-byte encoded now,
> then all of the PHP string functions should be multi-byte safe.

The phrase "multibyte safe" may have made sense about 30 years ago, when it was thought that a "universal character set" could just be a "wide ASCII", encoding a straightforward list of characters, just more of them.

Modern Unicode is so much more than that, because the world's writing systems don't all work the same way. Should strlen() measure bytes, code points, or graphemes? Should strtoupper() accept a locale, so it can handle cases like Turkish "dotless i" where "I" is not the uppercase of "i"?
And so on, and so on.

I've seen plenty of languages boast that they are "Unicode aware", but few that actually engage with the question of what that means. Often they equate "character" with "code point" and stop there, which leads to results that are just as useless to most of the world as if they'd equated it with "byte".

Regards,
Rowan Tommins [IMSoP]
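
A minimal sketch of the bytes / code points / graphemes question raised above. It is written in Python 3 only because its built-in len() is a convenient example of a language that counts code points; the message itself names no particular language. The helper name count_units and the sample string are hypothetical, and the "approx_graphemes" figure is a rough stand-in that merely skips combining marks, not real grapheme segmentation as defined in Unicode UAX #29.

```python
import unicodedata


def count_units(text: str) -> dict:
    """Return three different 'lengths' for the same string."""
    return {
        # Storage size when encoded as UTF-8.
        "bytes": len(text.encode("utf-8")),
        # What Python's len() reports: Unicode code points.
        "code_points": len(text),
        # Assumption: approximate user-perceived characters by ignoring
        # combining marks. Real grapheme clusters need UAX #29 rules.
        "approx_graphemes": sum(
            1 for ch in text if unicodedata.combining(ch) == 0
        ),
    }


# "café" spelled as "e" followed by U+0301 COMBINING ACUTE ACCENT.
word = "cafe\u0301"
print(count_units(word))

# Case mapping here takes no locale: Turkish would expect U+0130
# (capital I with dot above) as the uppercase of "i".
print("i".upper())
```

For the sample string the three counts come out as 6, 5 and 4 respectively, and the last line prints a plain "I", so none of the "obvious" answers is the one every user would expect.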