Newsgroups: php.internals
From: lists@ageofdream.com (Nick Lockheart)
To: internals@lists.php.net
Subject: [PHP-DEV][Discussion] Should All String Functions Become Multi-Byte Safe?
Date: Sun, 11 Aug 2024 11:50:52 -0400
Message-ID: <8a60a5d76bf3bbdda821160c6141b45914a33b98.camel@ageofdream.com>

HTML 5 was adopted in 2014, over ten years ago. HTML 5 only supports the UTF-8 multi-byte character encoding.

It seems like there are still a lot of string functions that assume a character is a single byte. These may work as expected when dealing with Latin characters, but may fail unexpectedly when a character is encoded as more than one byte.

Are there any use cases for PHP where **single-byte** characters are the norm? It seems that if everything on the Internet is multi-byte encoded now, then all of the PHP string functions should be multi-byte safe.

The WHATWG Encoding Standard:
https://encoding.spec.whatwg.org/

Also, according to Mozilla, "[The meta charset] attribute declares the document's character encoding. If the attribute is present, its value must be an ASCII case-insensitive match for the string "utf-8", because UTF-8 is the only valid encoding for HTML5 documents."
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#charset
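
To make the byte-versus-character mismatch described above concrete, here is a minimal sketch. It assumes PHP 7 or later with the mbstring extension loaded. The sample word and the variable names are only illustrative, and the comparison is limited to a few functions I am sure about.

```php
<?php
// Illustrative only: assumes the mbstring extension is available.
// "\u{00EF}" is the letter i with diaeresis, which takes two bytes in UTF-8.
$word = "na\u{00EF}ve";

// strlen() counts bytes; mb_strlen() counts characters (code points).
$bytes = strlen($word);             // 6
$chars = mb_strlen($word, 'UTF-8'); // 5

// substr() cuts on byte offsets, so it can split a multi-byte sequence.
$byteCut = substr($word, 0, 3);             // "na" plus half of the third character
$charCut = mb_substr($word, 0, 3, 'UTF-8'); // first three characters, intact

// strrev() reverses bytes, which leaves a continuation byte before its lead byte.
$reversed = strrev($word);

// mb_check_encoding() reports whether each result is still valid UTF-8.
$byteCutValid  = mb_check_encoding($byteCut, 'UTF-8');  // false
$charCutValid  = mb_check_encoding($charCut, 'UTF-8');  // true
$reversedValid = mb_check_encoding($reversed, 'UTF-8'); // false

var_dump($bytes, $chars, $byteCutValid, $charCutValid, $reversedValid);
```

With plain ASCII input both families of functions agree, which is why the problem tends to stay hidden until non-ASCII text shows up.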