Newsgroups: php.internals
List-Id: internals.lists.php.net
References: <8a60a5d76bf3bbdda821160c6141b45914a33b98.camel@ageofdream.com>
Date: Mon, 12 Aug 2024 02:03:08 +0900
Subject: Re: [PHP-DEV][Discussion] Should All String Functions
 Become Multi-Byte Safe?
To: internals@lists.php.net
From: youkidearitai@gmail.com (youkidearitai)

On Mon, Aug 12, 2024 at 1:42, Anton Smirnov wrote:
>
> Hi Nick,
>
> As a developer who often deals with binary data (like bencode, ipv6
> addresses and my own hacks for multibyte arithmetic) I would prefer that
> functions and syntaxes that allow me to work with bytes keep working
> with bytes, not characters or code points. So the closest solution would
> be separate binary/text strings, but then we have PHP6 all over again.
> Maybe this time it might work in some form, who knows.
>
> On 8/11/24 18:50, Nick Lockheart wrote:
> >
> > HTML 5 was adopted in 2014, over ten years ago. HTML 5 only supports
> > the UTF-8 multi-byte character encoding.
> >
> > It seems like there's still a lot of string functions that assume that
> > a character is a single byte, and these may actually work as expected
> > when dealing with Latin characters, but may fail unexpectedly if a
> > sequence is more than one byte.
> >
> > Are there any use cases for PHP where **single-byte** characters are
> > the norm?
> >
> > It seems that if everything on the Internet is multi-byte encoded now,
> > then all of the PHP string functions should be multi-byte safe.
> >
> >
> > The WHATWG Encoding Standard:
> >
> > https://encoding.spec.whatwg.org/
> >
> > Also, according to Mozilla, "[The meta charset] attribute declares the
> > document's character encoding. If the attribute is present, its value
> > must be an ASCII case-insensitive match for the string "utf-8", because
> > UTF-8 is the only valid encoding for HTML5 documents."
> >
> > https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#charset

Hi Nick,

I'm not sure what "multibyte safe" means here.
PHP's string type is normally binary, i.e. a plain sequence of bytes.
https://www.php.net/manual/en/language.types.string.php

If you want to work with multibyte characters, you can use the mbstring functions.
(Does "multibyte safe" refer to the mbstring functions?)
https://www.php.net/manual/en/book.mbstring.php

I don't think there is one consistent solution, because multibyte
characters need a lot of case-by-case thought.

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
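
A minimal sketch of the byte/character distinction discussed in this message, assuming the mbstring extension is loaded and the file is saved as UTF-8. The sample string, the IPv6 value, and all variable names are illustrative and do not come from the thread.

```php
<?php
// Sample text (an assumption for illustration): 3 characters, 9 bytes in UTF-8.
$text = "日本語";

// Core string functions operate on bytes.
$byteLength = strlen($text);        // 9
$firstByte  = substr($text, 0, 1);  // "\xE6", an incomplete UTF-8 sequence

// mbstring functions operate on characters of the given encoding.
$charLength = mb_strlen($text, 'UTF-8');        // 3
$firstChar  = mb_substr($text, 0, 1, 'UTF-8');  // "日"

// Byte semantics are what binary data needs, e.g. a packed IPv6 address
// (2001:db8::1 built by hand as eight big-endian 16-bit groups).
$packed       = pack('n*', 0x2001, 0x0db8, 0, 0, 0, 0, 0, 1);
$packedLength = strlen($packed);    // 16

printf(
    "bytes=%d chars=%d firstByte=%s firstChar=%s packed=%d\n",
    $byteLength,
    $charLength,
    bin2hex($firstByte),
    $firstChar,
    $packedLength
);
```

Here `strlen()` and `substr()` give the right answer for the packed address and the wrong one for the text, while the `mb_*` calls do the reverse, which is one way to read why no single behaviour fits both use cases.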