From: youkidearitai@gmail.com (youkidearitai)
Date: Mon, 12 Aug 2024 19:33:47 +0900
Subject: Re: [PHP-DEV][Discussion] Should All String Functions Become Multi-Byte Safe?
To: internals@lists.php.net

On Mon, Aug 12, 2024 at 18:54, Daniel Haber wrote:
>
> On 8/12/2024 9:53 AM, Rowan Tommins [IMSoP] wrote:
> >
> >
> > On 11 August 2024 16:50:52 BST, Nick Lockheart wrote:
> >> It seems that if everything on the Internet is multi-byte encoded now,
> >> then all of the PHP string functions should be multi-byte safe.
> >
> > The phrase "multibyte safe" may have made sense about 30 years ago, when it was thought that a "universal character set" could just be a "wide ASCII", encoding a straightforward list of characters, just more of them.
> >
> > Modern Unicode is so much more than that, because the world's writing systems don't all work the same way. Should strlen() measure bytes, code points, or graphemes? Should strtoupper() accept a locale, so it can handle cases like Turkish "dotless i" where "I" is not the uppercase of "i"? And so on, and so on.
> >
> > I've seen plenty of languages boast that they are "Unicode aware" but few actually engaging with the question of what that actually means. Often they equate "character" with "code point" and stop there, which leads to results that are just as useless to most of the world as if they'd equated it with "byte".
> >
> > Regards,
> > Rowan Tommins
> > [IMSoP]
>
> Feels appropriate to link to this:
> "The Absolute Minimum Every Software Developer Must Know About Unicode
> in 2023 (Still No Excuses!)"
> https://tonsky.me/blog/unicode/

Hi there,

> Feels appropriate to link to this:
> "The Absolute Minimum Every Software Developer Must Know About Unicode
> in 2023 (Still No Excuses!)"
> https://tonsky.me/blog/unicode/

I think the linked article makes the same point.

However, in programming there are times when you want to operate on
bytes, times when you want code points, and times when you want
grapheme clusters. UTF-8 can't solve everything; what matters is which
unit the programmer intends to work with (byte-oriented programming,
character-oriented programming, etc.).

Other character encodings also remain important, mainly for CJK.
Character sets involve many things to consider.

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
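
To illustrate the distinction between bytes, code points, and grapheme clusters raised in this thread, here is a minimal sketch in Python. The function names are hypothetical and chosen only for this example. The grapheme count is a rough approximation that skips combining marks; it is not the full Unicode segmentation algorithm (UAX #29) and will miscount things like ZWJ emoji sequences. The second argument of byte_length shows how the byte count also depends on the encoding, which matters for CJK text.

```python
import unicodedata


def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def code_point_length(text: str) -> int:
    """Number of Unicode code points (what Python's len() counts)."""
    return len(text)


def approx_grapheme_length(text: str) -> int:
    """Rough grapheme count: code points that are not combining marks.

    Assumption for illustration only: a real implementation would
    follow UAX #29 (for example via an ICU-based library).
    """
    return sum(
        1 for ch in text if not unicodedata.category(ch).startswith("M")
    )


if __name__ == "__main__":
    samples = {
        "decomposed e-acute": "e\u0301",  # 'e' + COMBINING ACUTE ACCENT
        "Japanese": "\u65e5\u672c\u8a9e",  # three kanji
    }
    for label, text in samples.items():
        print(
            label,
            "| utf-8 bytes:", byte_length(text),
            "| code points:", code_point_length(text),
            "| approx. graphemes:", approx_grapheme_length(text),
        )
    # The same Japanese text takes fewer bytes in a legacy CJK encoding.
    print("shift_jis bytes:", byte_length(samples["Japanese"], "shift_jis"))
```

For the decomposed "é" the three functions give three different answers, which is why a single "multi-byte safe" length function cannot serve every use case.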