Newsgroups: php.internals
Date: Mon, 21 Feb 2022 16:29:53 +0000
From: addw@phcomp.co.uk (Alain D D Williams)
To: PHP Internals
Subject: Re: [PHP-DEV] [RFC] Deprecate and Remove utf8_encode and utf8_decode
Message-ID: <20220221162953.GA26851@phcomp.co.uk>
References: <22242169-a16d-5261-696c-3cf00b00336a@gmail.com> <93e83a99-8f03-b823-1b4b-a10519d41dd7@gmail.com> <64095373-f73b-0231-dbd2-3b3271ab0e96@gmail.com>
Organization: Parliament Hill Computers Ltd

On Mon, Feb 21, 2022 at 03:52:57PM +0000, Craig Francis wrote:

> I would personally encourage everyone to have ext/intl installed and use
> > grapheme_strlen() instead of mb_strlen(), because knowing whether a
> > particular instance of the string "Nguyễn" is written with 6, 7, or 8
> > code points is not nearly as useful as knowing that it looks like 6
> > "characters" to a user either way.

Looking at the description of grapheme_strlen() I note that it can return null.
However it does not say why.

https://www.php.net/manual/en/function.grapheme-strlen.php

Digging in the code I see that it will return null if
intl_convert_utf8_to_utf16() fails.

I think because of one of:

    U_BUFFER_OVERFLOW_ERROR means that *target buffer is not large enough
    U_STRING_NOT_TERMINATED_WARNING usually means that the input string is empty
    u_strFromUTF8() failing.

-- 
Alain Williams
Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256  https://www.phcomp.co.uk/
Parliament Hill Computers Ltd. Registration Information: https://www.phcomp.co.uk/Contact.html
#include
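
For anyone wanting to check the "6, 7, or 8 code points" remark quoted above, here is a minimal sketch. It is not from the thread and it is written in Python rather than PHP only so that it needs nothing beyond the standard library. The helper name approx_graphemes is hypothetical, and counting non-combining code points is only an approximation that happens to work for this string; it is not the full ICU segmentation that grapheme_strlen() relies on.

```python
import unicodedata

# Three spellings of the same visible name (assumed forms for illustration).
precomposed = "Nguy\u1ec5n"        # single code point U+1EC5 for the accented e
partial = "Nguy\u00ea\u0303n"      # e-circumflex (U+00EA) + combining tilde
decomposed = unicodedata.normalize("NFD", precomposed)  # e + circumflex + tilde


def approx_graphemes(s: str) -> int:
    """Count code points that are not combining marks.

    Rough stand-in for a grapheme count: good enough for this example,
    but it ignores many cases (emoji sequences, Hangul jamo, and so on).
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


for label, s in (("precomposed", precomposed),
                 ("partial", partial),
                 ("decomposed", decomposed)):
    print(label, "code points:", len(s), "approx graphemes:", approx_graphemes(s))
```

All three spellings differ in code point count but give the same approximate grapheme count, which is the point the quoted text makes about mb_strlen() versus grapheme_strlen().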