Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:122562 X-Original-To: internals@lists.php.net Delivered-To: internals@lists.php.net Received: from php-smtp4.php.net (php-smtp4.php.net [45.112.84.5]) by qa.php.net (Postfix) with ESMTPS id 832291AD8F6 for ; Tue, 5 Mar 2024 08:25:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=php.net; s=mail; t=1709627137; bh=QEBXzKt3fC1jF9yrdBCn46Ek4w18tOnDVsckFHoezKM=; h=References:In-Reply-To:From:Date:Subject:To:From; b=i5W9TBWSp7phNIro8YY+uNvWFzR0UvWwyMFs2zG9lwrZE2lwtEKIRnO/OYoxGVAko uX+2Y4pUGl6e1ndCJU2yhAxadCD1vSpXGOEuhZSM59liY8oQcB3oovvYvFEjbxuyB6 epc5QnTNSZVL7XoxeV0upu1DBic4UZzPeAlEyDqSx4gQ1XxS7De4oz1rOGdcviVZqS T/9LYLsdfWaPRuEBJlbduMj9gsqDm3+5p8vTZKoYGgGxvqHPgzTCRPHaHkVlq8WzGK urHXBoRGMvK7jIIPO5sU6yC640uRmUy0AWRHjrnO86v9CdIP7clji4fcltI+WDxfEc lADeJXU+3EsRQ== Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 8FA12180627 for ; Tue, 5 Mar 2024 08:25:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=0.6 required=5.0 tests=BAYES_50,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,DMARC_PASS,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=4.0.0 X-Spam-Virus: No X-Envelope-From: Received: from mail-wr1-f45.google.com (mail-wr1-f45.google.com [209.85.221.45]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Tue, 5 Mar 2024 08:25:36 +0000 (UTC) Received: by mail-wr1-f45.google.com with SMTP id ffacd0b85a97d-33da51fd636so2756414f8f.3 for ; Tue, 05 Mar 2024 00:25:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1709627122; x=1710231922; 
darn=lists.php.net; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=psy1c1F3jKumaulWOgCx1ngCvfev+qwF5ajhADfoSEQ=; b=UwyVYyvVh2A4LhO0WY1iaaRHy0cxS5P1zGLxtusbBmMomBly9UdiBYkmeHcpMyZ2mC cCwsUMcjcUWcnf1TnorLJhS4qJANuo3QHsTxoujca7viiwtDiZmZqt3Vmh9cF+YlSGa1 tQGB+mp1xXqK8XryB+7qKzhBn8yBr0zfKtdUcN8mUBaI+B2K5wKF7TTkKni78SSgjf9g V/sp4sszdwe9rDej1Q832p0qXP0Bwp2yo469RnthWO2KkIaB9x5PX0ddX7oerJ7IJjSv DQmdFfjW9GHbFAVG2K7oDyLkY8yQxDPagcv+e2Q78ThgNNFJztaKump+9k9jJHHwnnJk /nKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709627122; x=1710231922; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=psy1c1F3jKumaulWOgCx1ngCvfev+qwF5ajhADfoSEQ=; b=P2CnIoxwVwdD7arh3ntGi5QZ4Gq8CyHO3mLMB8cYyeq62Hf4ChJOmJbm/DHmeAKX9m GjWEhs1VGgEwMFY1S/HF2pGdAy83Kvk1URz4HYkwLzWNNKjWSJLq/TUc1ot3YZqDT4aF WTrG36U12lCpRVE4W9rN2xsdybeRdGlpsv0mUZyl2gMyV6r0J5DnciwSqiEuz1NQgEBb b1wuriT6KxXJRFEYtn7XVwjribu4irYhhgHW/UJkYJh4OxoRtZOA8Q9lVVCABJZoc2Xv hfHTEd39T/v4yjqjMLm8hZpCcxMB4ZiDzhHt5sajHKP0JLCkSBXdpUiWcU5a6r1aKjv4 LInw== X-Gm-Message-State: AOJu0YyZUuW0qznfkcKpiBVQbSONXJkGb/C6XbPg9b1x+4mGPHr62YwI rg2HW+IxT4Sa0IpBqZSOWxROFXR0IZCcgzcQUVZ0IMPz/kjyV5zCC9cWYJoJppUZhnJB1b8U+Ex GNpnh0ajs8yy4+5zj2TPxRkeAtArJZ2U8sMr8 X-Google-Smtp-Source: AGHT+IGwyG60CPUS2ELaHaRYxj2dhfy6Pm97MPRlcCRDgOSwZHDm5sVaOliGb4OrSjKsYMWu8nnE5YEr7AyY21+3mVk= X-Received: by 2002:a5d:55c4:0:b0:33e:17fe:8c23 with SMTP id i4-20020a5d55c4000000b0033e17fe8c23mr7603097wrw.22.1709627122441; Tue, 05 Mar 2024 00:25:22 -0800 (PST) Precedence: bulk list-help: list-post: List-Id: internals.lists.php.net MIME-Version: 1.0 References: In-Reply-To: Date: Tue, 5 Mar 2024 17:25:11 +0900 Message-ID: Subject: Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function To: internals@lists.php.net Content-Type: 
text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable From: youkidearitai@gmail.com (youkidearitai) > > Hi, Niels > > Thank you for your comment. > Indeed, returns false is make sense. > > Therefore, I changed to returns false when invalid UTF-8 strings. > > Regards > Yuya > > -- > --------------------------- > Yuya Hamada (tekimen) > - https://tekitoh-memdhoi.info > - https://github.com/youkidearitai > ----------------------------- Sorry, again. I checked behavior of mb_str_split function. So Illegal byte sequences are returned as is. ``` sapi/cli/php -r 'var_dump(mb_str_split("=E3=81=82\xc2\xf4\x80=E3=81=82"));' array(4) { [0]=3D> string(3) "=E3=81=82" [1]=3D> string(2) "=EF=BF=BD=EF=BF=BD" [2]=3D> string(1) "=EF=BF=BD" [3]=3D> string(3) "=E3=81=82" } ``` And, I reading ICU document about utext_openUTF8 (below is link): https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/utext_8h.html#a130e= 7cba201c4b38799b432eb269f6d5 > Any invalid UTF-8 in the input will be handled in this way: a sequence of= bytes that has the form of a truncated, but otherwise valid, UTF-8 sequenc= e will be replaced by a single unicode replacement character, \uFFFD. Any o= ther illegal bytes will each be replaced by a \uFFFD. Therefore, I think encoding check is not need. Returns only arrays together with mb_str_split. Regards Yuya --=20 --------------------------- Yuya Hamada (tekimen) - https://tekitoh-memdhoi.info - https://github.com/youkidearitai -----------------------------
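
P.S. To make the replacement rule quoted from the ICU documentation concrete, here is a minimal sketch. It is written in Python rather than PHP or C, and it does not call ICU: it only assumes that Python's built-in UTF-8 decoder with errors="replace" substitutes U+FFFD for ill-formed input in a comparable way for this particular byte string. The variable names are hypothetical and chosen for illustration.

```python
# Illustration only: this uses Python's own UTF-8 decoder,
# not ICU's utext_openUTF8 and not PHP's mb_str_split.

# Same bytes as in the mb_str_split test: "あ", then \xc2 \xf4 \x80, then "あ".
raw = "あ".encode("utf-8") + b"\xc2\xf4\x80" + "あ".encode("utf-8")

# errors="replace" substitutes U+FFFD for each ill-formed sequence
# instead of raising UnicodeDecodeError.
decoded = raw.decode("utf-8", errors="replace")

# Print one code point per line so the replacement characters are visible.
for ch in decoded:
    print(f"U+{ord(ch):04X}")
```

With CPython 3 this prints four code points: U+3042, U+FFFD, U+FFFD, U+3042. The lone lead byte \xc2 becomes one U+FFFD, and the truncated sequence \xf4\x80 becomes another. Whether ICU groups these bytes the same way should be confirmed against ICU itself. Note that mb_str_split in the output above groups the raw bytes differently (two bytes, then one byte).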