Newsgroups: php.internals
Date: Wed, 6 Mar 2024 09:37:18 +0900
Subject: Re:
 [PHP-DEV] [Discussion] grapheme cluster for str_split function
To: internals@lists.php.net
From: youkidearitai@gmail.com (youkidearitai)

On Wed, 6 Mar 2024 at 9:22, youkidearitai wrote:
>
> Hi, Larry
> Hi, Niels
>
> On Wed, 6 Mar 2024 at 6:47, Niels Dossche wrote:
> >
> > Hi Larry
> > Hi Yuya
> >
> > So first of all, I meant the error handling in cases like these: https://github.com/php/php-src/pull/13580/files#diff-b8fe038d9d7539694593978bea5605f38dde4bcb6a016865130590e45e23202eR852-R860
> > The implementation still returns NULL here, so the signature is still incorrect. Either it should return false to match the other functions, or throw something and not return a value.
> >
> > On 05/03/2024 18:40, Larry Garfield wrote:
> > > On Tue, Mar 5, 2024, at 7:25 AM, youkidearitai wrote:
> > >> On Tue, 5 Mar 2024 at 5:52, Niels Dossche wrote:
> > >>>
> > >>> Hi Yuya
> > >>>
> > >>> This sounds useful.
> > >>>
> > >>> I do have a question about the function signature:
> > >>> function grapheme_str_split(string $string, int $length = 1): array {}
> > >>>
> > >>> This always returns an array.
> > >>> However, looking at your PR it seems you return NULL on failure, but the return type in the signature isn't nullable.
> > >>> Also, from a quick look, it seems other functions return false instead of null on failure. So perhaps the return type should be array|false.
> > >>>
> > >>> What do you think? :)
> > >>>
> > >>> Kind regards
> > >>> Niels
> > >>>
> > >>> On 03/03/2024 00:21, youkidearitai wrote:
> > >>>> Hi, Internals
> > >>>>
> > >>>> I noticed PHP does not have a grapheme-cluster-aware version of the str_split function.
> > >>>> Until now, you had to use \X with the PCRE functions.
> > >>>>
> > >>>> Therefore, I tried creating a `grapheme_str_split` function.
> > >>>> https://github.com/php/php-src/pull/13580
> > >>>> Using ICU, it can split a string into an array per emoji, including variation selectors.
> > >>>>
> > >>>> If it's fine, I'll create an RFC.
> > >>>>
> > >>>> Regards
> > >>>> Yuya
> > >>>>
> > >>
> > >> Hi, Niels
> > >>
> > >> Thank you for your comment.
> > >> Indeed, returning false makes sense.
> > >>
> > >> Therefore, I changed it to return false for invalid UTF-8 strings.
> > >>
> > >> Regards
> > >> Yuya
> > >
> > > Many legacy functions return false on error, but that is widely regarded as bad design. Please do not continue bad design.
> >
> > I agree that returning false on error isn't ideal for exceptional cases; that's what exceptions are for.
> > Looking at the other grapheme functions makes me wonder, though, how consistent this would be, especially w.r.t. intl_get_error_*() and intl_error_name().
> >
> > >
> > > Right now, the best "standard" error handling mechanism available is exceptions. false (or null) can very easily lead to incorrectly using that value as though it were valid when it's not, which will sometimes cause a fatal error and sometimes cause a security leak.
> > >
> > > If the input value cannot be logically processed, that's an exception. (Or Error, perhaps.)
> > >
> > > --Larry Garfield
> >
> > Kind regards
> > Niels
>
> Thank you so much for the advice.
> Indeed, the current grapheme* functions seem inconsistent.
>
> So one option would be to throw an exception where it currently returns null.
> Shall we do that just for the grapheme_str_split function?
>
> Regards
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------

Ah, if we throw an exception on intl_error*, is a separate RFC required?
Otherwise, we could make grapheme_str_split's signature the following (including null):

```
function grapheme_str_split(string $string, int $length = 1): array|null {}
```

For now, I have changed the signature to `array|null`.

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
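For reference, here is a rough userland sketch of the workaround mentioned at the start of the thread. It uses PCRE's \X and follows the exception-based error handling Larry suggests. The function name `split_graphemes()` and its `ValueError` messages are hypothetical. It uses PCRE rather than ICU, and it is not the implementation in PR #13580.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland sketch (not the code from PR #13580):
 * split a UTF-8 string into chunks of $length extended grapheme clusters
 * using PCRE's \X. It throws instead of returning false/null on bad input.
 */
function split_graphemes(string $string, int $length = 1): array
{
    if ($length < 1) {
        throw new ValueError('split_graphemes(): Argument #2 ($length) must be greater than 0');
    }

    // With the /u modifier, preg_match_all() returns false on malformed UTF-8
    // (preg_last_error() then reports PREG_BAD_UTF8_ERROR).
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        throw new ValueError('split_graphemes(): Argument #1 ($string) must be valid UTF-8');
    }

    // $matches[0] holds one entry per grapheme cluster; group them into
    // chunks of $length and join each chunk back into a single string.
    return array_map(
        fn(array $chunk): string => implode('', $chunk),
        array_chunk($matches[0], $length)
    );
}
```

With this contract a caller can use the result directly. With the `array|null` signature above, the same caller would instead need an explicit `=== null` check before using the value. That is the kind of easily forgotten check Larry warns about.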