Newsgroups: php.internals
Date: Wed, 6 Mar 2024 19:07:09 +0900
Subject: Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function
To: internals@lists.php.net
From: youkidearitai@gmail.com (youkidearitai)

On Wed, 6 Mar 2024 at 18:42, Niels Dossche wrote:
>
> On 06/03/2024 01:37, youkidearitai wrote:
> > On Wed, 6 Mar 2024 at 9:22, youkidearitai wrote:
> >>
> >> Hi, Larry
> >> Hi, Niels
> >>
> >> On Wed, 6 Mar 2024 at 6:47, Niels Dossche wrote:
> >>>
> >>> Hi Larry
> >>> Hi Yuya
> >>>
> >>> So first of all, I meant the error handling in cases like these: https://github.com/php/php-src/pull/13580/files#diff-b8fe038d9d7539694593978bea5605f38dde4bcb6a016865130590e45e23202eR852-R860
> >>> The implementation still returns NULL here, so the signature is still incorrect. Either it should return false to match the other functions, or throw something and not return a value.
> >>>
> >>> On 05/03/2024 18:40, Larry Garfield wrote:
> >>>> On Tue, Mar 5, 2024, at 7:25 AM, youkidearitai wrote:
> >>>>> On Tue, 5 Mar 2024 at 5:52, Niels Dossche wrote:
> >>>>>>
> >>>>>> Hi Yuya
> >>>>>>
> >>>>>> This sounds useful.
> >>>>>>
> >>>>>> I do have a question about the function signature:
> >>>>>> function grapheme_str_split(string $string, int $length = 1): array {}
> >>>>>>
> >>>>>> This always returns an array.
> >>>>>> However, looking at your PR it seems you return NULL on failure, but the return type in the signature isn't nullable.
> >>>>>> Also, from a quick look, it seems other functions return false instead of null on failure. So perhaps the return type should be array|false.
> >>>>>>
> >>>>>> What do you think? :)
> >>>>>>
> >>>>>> Kind regards
> >>>>>> Niels
> >>>>>>
> >>>>>> On 03/03/2024 00:21, youkidearitai wrote:
> >>>>>>> Hi, Internals
> >>>>>>>
> >>>>>>> I noticed PHP does not have a grapheme-cluster version of the str_split function.
> >>>>>>> Until now, you had to use \X with the PCRE functions.
> >>>>>>>
> >>>>>>> Therefore, I am trying to create a `grapheme_str_split` function.
> >>>>>>> https://github.com/php/php-src/pull/13580
> >>>>>>> Using ICU, it can split a string into an array of grapheme clusters, including emoji and variation selectors.
> >>>>>>>
> >>>>>>> If it's fine, I'll create an RFC.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>> Yuya
> >>>>>>>
> >>>>>
> >>>>> Hi, Niels
> >>>>>
> >>>>> Thank you for your comment.
> >>>>> Indeed, returning false makes sense.
> >>>>>
> >>>>> Therefore, I changed it to return false for invalid UTF-8 strings.
> >>>>>
> >>>>> Regards
> >>>>> Yuya
> >>>>
> >>>> Many legacy functions return false on error, but that is widely regarded as bad design. Please do not continue bad design.
> >>>
> >>> I agree that returning false on error isn't ideal for exceptional cases, that's what exceptions are for.
> >>> Looking at the other grapheme functions makes me wonder though how consistent this would be, especially w.r.t. intl_get_error_*() and intl_error_name().
> >>>
> >>>>
> >>>> Right now, the best "standard" error handling mechanism available is exceptions. false (or null) can very easily lead to incorrectly using that value as though it were valid, when it's not, which will sometimes cause a fatal error and sometimes cause a security leak.
> >>>>
> >>>> If the input value cannot be logically processed, that's an exception. (Or Error, perhaps.)
> >>>>
> >>>> --Larry Garfield
> >>>
> >>> Kind regards
> >>> Niels
> >>
> >> Thank you so much for the advice.
> >> Indeed, the current grapheme* functions seem inconsistent.
> >>
> >> So one option is to throw an exception wherever it currently returns null.
> >> Shall we do so just for the grapheme_str_split function?
> >>
> >> Regards
> >> Yuya
> >>
> >> --
> >> ---------------------------
> >> Yuya Hamada (tekimen)
> >> - https://tekitoh-memdhoi.info
> >> - https://github.com/youkidearitai
> >> -----------------------------
> >
> > Ah, if we throw an exception on intl_error*, is a separate RFC required?
> > What if we make grapheme_str_split's signature the following (including null):
>
> Hi Yuya
>
> If you want to change other grapheme functions with respect to error handling, then it requires a separate RFC.
> I think consistency between the functions is important.
>
> >
> > ```
> > function grapheme_str_split(string $string, int $length = 1): array|null {}
> > ```
> >
> > For now, I have changed the signature to `array|null`.
>
> Most other functions return false on error, so I think it should be array|false, and the implementation should use RETURN_FALSE instead of RETURN_NULL.
>
> >
> > Regards
> > Yuya
> >
>
>
> Kind regards
> Niels

Hi, Niels

Thank you very much. I got it.
You're right; I had misunderstood.
I now return false when something goes wrong.

I have updated the pull request. Please take a look.

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------