Newsgroups: php.internals
Message-ID: <1af7cc52-e571-4975-9361-9cf2247e0cc2@gmail.com>
Date: Wed, 6 Mar 2024 09:26:51 +0100
Subject: Re: [PHP-DEV] [Discussion] grapheme cluster for str_split function
To: internals@lists.php.net
References: <51122d7e-f218-4243-bbb8-dd59de2165b4@app.fastmail.com> <5817daaf-813c-430e-a12f-32908d1f7ec7@gmail.com>
From: dossche.niels@gmail.com (Niels Dossche)

On 06/03/2024 01:37, youkidearitai wrote:
> On Wed, Mar 6, 2024 at 9:22, youkidearitai wrote:
>>
>> Hi, Larry
>> Hi, Niels
>>
>> On Wed, Mar 6, 2024 at 6:47, Niels Dossche wrote:
>>>
>>> Hi Larry
>>> Hi Yuya
>>>
>>> So first of all, I meant the error handling in cases like these: https://github.com/php/php-src/pull/13580/files#diff-b8fe038d9d7539694593978bea5605f38dde4bcb6a016865130590e45e23202eR852-R860
>>> The implementation still returns NULL here, so the signature is still incorrect. Either it should return false to match the other functions, or throw something and not return a value.
>>>
>>> On 05/03/2024 18:40, Larry Garfield wrote:
>>>> On Tue, Mar 5, 2024, at 7:25 AM, youkidearitai wrote:
>>>>> On Tue, Mar 5, 2024 at 5:52, Niels Dossche wrote:
>>>>>>
>>>>>> Hi Yuya
>>>>>>
>>>>>> This sounds useful.
>>>>>>
>>>>>> I do have a question about the function signature:
>>>>>> function grapheme_str_split(string $string, int $length = 1): array {}
>>>>>>
>>>>>> This always returns an array.
>>>>>> However, looking at your PR it seems you return NULL on failure, but the return type in the signature isn't nullable.
>>>>>> Also, from a quick look, it seems other functions return false instead of null on failure. So perhaps the return type should be array|false.
>>>>>>
>>>>>> What do you think? :)
>>>>>>
>>>>>> Kind regards
>>>>>> Niels
>>>>>>
>>>>>> On 03/03/2024 00:21, youkidearitai wrote:
>>>>>>> Hi, Internals
>>>>>>>
>>>>>>> I noticed PHP does not have a grapheme cluster version of the str_split function.
>>>>>>> Until now, you had to use \X with the PCRE functions.
>>>>>>>
>>>>>>> Therefore, I tried to create a `grapheme_str_split` function.
>>>>>>> https://github.com/php/php-src/pull/13580
>>>>>>> Using ICU, it can split a string into an array while keeping emoji and variation selectors together.
>>>>>>>
>>>>>>> If it's fine, I'll create an RFC.
>>>>>>>
>>>>>>> Regards
>>>>>>> Yuya
>>>>>>>
>>>>>
>>>>> Hi, Niels
>>>>>
>>>>> Thank you for your comment.
>>>>> Indeed, returning false makes sense.
>>>>>
>>>>> Therefore, I changed it to return false for invalid UTF-8 strings.
>>>>>
>>>>> Regards
>>>>> Yuya
>>>>
>>>> Many legacy functions return false on error, but that is widely regarded as bad design. Please do not continue bad design.
>>>
>>> I agree that returning false on error isn't ideal for exceptional cases; that's what exceptions are for.
>>> Looking at the other grapheme functions makes me wonder though how consistent this would be, especially w.r.t. intl_get_error_*() and intl_error_name().
>>>
>>>>
>>>> Right now, the best "standard" error handling mechanism available is exceptions. false (or null) can very easily lead to incorrectly using that value as though it were valid, when it's not, which will sometimes cause a fatal error and sometimes cause a security leak.
>>>>
>>>> If the input value cannot be logically processed, that's an exception. (Or Error, perhaps.)
>>>>
>>>> --Larry Garfield
>>>
>>> Kind regards
>>> Niels
>>
>> Thank you so much for the advice.
>> Indeed, the current grapheme* functions seem inconsistent.
>>
>> Therefore, one option is to throw an exception wherever it currently returns null.
>> Shall we do so just for the grapheme_str_split function?
>>
>> Regards
>> Yuya
>>
>> --
>> ---------------------------
>> Yuya Hamada (tekimen)
>> - https://tekitoh-memdhoi.info
>> - https://github.com/youkidearitai
>> -----------------------------
>
> Ah, if we throw an exception on intl_error*, does that require another RFC?
> If not, we could make grapheme_str_split's signature as below (including null):

Hi Yuya

If you want to change other grapheme functions with respect to error handling, then it requires a separate RFC.
I think consistency between the functions is important.

>
> ```
> function grapheme_str_split(string $string, int $length = 1): array|null {}
> ```
>
> For now, I have changed the signature to `array|null`.

Most other functions return false on error, so I think it should be array|false, and the implementation should use RETURN_FALSE instead of RETURN_NULL.

>
> Regards
> Yuya
>
>

Kind regards
Niels