Newsgroups: php.internals
From: youkidearitai@gmail.com (youkidearitai)
To: php internals
Date: Tue, 15 Jul 2025 16:05:30 +0900
Subject: Re: [PHP-DEV][DISCUSSION] Add locale and strength for grapheme functions
In-Reply-To: <3b7b0362-537e-5e2f-4345-63d0b7ef0964@php.net>

On Mon, 14 Jul 2025 at 19:22, Derick Rethans wrote:
>
> On Wed, 9 Jul 2025, youkidearitai wrote:
>
> > Hi, Internals
> >
> > I changed below the RFC.
> > - https://wiki.php.net/rfc/grapheme_add_locale_for_case_insensitive
> > Pull request is below:
> > - https://github.com/php/php-src/pull/18792
> >
> > Change point is below:
> > - Add a strength for grapheme_* functions
> > - Affect to all over the world characters, ex: Ideographic Variation
> > Sequence(IVS)
> > - Use Collator object const values.
>
> These settings are indeed important for these functions, but I can't get
> around the fact that it makes these APIs really cluttered and
> complicated - something that many functions in the grapheme_ / intl
> extension already suffer from.
>
> Is this API really the best way?
>
> > $locale parameter is not change anything. Because I could not find any way.
>
> It seems that I came to a similar conclusion, but locales are much more
> complicated than just languageCode_regionCode (for example, see
> https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt#L25)
>
> You also don't really need a strength argument, as you can 'encode' that
> in the locale name, like: 'nb_NO-u-ks-primary' (I know, it's rather ugly
> and the list of options is vast:
> https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings
>
> cheers,
> Derick

Hi, Derick

Thank you very much for your response.

> Is this API really the best way?

I reconsidered the function signature based on what you said.

> It seems that I came to a similar conclusion, but locales are much more
> complicated than just languageCode_regionCode (for example, see
> https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt#L25)
>
> You also don't really need a strength argument, as you can 'encode' that
> in the locale name, like: 'nb_NO-u-ks-primary' (I know, it's rather ugly
> and the list of options is vast:
> https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings

Indeed, since the strength can be specified in the locale, I think it
is better to specify it there rather than through a separate strength
parameter.
For example, with a locale such as "ja_JP-u-ks-identic", the grapheme_*
functions can detect IVS differences:

```
$ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}", "\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
int(1)
$ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}", "\u{908A}\u{E0101}"));'
int(0)
$ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}"));'
int(0)
$ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
bool(false)
```

Since ideographic characters also carry identity (e.g., in names), we
would like the grapheme_* functions to be able to handle IVS.
However, the API should stay simple, so we will need to compromise
somewhere.

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------