List-Id: internals.lists.php.net
References: <3b7b0362-537e-5e2f-4345-63d0b7ef0964@php.net>
Date: Tue, 22 Jul 2025 17:07:56 +0900
Subject: Re: [PHP-DEV][DISCUSSION] Add locale and strength for grapheme functions
To: php internals
From: youkidearitai@gmail.com (youkidearitai)

On Tue, 15 Jul 2025 at 16:05, youkidearitai wrote:
>
> On Mon, 14 Jul 2025 at 19:22, Derick Rethans wrote:
> >
> > On Wed, 9 Jul 2025, youkidearitai wrote:
> >
> > > Hi, Internals
> > >
> > > I changed below the RFC.
> > > - https://wiki.php.net/rfc/grapheme_add_locale_for_case_insensitive
> > > Pull request is below:
> > > - https://github.com/php/php-src/pull/18792
> > >
> > > Change point is below:
> > > - Add a strength for grapheme_* functions
> > > - Affect to all over the world characters, ex: Ideographic Variation
> > > Sequence(IVS)
> > > - Use Collator object const values.
> >
> > These settings are indeed important for these functions, but I can't get
> > around the fact that it makes these APIs really cluttered and
> > complicated - something that many functions in the grapheme_ / intl
> > extension already suffer from.
> >
> > Is this API really the best way?
> >
> > > $locale parameter is not change anything. Because I could not find any way.
> >
> > It seems that I came to a similar conclusion, but locales are much more
> > complicated than just languageCode_regionCode (for example, see
> > https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt#L25)
> >
> > You also don't really need a strength argument, as you can 'encode' that
> > in the locale name, like: 'nb_NO-u-ks-primary' (I know, it's rather ugly
> > and the list of options is vast:
> > https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings
> >
> > cheers,
> > Derick
>
> Hi, Derick
>
> Thank you very much for response.
>
> > Is this API really the best way?
>
> I reconsidered the function signature based on what you said.
>
> > It seems that I came to a similar conclusion, but locales are much more
> > complicated than just languageCode_regionCode (for example, see
> > https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt#L25)
> >
> > You also don't really need a strength argument, as you can 'encode' that
> > in the locale name, like: 'nb_NO-u-ks-primary' (I know, it's rather ugly
> > and the list of options is vast:
> > https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings
>
> Indeed, since strength can be specified in the locale,
> I thought it would be better to specify it in the locale rather than
> as a parameter for strength.
>
> For example, The grapheme_* functions can detect difference for IVS.
> ```
> $ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}", "\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
> int(1)
> $ sapi/cli/php -r 'var_dump(grapheme_levenshtein("\u{908A}", "\u{908A}\u{E0101}"));'
> int(0)
> $ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}"));'
> int(0)
> $ sapi/cli/php -r 'var_dump(grapheme_strpos("\u{908A}", "\u{908A}\u{E0101}", locale: "ja_JP-u-ks-identic"));'
> bool(false)
> ```
>
> Since ideographic characters also have identities (e.g., names), we
> would like to make IVS compatible with them.
> However, it should be simple, so we should compromise somewhere.
>
> Regards
> Yuya
>
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------

Hi, Internals

I have revised this RFC.
https://wiki.php.net/rfc/grapheme_add_locale_for_case_insensitive

I believe I have done my best to address the complexity of Unicode.
I would like to move the RFC to the "Voting" phase.
If there are no objections, I would like to start voting this week.

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------