Newsgroups: php.internals
Xref: news.php.net php.internals:130238
Date: Tue, 3 Mar 2026 22:58:31 +0900
Subject: Re: [PHP-DEV][DISCUSSION] Limit of code point for grapheme cluster in programming languages
To: php internals
From: youkidearitai@gmail.com (youkidearitai)

On Sat, Feb 28, 2026 at 0:59, youkidearitai wrote:
>
> On Tue, Feb 24, 2026 at 16:21, youkidearitai wrote:
> >
> > On Tue, Feb 24, 2026 at 11:38, Kentaro Takeda wrote:
> > >
> > > Hi Yuya,
> > >
> > > I think this is a good idea.
> > > While spec compliance is generally desirable, DoS via unbounded grapheme clusters is a real threat, and it's reasonable for a language-level implementation to impose practical limits that the Unicode spec itself doesn't define. This kind of gap between a general-purpose spec and a concrete implementation is not unusual.
> > >
> > > The default of 32 code points sounds sensible given that natural language grapheme clusters top out well below that.
> > >
> > > One minor note: it might help to clarify the intended behavior of `grapheme_limit_codepoints` a bit more, for instance whether it is meant as a validation check (returning false when a cluster exceeds the limit) or something else.
> > >
> > > Regards,
> > > Kentaro Takeda
> > >
> > > On Mon, Feb 23, 2026 at 20:28, youkidearitai wrote:
> > >>
> > >> Hi, Internals
> > >>
> > >> I noticed that UAX#29 does not limit the number of code points in a grapheme cluster.
> > >> https://www.unicode.org/reports/tr29/
> > >>
> > >> An ICU issue confirmed that Unicode defines no such limit.
> > >> https://unicode-org.atlassian.net/browse/ICU-23302
> > >>
> > >> This means a single grapheme cluster can contain arbitrarily many code points,
> > >> which can crash a program because computing resources are limited.
> > >>
> > >> For example, this command writes about 200MB to emoji_bomb.txt, yet the whole file is a single grapheme cluster:
> > >> ```
> > >> php -r 'echo(mb_trim(str_repeat("\u{200d}\u{1f468}\u{200d}\u{1f466}\u{200d}\u{1f466}", 10000000), "\u{200d}"));' -d memory_limit=600M > emoji_bomb.txt
> > >> ```
> > >> (PLEASE BE CAREFUL WHEN OPENING emoji_bomb.txt BECAUSE IT MAY CRASH THE PROGRAM YOU OPEN IT WITH)
> > >>
> > >> So I think we (php-src, at the programming-language level) need a new,
> > >> custom limit function.
> > >> My idea is below:
> > >>
> > >> ```
> > >> grapheme_limit_codepoints(string $str, int $max_codepoints = 32): bool
> > >> ```
> > >>
> > >> I don't have a strong opinion on whether $max_codepoints should default to 32.
> > >> However, 32 code points should be enough for a grapheme cluster, because
> > >> the longest one in a human language is probably Hakṣhmalawarayaṁ (ཧ), at
> > >> 9 code points.
> > >>
> > >> If more code points per grapheme cluster are needed,
> > >> userland can increase $max_codepoints.
> > >>
> > >> Please also see my Speaker Deck slides.
> > >> https://speakerdeck.com/youkidearitai/limit-of-code-point-for-grapheme-cluster
> > >>
> > >> What do you think about this idea?
> > >>
> > >> Regards
> > >> Yuya
> > >>
> > >> --
> > >> ---------------------------
> > >> Yuya Hamada (tekimen)
> > >> - https://tekitoh-memdhoi.info
> > >> - https://github.com/youkidearitai
> > >> -----------------------------
> >
> > Hi, Kentaro
> >
> > Thank you very much for your feedback.
> >
> > > One minor note: it might help to clarify the intended behavior of `grapheme_limit_codepoints` a bit more, for instance whether it is meant as a validation check (returning false when a cluster exceeds the limit) or something else.
> >
> > Okay, here is an example.
> >
> > ```
> > // $_POST['text'] holds some user-supplied string.
> > // Reject input in which any grapheme cluster has too many code points.
> > if (grapheme_limit_codepoints($_POST['text'], 32) !== true) {
> >     throw new InvalidArgumentException("Found invalid / too many code points in a grapheme cluster");
> > }
> >
> > // Validate the length in grapheme clusters.
> > if (grapheme_strlen($_POST['text']) > 100) {
> >     throw new InvalidArgumentException("Invalid: greater than 100 graphemes");
> > }
> >
> > // do anything...
> > ```
> > The intention is to count graphemes correctly while avoiding DoS.
> > I also want to overcome the problem described in
> > https://github.com/symfony/symfony/pull/13527 for the grapheme_strlen
> > function.
> >
> > Feel free to comment further.
> > Regards
> > Yuya.
>
> Hi, Internals
>
> I created a PoC and an RFC.
> https://github.com/php/php-src/pull/21311
> https://wiki.php.net/rfc/grapheme_limit_codepoints
>
> I have asked Unicode to add a limit on the number of code points per
> grapheme cluster to UAX#29.
> Unicode may adopt my suggestion if it makes sense to them. However, I
> don't know what will happen.
>
> In any case, I think it makes sense for PHP to limit the code points in a grapheme cluster on its side.
>
> Feel free to comment.
>
> Regards
> Yuya

Hi, Internals

I reported this topic to Unicode and received the reply below:

> Thank you for your feedback and your interest in Unicode.
> Your feedback will be reviewed by one of Unicode’s working groups.
> If appropriate, it may be posted to the PRI feedback page or be made part of a list of general feedback that will be considered for the next quarterly UTC meeting.

As I understand it, if appropriate, the feedback will go to the PRI page (https://www.unicode.org/review/) or to a UTC meeting.
I'm going to wait and see.

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
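
For illustration only, the validation discussed in this thread can be approximated in userland roughly as follows. This is a sketch under stated assumptions, not the PoC from the pull request: the function name `grapheme_limit_codepoints_sketch` is hypothetical, it uses PCRE's `\X` to find grapheme clusters (its rules depend on the PCRE version PHP is built with and may differ from ICU's), and it assumes invalid UTF-8 should simply fail the check. Because it collects every cluster in memory, it shows the intended semantics rather than the DoS protection a native implementation could provide.

```php
<?php
/**
 * Hypothetical userland approximation of the proposed check.
 * Returns true when every grapheme cluster in $str has at most
 * $maxCodepoints code points; false otherwise or on invalid UTF-8.
 */
function grapheme_limit_codepoints_sketch(string $str, int $maxCodepoints = 32): bool
{
    if ($maxCodepoints < 1) {
        throw new ValueError('$maxCodepoints must be at least 1');
    }

    // With the /u modifier, \X matches one extended grapheme cluster.
    $found = preg_match_all('/\X/u', $str, $matches);
    if ($found === false) {
        return false; // assumption: invalid UTF-8 or a PCRE error fails the check
    }

    foreach ($matches[0] as $cluster) {
        // With /us, "." matches exactly one code point (including newlines),
        // so the number of matches is the cluster's code point count.
        if (preg_match_all('/./us', $cluster) > $maxCodepoints) {
            return false;
        }
    }

    return true;
}

// "e" followed by one combining acute accent: one cluster, 2 code points.
var_dump(grapheme_limit_codepoints_sketch("e\u{301}"));                       // bool(true)
// "e" followed by 40 combining accents: one cluster, 41 code points.
var_dump(grapheme_limit_codepoints_sketch('e' . str_repeat("\u{301}", 40))); // bool(false)
```

The boolean return mirrors the validation-style usage shown earlier in the thread, where the caller compares the result with `true` before calling `grapheme_strlen`.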