From: youkidearitai@gmail.com (youkidearitai)
Date: Tue, 24 Feb 2026 16:21:45 +0900
Subject: Re: [PHP-DEV][DISCUSSION] Limit of code point for grapheme cluster in programming languages
To: php internals

On Tue, 24 Feb 2026 at 11:38, Kentaro Takeda wrote:
>
> Hi Yuya,
>
> I think this is a good idea. While spec compliance is generally desirable,
> DoS via unbounded grapheme clusters is a real threat, and it's reasonable
> for a language-level implementation to impose practical limits that the
> Unicode spec itself doesn't define.
> This kind of gap between a general-purpose spec and a concrete
> implementation is not unusual.
>
> The default of 32 code points sounds sensible given that natural language
> grapheme clusters top out well below that.
>
> One minor note: it might help to clarify the intended behavior of
> `grapheme_limit_codepoints` a bit more: for instance, whether it is meant
> as a validation check (returning false when a cluster exceeds the limit)
> or something else.
>
> Regards,
> Kentaro Takeda
>
>
> On Mon, 23 Feb 2026 at 20:28, youkidearitai wrote:
>>
>> Hi, Internals
>>
>> I noticed that UAX #29 places no limit on the number of code points in a
>> grapheme cluster.
>> https://www.unicode.org/reports/tr29/
>>
>> An ICU issue also confirmed that Unicode defines no such limit.
>> https://unicode-org.atlassian.net/browse/ICU-23302
>>
>> That means a single grapheme cluster can be built from an arbitrarily
>> large number of code points, which can crash a program because computing
>> resources are limited.
>>
>> For example, this command writes 200MB to emoji_bomb.txt, yet the file
>> contains only 1 grapheme cluster:
>> ```
>> php -r 'echo(mb_trim(str_repeat("\u{200d}\u{1f468}\u{200d}\u{1f466}\u{200d}\u{1f466}", 10000000), "\u{200d}"));' -d memory_limit=600M > emoji_bomb.txt
>> ```
>> (PLEASE BE CAREFUL WHEN OPENING emoji_bomb.txt BECAUSE IT MAY CRASH YOUR PROGRAM)
>>
>> So I think we (php-src, at the programming language level) need to add a
>> new custom limit function.
>> My idea is below:
>>
>> ```
>> grapheme_limit_codepoints(string $str, int $max_codepoints = 32): bool
>> ```
>>
>> I don't have a strong opinion that $max_codepoints must be 32.
>> However, 32 code points is enough for a grapheme cluster, because the
>> longest cluster in a human language is probably Hakṣhmalawarayaṁ (ཧ),
>> at 9 code points.
>>
>> If more code points per grapheme cluster are needed,
>> userland can increase $max_codepoints.
>>
>> Please also see my Speaker Deck slides.
>> https://speakerdeck.com/youkidearitai/limit-of-code-point-for-grapheme-cluster
>>
>> What do you think about this idea?
>>
>> Regards
>> Yuya
>>
>> --
>> ---------------------------
>> Yuya Hamada (tekimen)
>> - https://tekitoh-memdhoi.info
>> - https://github.com/youkidearitai
>> -----------------------------

Hi, Kentaro

Thank you very much for your feedback.

> One minor note: it might help to clarify the intended behavior of
> `grapheme_limit_codepoints` a bit more: for instance, whether it is meant
> as a validation check (returning false when a cluster exceeds the limit)
> or something else.

Okay, here is an example.

```
// $_POST['text'] holds some user-supplied string.

// Reject input in which any grapheme cluster has too many code points
// (grapheme_limit_codepoints is the proposed function).
if (grapheme_limit_codepoints($_POST['text'], 32) !== true) {
    throw new InvalidArgumentException("Found invalid / too many code points in a grapheme cluster");
}

// Validate the length in grapheme clusters.
if (grapheme_strlen($_POST['text']) > 100) {
    throw new InvalidArgumentException("Text is longer than 100 graphemes");
}

// do anything...
```

The intention is to count graphemes correctly while avoiding DoS.
I also want to overcome the problem discussed in
https://github.com/symfony/symfony/pull/13527 for the grapheme_strlen function.

Feel free to comment further.

Regards
Yuya.

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
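
P.S. A minimal userland sketch of how such a validation check could behave.
This is only an illustration under assumptions: the function name
grapheme_limit_codepoints_sketch is hypothetical, and it segments with PCRE's
\X instead of ICU, so edge cases may differ from the intl extension. It also
builds an array of every cluster, so it shows the intended true/false
semantics only and is not itself a DoS-safe implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the proposed grapheme_limit_codepoints().
 * Returns true when every grapheme cluster in $str has at most
 * $maxCodepoints code points; false otherwise, including invalid UTF-8.
 */
function grapheme_limit_codepoints_sketch(string $str, int $maxCodepoints = 32): bool
{
    // \X matches one extended grapheme cluster as PCRE defines it.
    $count = preg_match_all('/\X/u', $str, $matches);
    if ($count === false) {
        return false; // invalid UTF-8 or a PCRE error
    }

    foreach ($matches[0] as $cluster) {
        // Count the code points inside this single cluster.
        if (preg_match_all('/./su', $cluster) > $maxCodepoints) {
            return false;
        }
    }

    return true;
}

// "e" followed by 40 combining acute accents: 1 cluster, 41 code points.
var_dump(grapheme_limit_codepoints_sketch("e" . str_repeat("\u{0301}", 40))); // bool(false)
```

Raising the second argument, for example to 64, would accept that same
string, which matches the idea that userland can increase the limit when it
needs to.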