Newsgroups: php.internals
Date: Mon, 23 Feb 2026 20:27:41 +0900
Subject: [PHP-DEV][DISCUSSION] Limit of code point for grapheme cluster in programming languages
To: php internals
From: youkidearitai@gmail.com (youkidearitai)

Hi, Internals

I noticed that UAX #29 does not limit the number of code points in a grapheme cluster.
https://www.unicode.org/reports/tr29/

An ICU issue confirmed that Unicode defines no such limit.
https://unicode-org.atlassian.net/browse/ICU-23302

This means a single grapheme cluster can contain an arbitrarily large number of code points, which can crash a program because computer resources are limited.

For example, this command writes about 200 MB to emoji_bomb.txt, yet the output is a single grapheme cluster:

```
php -r 'echo(mb_trim(str_repeat("\u{200d}\u{1f468}\u{200d}\u{1f466}\u{200d}\u{1f466}", 10000000), "\u{200d}"));' -d memory_limit=600M > emoji_bomb.txt
```

(PLEASE BE CAREFUL WHEN OPENING emoji_bomb.txt, BECAUSE IT MAY CRASH YOUR PROGRAM)

So I think we (php-src, at the programming language level) need a new function that enforces a custom limit. My idea is below:

```
grapheme_limit_codepoints(string $str, int $max_codepoints = 32): bool
```

I don't have a strong opinion that $max_codepoints should default to 32. However, 32 code points should be enough for a grapheme cluster, because the longest one in human languages is probably Hakṣhmalawarayaṁ (ཧྐྵྨླྺྼྻྂ), at 9 code points. If more code points per grapheme cluster are needed, userland can increase $max_codepoints.

Please also see my slides on Speaker Deck.
https://speakerdeck.com/youkidearitai/limit-of-code-point-for-grapheme-cluster

What do you think about this idea?

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
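
To make the proposed grapheme_limit_codepoints() check concrete, here is a minimal userland sketch. It is not the proposed implementation: the function name grapheme_limit_codepoints_sketch is hypothetical, and the semantics are an assumption, since the mail gives only the signature (here it returns true when every grapheme cluster has at most $maxCodepoints code points). It relies on grapheme_extract() from ext-intl and mb_strlen() from ext-mbstring.

```php
<?php
// Hypothetical userland sketch; not part of php-src.
// Requires ext-intl (grapheme_extract) and ext-mbstring (mb_strlen).

/**
 * Assumed semantics: true when every grapheme cluster in $str has at most
 * $maxCodepoints code points, false otherwise (or when $str cannot be segmented).
 */
function grapheme_limit_codepoints_sketch(string $str, int $maxCodepoints = 32): bool
{
    $offset = 0;
    $length = strlen($str); // byte length; grapheme_extract() works with byte offsets

    while ($offset < $length) {
        $next = 0;
        // Extract exactly one grapheme cluster starting at byte $offset.
        $cluster = grapheme_extract($str, 1, GRAPHEME_EXTR_COUNT, $offset, $next);
        if ($cluster === false || $cluster === '' || $next <= $offset) {
            return false; // invalid input, or no progress was made
        }
        // Count the code points inside this single cluster.
        if (mb_strlen($cluster, 'UTF-8') > $maxCodepoints) {
            return false;
        }
        $offset = $next;
    }

    return true;
}

var_dump(grapheme_limit_codepoints_sketch("abc"));          // bool(true)
var_dump(grapheme_limit_codepoints_sketch("e\u{301}", 1));  // bool(false): "e" + combining acute is 2 code points
```

This sketch still copies each cluster into a string before counting, so it does not by itself avoid the memory cost of a huge cluster. A native implementation could presumably count code points during segmentation and stop as soon as the limit is exceeded.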