[Discussion] Proposal: grapheme_mask() helper function

8 hours ago by sepehrphpr@gmail.com

Hello everyone,

Over the past few weeks I have been exploring a common pattern that
frequently appears in PHP applications: masking sensitive parts of
strings such as credit card numbers, email addresses, phone numbers,
and personal identifiers.

In many real-world codebases, developers typically implement masking
using combinations of functions like substr(), strlen(), str_repeat(),
substr_replace(), or their multibyte equivalents. While these
approaches work, they often lead to repetitive, error‑prone, and
sometimes inefficient user‑land implementations. Handling edge
cases—especially when offsets are negative, lengths are omitted, or
when working with Unicode text—can make these snippets unnecessarily
complex.

While thinking about this problem, I designed a function concept
called grapheme_mask(). The goal of this function is to provide a
clear, native, and Unicode‑safe way to mask sections of a string.

The key idea is that the function operates on grapheme clusters,
rather than raw bytes or individual code points. This allows it to
correctly handle modern Unicode text, including composed characters
and emoji sequences, without breaking them apart.
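
To make the distinction concrete, here is a small sketch that measures the
same text in bytes, code points, and grapheme clusters. It assumes the
mbstring and intl extensions are loaded; the sample string is illustrative
only and is not taken from the proposal.

```php
<?php
// "é" written as a base letter plus a combining acute accent (U+0301).
$text = "e\u{0301}";

$bytes      = strlen($text);             // UTF-8 bytes
$codePoints = mb_strlen($text, 'UTF-8'); // Unicode code points
$graphemes  = grapheme_strlen($text);    // user-perceived characters

printf("bytes=%d code points=%d graphemes=%d\n", $bytes, $codePoints, $graphemes);
// prints: bytes=3 code points=2 graphemes=1

// Cutting by code points separates the accent from its base letter.
$firstCodePoint = mb_substr($text, 0, 1, 'UTF-8'); // "e" without the accent
// Cutting by grapheme clusters keeps the letter and its accent together.
$firstGrapheme  = grapheme_substr($text, 0, 1);
```

A mask applied at the code point level could therefore hide the base letter
and leave a stray accent behind, which is the kind of breakage that working
on grapheme clusters avoids.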

Conceptually, the function replaces a range of grapheme clusters with
a masking string.

Example:

grapheme_mask("sepehr@example.com", "*", 2, -12);
// result: se***@example.com

Example with emoji sequences:

grapheme_mask("👨🏽‍👩‍👧‍👦 family", "*", 0, 1);
// result: * family
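
For readers who want to experiment before a native version exists, the
following is a rough userland sketch of the idea. It is not the proposed
implementation. The name userland_grapheme_mask() is hypothetical, and two
behaviors are assumptions: the offset and length follow substr() conventions
(negative values count from the end), and the mask string is repeated once
per masked grapheme cluster. It requires the intl extension.

```php
<?php
/**
 * Hypothetical userland sketch; NOT the proposed native implementation.
 * Assumes substr()-style $offset/$length and one $mask per masked cluster.
 */
function userland_grapheme_mask(
    string $string,
    string $mask,
    int $offset = 0,
    ?int $length = null
): string {
    // Split the input into grapheme clusters with ICU's character break iterator.
    $iterator = IntlBreakIterator::createCharacterInstance();
    $iterator->setText($string);
    $clusters = [];
    foreach ($iterator->getPartsIterator() as $cluster) {
        $clusters[] = $cluster;
    }

    $count = count($clusters);

    // Resolve the start index; negative offsets count from the end.
    $start = $offset < 0 ? max($count + $offset, 0) : min($offset, $count);

    // Resolve the end index; a negative length stops that many clusters
    // before the end of the string.
    if ($length === null) {
        $end = $count;
    } elseif ($length < 0) {
        $end = max($count + $length, $start);
    } else {
        $end = min($start + $length, $count);
    }

    // Keep the head and tail untouched and repeat the mask for the middle.
    return implode('', array_slice($clusters, 0, $start))
        . str_repeat($mask, $end - $start)
        . implode('', array_slice($clusters, $end));
}

echo userland_grapheme_mask("4111111111111111", "*", 4, 8), "\n";
// prints: 4111********1111
```

Under these assumptions, offset 2 with length -12 on "sepehr@example.com"
covers the four clusters "pehr". Whether the native function resolves
negative lengths and repeats the mask in the same way is something the RFC
would need to specify.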

The intention is not to replace existing string functions, but to
provide a dedicated and expressive helper for a task that developers
routinely implement themselves.

If there is interest from the community, I would be happy to draft a
formal RFC describing the proposed behavior, edge cases, and potential
implementation details.

I would greatly appreciate any feedback, thoughts, or suggestions.

Best regards,

Sepehr

4 hours ago by youkidearitai

On Fri, 19 Jun 2026 at 19:54, سپهر محمودی <sepehrphpr@gmail.com> wrote:

[...]

Hi Sepehr and Internals,

Thank you for starting this discussion.
It looks good to me.

One more point in favor of adding this function:
a character with a diacritical mark can be encoded either as a single
precomposed code point or as a base character followed by a separate
combining code point.
Examples include the umlaut (ä, or a + ¨) and the dakuten (が, or か + ゛),
among others.
grapheme_mask() needs to handle both encodings of such characters.
That is why I would like to have this function.
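
A short sketch of this point, assuming the mbstring and intl extensions are
loaded. The combining code points U+0308 and U+3099 are assumed to be the
intended combining forms of the marks mentioned above.

```php
<?php
// Precomposed forms: a single code point each.
$umlautComposed  = "\u{00E4}";           // ä
$dakutenComposed = "\u{304C}";           // が

// Decomposed forms: a base character followed by a combining mark.
$umlautDecomposed  = "a\u{0308}";        // a + combining diaeresis
$dakutenDecomposed = "\u{304B}\u{3099}"; // か + combining dakuten

$pairs = [
    [$umlautComposed, $umlautDecomposed],
    [$dakutenComposed, $dakutenDecomposed],
];

foreach ($pairs as [$composed, $decomposed]) {
    printf(
        "code points: %d vs %d, graphemes: %d vs %d\n",
        mb_strlen($composed, 'UTF-8'),
        mb_strlen($decomposed, 'UTF-8'),
        grapheme_strlen($composed),
        grapheme_strlen($decomposed)
    );
}
// prints "code points: 1 vs 2, graphemes: 1 vs 1" for both pairs

// The two encodings differ byte for byte but are canonically equivalent.
$sameAfterNfc = Normalizer::normalize($umlautDecomposed, Normalizer::FORM_C) === $umlautComposed;
```

Both encodings count as one grapheme cluster, so a function that works on
clusters would treat them the same way, while a code point based approach
would not.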

Regards
Yuya

--

Yuya Hamada (tekimen)

4 hours ago by sepehrphpr@gmail.com

Hi everyone,

I'm Sepehr, the author of this proposal. I'm glad to see the interest in
grapheme_mask().

I have already developed a working prototype in C (based on ICU ubrk) along
with several PHPT test cases covering Unicode and emoji clusters. I believe
this addition will significantly improve how developers handle sensitive
data masking in modern PHP applications.

I have requested a Wiki account to start the formal RFC process and share
the implementation details.

Looking forward to your feedback.

Best regards,
Sepehr

On Friday, 19 June 2026 at 18:17, youkidearitai <youkidearitai@gmail.com> wrote:

[...]