Hello everyone
Over the past few weeks I have been exploring a common pattern that
frequently appears in PHP applications: masking sensitive parts of
strings such as credit card numbers, email addresses, phone numbers,
and personal identifiers.
In many real-world codebases, developers typically implement masking
using combinations of functions like substr(), strlen(), str_repeat(),
substr_replace(), or their multibyte equivalents. While these
approaches work, they often lead to repetitive, error‑prone, and
sometimes inefficient user‑land implementations. Handling edge
cases—especially when offsets are negative, lengths are omitted, or
when working with Unicode text—can make these snippets unnecessarily
complex.
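To make the problem concrete, here is a minimal sketch of the kind of byte-based helper described above. The name mask_bytes() is hypothetical and used only for illustration. It behaves well on ASCII input but can cut a multibyte character in half.

```php
<?php
// Hypothetical helper, for illustration only: masks $length bytes starting at byte $offset.
function mask_bytes(string $s, int $offset, int $length, string $mask = '*'): string
{
    // substr_replace() and str_repeat() both count bytes, not characters.
    return substr_replace($s, str_repeat($mask, $length), $offset, $length);
}

// ASCII input behaves as expected.
echo mask_bytes('4111111111111111', 4, 8), "\n"; // 4111********1111

// "é" (U+00E9) is two bytes in UTF-8, so masking one byte leaves a stray continuation byte.
$broken = mask_bytes("h\u{00E9}llo", 1, 1);
var_dump(preg_match('//u', $broken) === 1); // bool(false): the result is no longer valid UTF-8
```

Switching to the mb_* functions avoids the invalid output, but those count code points, so combining marks and emoji sequences can still be separated from their base characters.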
While thinking about this problem, I designed a function concept
called grapheme_mask(). The goal of this function is to provide a
clear, native, and Unicode‑safe way to mask sections of a string.
The key idea is that the function operates on grapheme clusters,
rather than raw bytes or individual code points. This allows it to
correctly handle modern Unicode text, including composed characters
and emoji sequences, without breaking them apart.
Conceptually, the function replaces a range of grapheme clusters with
a masking string.
Example:
grapheme_mask("sepehr@example.com", "*", 2, -12);
// result: se***@example.com
Example with emoji sequences:
grapheme_mask("👨🏽👩👧👦 family", "*", 0, 1);
// result: * family
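For readers who want to experiment before any native implementation exists, the idea can be approximated in user land with the intl extension. This is only a sketch under stated assumptions: the name grapheme_mask_sketch() is hypothetical, the parameter order is inferred from the examples above, negative offsets and lengths are assumed to follow substr() conventions, and the mask is assumed to be repeated once per masked cluster. None of this is confirmed as the proposed behavior.

```php
<?php
// User-land approximation of the proposed grapheme_mask(); requires ext-intl.
// Assumption: the mask string is repeated once per masked grapheme cluster.
function grapheme_mask_sketch(string $string, string $mask, int $offset = 0, ?int $length = null): string
{
    $total = grapheme_strlen($string);
    if (!is_int($total)) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // Resolve the offset to an index in [0, $total]; negative values count from the end.
    $start = $offset < 0 ? max(0, $total + $offset) : min($offset, $total);

    // Resolve the end index: null means "to the end",
    // a negative length means "stop that many clusters before the end".
    if ($length === null) {
        $end = $total;
    } elseif ($length < 0) {
        $end = max($start, $total + $length);
    } else {
        $end = min($total, $start + $length);
    }

    $head = $start > 0 ? grapheme_substr($string, 0, $start) : '';
    $tail = $end < $total ? grapheme_substr($string, $end) : '';

    return $head . str_repeat($mask, $end - $start) . $tail;
}

echo grapheme_mask_sketch('4111111111111111', '*', 4, -4), "\n"; // 4111********1111
echo grapheme_mask_sketch("a\u{0308}bc", '*', 0, 1), "\n";       // *bc
```

In the second call the base letter and its combining diaeresis are replaced together, because they form a single grapheme cluster. Whether a native function should repeat the mask per cluster or insert it once is a design question the RFC would need to settle.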
The intention is not to replace existing string functions, but to
provide a dedicated and expressive helper for a task that developers
routinely implement themselves.
If there is interest from the community, I would be happy to draft a
formal RFC describing the proposed behavior, edge cases, and potential
implementation details.
I would greatly appreciate any feedback, thoughts, or suggestions.
Best regards,
Sepehr
On Fri, Jun 19, 2026 at 19:54, سپهر محمودی sepehrphpr@gmail.com wrote:
Hi, Sepehr and Internals
Thank you for starting this discussion.
Looks good to me.
Here is one more point in favor of adding this function.
A character with a diacritical mark can be encoded either as a single
precomposed code point or as a base character followed by a separate
combining code point.
For example, the umlaut (ä, or a + ¨) and the dakuten (が, or か + ゛), among
others.
The grapheme_mask() function needs to support these characters.
For that reason, I would like to have this function.
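The difference between the two encodings can be checked with existing functions. The sketch below assumes the intl and mbstring extensions are loaded, and it uses the combining code points U+0308 (diaeresis) and U+3099 (dakuten) rather than the standalone spacing characters ¨ and ゛.

```php
<?php
// Requires ext-intl and ext-mbstring.
$samples = [
    'precomposed U+00E4' => "\u{00E4}",          // ä as one code point
    'a + U+0308'         => "a\u{0308}",         // a followed by combining diaeresis
    'precomposed U+304C' => "\u{304C}",          // が as one code point
    'U+304B + U+3099'    => "\u{304B}\u{3099}",  // か followed by combining dakuten
];

foreach ($samples as $label => $text) {
    printf(
        "%s: bytes=%d, code points=%d, graphemes=%d\n",
        $label,
        strlen($text),
        mb_strlen($text, 'UTF-8'),
        grapheme_strlen($text)
    );
}

// NFC normalization maps the decomposed form onto the precomposed one.
var_dump(Normalizer::normalize("a\u{0308}", Normalizer::FORM_C) === "\u{00E4}"); // bool(true)
```

Every sample counts as one grapheme cluster even though the byte and code point counts differ, which is why a cluster-based mask treats both encodings the same way.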
Regards
Yuya
--
Yuya Hamada (tekimen)
Hi everyone,
I'm Sepehr, the author of this proposal. I'm glad to see the interest in
grapheme_mask().
I have already developed a working prototype in C (based on ICU ubrk) along
with several PHPT test cases covering Unicode and emoji clusters. I believe
this addition will significantly improve how developers handle sensitive
data masking in modern PHP applications.
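The C prototype itself is not shown in this thread. As a rough PHP-level counterpart, ICU's character break iterator is already exposed through IntlBreakIterator, so the cluster boundaries such a prototype would rely on can be inspected as follows. The helper name grapheme_clusters() is hypothetical, and this sketch is not the prototype's code.

```php
<?php
// Requires ext-intl. Splits a string into grapheme clusters
// using ICU's character break iterator.
function grapheme_clusters(string $text): array
{
    $iterator = IntlBreakIterator::createCharacterInstance();
    $iterator->setText($text);

    // getPartsIterator() yields the substring between each pair of boundaries.
    return iterator_to_array($iterator->getPartsIterator(), false);
}

// "e" followed by a combining acute accent forms one cluster; "a" forms another.
var_dump(count(grapheme_clusters("e\u{0301}a"))); // int(2)

// A family emoji written with explicit zero-width joiners (U+200D).
$family = "\u{1F468}\u{1F3FD}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";
var_dump(count(grapheme_clusters($family)));
```

With a reasonably recent ICU the joined emoji sequence should be reported as a single cluster, but the exact result depends on the Unicode version bundled with the ICU library in use.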
I have requested a Wiki account to start the formal RFC process and share
the implementation details.
Looking forward to your feedback.
Best regards,
Sepehr
On Friday, June 19, 2026 at 18:17, youkidearitai youkidearitai@gmail.com
wrote: