Dear all,
I am sending this to introduce my new RFC: https://wiki.php.net/RFC/dont_trim_NUL
Quick summary:
Currently, PHP's trim functions strip the NUL byte (\0) by default, treating it alongside spaces, tabs, and newlines. This creates a highly surprising edge case.
Because \0 is semantically a control character or a vital part of a binary payload rather than a typographical whitespace character, casually using trim() to clean up trailing newlines can silently corrupt binary streams or cryptographic hashes by stripping legitimate NUL bytes. Whitespace characters are intended for typographical spacing and formatting (e.g., spaces, newlines, tabs).
Also, almost every mainstream programming language except PHP leaves NUL characters untrimmed (Python, Go, Rust, JS, even the isspace() function in glibc...). It sounds reasonable to expect the same here.
This RFC proposes removing \0 (ASCII 0) from the default character mask. I recognize this introduces a backward compatibility break, and therefore I would love to hear your thoughts, feedback, and any concerns regarding the BC impact before moving forward.
Cheers,
Weilin Du
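For readers who want to see the edge case described in the summary above, here is a minimal sketch. The payload (a 4-byte big-endian length field followed by a newline) is a made-up example, not something taken from the RFC.

```php
<?php
// Hypothetical binary payload: a 4-byte big-endian integer plus a line terminator.
$payload = pack('N', 256) . "\n";      // bytes: 00 00 01 00 0a

// Intent: drop the trailing newline only. The default mask also contains "\0",
// so the trailing NUL byte of the integer is stripped as well.
$trimmed = rtrim($payload);

echo bin2hex($payload), "\n";          // 000001000a
echo bin2hex($trimmed), "\n";          // 000001
```

The length field has gone from four bytes to three, and nothing in the call site signals that this happened.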
Hi all,
I believe the RFC “Don't trim NUL bytes by default” is ready to move to the voting phase. I intend to open the voting period soon (typically 7 days).
RFC page: https://wiki.php.net/rfc/dont_trim_nul
This RFC proposes to remove \0 (NUL byte) from the default character mask of trim(), ltrim(), and rtrim(), to align with common expectations and avoid unintended trimming of legitimate NUL-containing strings. Please tell me if there are any final comments or concerns. Thanks.
Best regards,
Weilin Du
P.S. I'm not sure whether this email will be sent in the correct thread, so here is the thread link in case it isn't: https://externals.io/message/130318
I agree that \0 is a control byte and not whitespace, so it probably
shouldn't be included in any of the trim functions. However, at this
stage in PHP's lifecycle I am not sure if we should fix it.
There hasn't been much discussion, so dear internals: are you simply busy,
unopinionated, or what?
No strong feelings on the matter; I will probably abstain. I don't think it's something that I've ever run into, since I tend to know very well whether my strings are bytes or characters and use them appropriately.
--Larry Garfield
For what little it's worth, I can't imagine any practical situation where this change would be helpful. Using trim() or its variants on binary data is likely to result in that data being corrupted, and this will continue to be the case even if NUL is not trimmed. Changing the current behavior is likely to break userspace code which depends on the current behavior. If, for some reason, users find it useful to trim specific whitespace characters from binary data, they can do so by passing a $characters mask to the function to fit their needs, rather than changing the function for everyone.
-- Andrew F
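The workaround mentioned above (passing an explicit `$characters` mask) might look like the sketch below. The constant and function names are hypothetical; the mask is the documented default, " \n\r\t\v\x00", with the NUL byte left out.

```php
<?php
// Hypothetical names. The mask is the documented default minus "\0".
const WHITESPACE_NO_NUL = " \n\r\t\v";

function trim_keep_nul(string $value): string
{
    // The second argument replaces the default mask entirely.
    return trim($value, WHITESPACE_NO_NUL);
}

$input = "\0 data \0\n";

echo bin2hex(trim($input)), "\n";           // 64617461 ("data")
echo bin2hex(trim_keep_nul($input)), "\n";  // 0020646174612000
```

With the custom mask, trimming stops at the first NUL byte on either end, so only the trailing newline is removed here.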
On Tue, 24 Mar 2026 at 19:29, Levi Morrison levi.morrison@datadoghq.com
wrote:
There hasn't been much discussion, so dear internals: are you simply busy,
unopinionated, or what?
I'd vote against this proposal: \0 being considered one of the stripped
characters is now a downstream assumption, and this ends up being a BC
break with little to no advantages.
From a semantic perspective, \n, \t and \r are also "control
characters" in other contexts (not the C world).
Marco Pivetta
Hi,
Thanks for putting this into clear words: this is exactly my fear too.
Since the very first message of this thread arrived here, my thoughts were
along the lines of "this is outright dangerous to do": there are "decades" of
downstream assumptions about how this works, and it's IMPOSSIBLE to properly vet
this, as it often operates on the (potentially untrusted) input level.
And, AFAICS, a perfectly valid workaround is possible by just providing a
custom 2nd arg.
IM(H)O this should never come to a vote; doing this shouldn't even be
considered.
sincerely,
- Markus
Hello.
Using trim() for binary data sounds like a mistake. There's nothing special
about whitespace or any other characters in binary data, so why use trim() on
it at all? If someone is using trim() on binary data, then this might be a
deliberate choice. For example, trimming the zero byte might be the sole purpose.
That's why I disagree with the "Secondly" RFC point.
Java's String.trim() treats characters with code points less than or equal
to \u0020 as whitespace. So there's no "surprising case", at least for
Java developers, and that's why I disagree with the "Thirdly" point.
However, I agree with the "Firstly" point. But for semantic purists we
have the mb_trim function.
Removing \0 from trim() makes code vulnerable to null byte injection attacks
[1]. I have a strong feeling that the zero byte was added to trim() for exactly
this reason.
[1] https://owasp.org/www-community/attacks/Embedding_Null_Code
That seems a bit dangerous, since a non-stripped \0 can potentially lead to
issues: when concatenated with other strings, which is quite common
for string operations, it can result in unpredictability and possibly even
security issues.
You make a good point about other languages. The concern is that while
not trimming \0 is the expectation there, and the different solutions for
sanitizing/handling \0 are well known and understood, in PHP the
assumption is that \0 is removed, and changing this assumption breaks a
lot of things. The trim functions already allow a 2nd parameter with a character
list, so it is already possible to exclude \0 from being trimmed.
Just my 2c.
--
Ilia Alshanetsky
Technologist, CTO, Entrepreneur
E: ilia@ilia.ws
T: @iliaa
B: http://ilia.ws
I'm not a fan of this for the reasons that several others have already
stated. While \0 may not technically be whitespace, you have 30 years of
scripts expecting it to be trimmed, and undoing that potentially (though
perhaps not actually) opens up new security vulnerabilities for dubious
gain. I hope nobody is depending on this to remove all null bytes,
because of course it won't, but I fail to see how a text-oriented function
(the notion of whitespace makes this not binary-oriented) should be in
favor of preserving nulls under any circumstance. If anything, functions
like this which are specifically text-oriented should (perhaps, but for
performance and historical raisins probably not) be elevated to throw on
discovering null bytes anywhere in the string.
-Sara
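To illustrate the point that trim() does not remove all null bytes, here is a small sketch. The input string and the reject_nul() helper are hypothetical; the helper only shows what "throw on discovering null bytes" could look like in userland and is not part of any proposal in this thread. It assumes PHP 8.0+ for str_contains().

```php
<?php
// Hypothetical untrusted input with one embedded and one trailing NUL byte.
$name = "  report.php\0.jpg\0 ";

// trim() only strips from the two ends, so the embedded NUL survives.
$clean = trim($name);
var_dump(str_contains($clean, "\0"));   // bool(true)

// Hypothetical userland guard: reject text input containing any NUL byte.
function reject_nul(string $value): string
{
    if (str_contains($value, "\0")) {
        throw new InvalidArgumentException('NUL byte found in text input');
    }
    return $value;
}

try {
    reject_nul($clean);
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
var_dump($rejected);                    // bool(true)
```

Either way, code that needs NUL-free text has to check for it explicitly; the default mask only ever affected the leading and trailing bytes.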