Hi internals,
I'd like to start the discussion for a new RFC about adding an RFC 4648
compliant data encoding API.
RFC proposal link: https://wiki.php.net/rfc/data_encoding_api
If passed, Tim Düsterhus has volunteered to do the implementation.
Thanks in advance for your remarks and comments.
Best regards,
Ignace Nyamagana Butera
Hi Ignace
Thanks for the RFC!
Here are my remarks about it:
- please make base58 part of the RFC - it's already widely used and having
it implemented in C would be great. See
https://github.com/php/php-src/issues/15195
- it'd be great to default to url-safe base64. The RFC-compliant variant is
a very common risk, it'd be great to be on the safe side by default
- why do we need to decide between constant-time and unprotected? Can't we
always go for the constant-time behavior? If not, what about defaulting to
constant-time, again, safe by default?
- about DecodingMode, shouldn't this be Lenient by default, following the
robustness principle?
- (base85 looks great and would be nice to have also :) )
Cheers,
Nicolas
- please make base58 part of the RFC - it's already widely used and having
it implemented in C would be great. See
https://github.com/php/php-src/issues/15195
I see that there's already a PECL extension for base58. I will see what I
can do because it was listed as a future scope for the moment.
- it'd be great to default to url-safe base64. The RFC-compliant variant is
a very common risk, it'd be great to be on the safe side by default
I went with the RFC recommendation to set up the default. In the case of
Base64, the URL-safe variant is not the default. While we support URL-safe
variants, plenty of applications do not expect them; for instance, data
URLs do not use the URL-safe variant.
- why do we need to decide between constant-time and unprotected? Can't we
always go for the constant-time behavior? If not, what about defaulting to
constant-time, again, safe by default?
In an ideal world I would use the constant-time behavior every time, but
this will depend largely on the implementation and on whether it can be
applied to every scenario, which is why I went defensive on this option.
- about DecodingMode, shouldn't this be Lenient by default, following the
robustness principle?
I went with strict by default for security reasons. The Lenient behavior
described is for instance more restrictive than the current "lenient" mode
used by the current base64_decode function. This is due to the security
issues raised by the RFC.
Best regards,
Ignace
Hi all,
I have updated the RFC (https://wiki.php.net/rfc/data_encoding_api) to
add base58 encoding and decoding functions to the proposal, with
arguments in favor of the addition.
Best regards,
Ignace
- it'd be great to default to url-safe base64. The RFC-compliant
variant is a very common risk, it'd be great to be on the safe side by
default
I went with the RFC recommendation to set up the default. In case of
Base64 the URL Safe variant is not the default. While we support URL
safe variants there are plenty of applications which do not expect the
URL Safe variant, for instance, the data URLs do not use the URL Safe
variant.
This should be included in the RFC, so it can be included in the future documentation.
- why do we need to decide between constant-time and unprotected? Can't
we always go for the constant-time behavior? If not, what about
defaulting to constant-time, again, safe by default?
In an ideal world I would use the constant-time behavior every time, but
this will depend largely on the implementation and if it can be applied
to every scenario hence why I went defensive on this option.
I don't follow. Every function listed allows a timing mode to be set, so I presume that means every function can use constant-time. The implementation is, well, this RFC. :-) So I don't see why we can't just force constant-time everywhere and be secure-by-default.
If there's a reason we cannot just blanket decide to use constant-time everywhere always, we need concrete examples of why that's a bad idea; and even then, I'd expect to be able to default to it.
For the long-names issue that Tim pointed out, perhaps drop "Variant" from the enum names? As they're namespaced, Base32::Ascii seems fairly self-explanatory.
I am overall in favor of this RFC, modulo notes above.
--Larry Garfield
Hi
Am 2025-07-01 16:18, schrieb Larry Garfield:
I don't follow. Every function listed allows a timing mode to be set,
so I presume that means every function can use constant-time. The
implementation is, well, this RFC. :-) So I don't see why we can't
just force constant-time everywhere and be secure-by-default.
Please see the note in the “Implementation” section. I wanted Ignace and
the discussion to figure out the desired API from a “high level”
perspective first, before checking individually whether or not a
constant-time implementation is possible for each of the possible
combinations of options, since depending on the API that is agreed-on
certain combinations might not make it (allowing me to skip the effort
of finding out how to do it constant time).
If there's a reason we cannot just blanket decide to use constant-time
everywhere always, we need concrete examples of why that's a bad idea;
and even then, I'd expect to be able to default to it.
A constant-time implementation is generally (measurably) slower than a
non-constant-time implementation, but also see above.
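To illustrate where that extra cost comes from, here is a minimal sketch of a branch-free, lookup-free hex encoder. It is only an illustration of the general technique, not the implementation planned for the RFC; the function name ct_bin2hex is hypothetical, and userland PHP cannot actually guarantee constant-time execution.

```php
<?php
// Illustrative sketch only: hex encoding without data-dependent branches
// or table lookups. ct_bin2hex is a hypothetical name, not part of the RFC.
function ct_bin2hex(string $bin): string
{
    $hex = '';
    $len = strlen($bin);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($bin[$i]);
        $hi = $byte >> 4;    // upper nibble, 0..15
        $lo = $byte & 0x0f;  // lower nibble, 0..15
        // For a nibble n, (n - 10) >> 8 is -1 when n < 10 and 0 otherwise.
        // Masking with ~38 (-39) subtracts 39 for digits only, so 0..9 map
        // to '0'..'9' (48 + n) and 10..15 map to 'a'..'f' (87 + n).
        $hex .= chr(87 + $hi + ((($hi - 10) >> 8) & ~38));
        $hex .= chr(87 + $lo + ((($lo - 10) >> 8) & ~38));
    }
    return $hex;
}
```

Every nibble goes through the same arithmetic regardless of its value, which is the property a constant-time encoder needs and also why it does more work per byte than a simple table lookup.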
For the long-names issue that Tim pointed out, perhaps drop "Variant"
from the enum names? As they're namespaced, Base32::Ascii seems
fairly self-explanatory.
You probably meant s/Tim/Rowan/.
Best regards
Tim Düsterhus
For the long-names issue that Tim pointed out, perhaps drop "Variant"
from the enum names? As they're namespaced, Base32::Ascii seems
fairly self-explanatory.
You probably meant s/Tim/Rowan/.
... I think that may be the second time I've confused you two. I have no idea why I keep confusing you and Rowan. Sorry again. :-/.
--Larry Garfield
Hi Larry,
I have updated the wording of the RFC to give the reason for the default
selected variant for each function family. I have also dropped the Variant
suffix from the algorithm variant enum.
I hope this addresses your remarks.
RFC proposal link: https://wiki.php.net/rfc/data_encoding_api
Thanks for working on this, I have often had to implement base64url and been frustrated it's not just a built-in option.
I like the look of the new API. Using namespaced enums is currently quite verbose, but that's something we could try to fix at the language level - e.g. Swift has some nice inference rules, so you can write the equivalent of base64_encode($string, ::UrlSafe).
One thing I think the RFC should mention is the future of the existing base64_encode/decode functions. Am I right in thinking that with one parameter, the new namespaced versions will be identical to the old? If so, we have the option to make the existing functions aliases for the new. Or, we can leave them as-is, but plan to deprecate them. What we probably don't want is to indefinitely have two versions with such similar names but different signatures.
Rowan Tommins
[IMSoP]
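For context, the kind of userland base64url helper mentioned above is usually a thin wrapper around the built-in functions. A minimal sketch, assuming the RFC 4648 section 5 alphabet with padding omitted; the names base64url_encode and base64url_decode are hypothetical and not part of the proposal:

```php
<?php
// Hypothetical userland helpers, not part of the proposed API.

// Encode with the URL-safe alphabet ('-' and '_') and strip '=' padding.
function base64url_encode(string $data): string
{
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

// Map back to the standard alphabet, restore padding, then decode strictly.
function base64url_decode(string $data): string|false
{
    $standard = strtr($data, '-_', '+/');
    $missing = (4 - strlen($standard) % 4) % 4;
    return base64_decode($standard . str_repeat('=', $missing), true);
}

echo base64url_encode("\xfb\xff"), "\n"; // prints "-_8"
```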
Hi Rowan,
Currently the RFC does not address deprecating the current functions for
the following reasons:
- The current base64_decode function operates in a lenient mode by default,
accepting characters outside the valid Base64 alphabet and ignoring
the padding character wherever it is in the string.
base64_decode('dG9===0bw??', false); // returns 'toto'
However, the newly proposed lenient mode aligns with the stricter
recommendations of RFC 4648, Section 12
https://www.rfc-editor.org/rfc/rfc4648.html#section-12, which advise
rejecting inputs containing invalid characters due to potential security
concerns. Consequently, the behavior differs significantly: while the
current implementation tolerates non-alphabet characters and accepts
padding characters in positions other than at the end of the encoded
string, the proposed version enforces strict validation to enhance security
and compliance with the standard.
Encoding\base64_decode('dG90bw??', DecodingMode::Lenient); // will throw: character outside of the base64 alphabet (RFC 4648 security recommendation)
Encoding\base64_decode('dG9===0bw', DecodingMode::Lenient); // will throw: padding character not located at the end of the string (RFC 4648 security recommendation)
Encoding\base64_decode('dG90bw', DecodingMode::Lenient); // returns 'toto'
- hex2bin always operates in a lenient mode—it does not support strict
validation. It could be replaced by the new base16_decode function when
configured with appropriate options. However, it's important to note that
the default behavior differs: unlike hex2bin, base16_decode defaults to
strict mode, rejecting invalid input by design, consistent with all newly
proposed decoding functions.
For those reasons, I believe a clear deprecation and removal strategy for
the current functions warrants its own dedicated RFC, as certain features
cannot be easily migrated to the new API.
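The legacy behaviour described above can be checked with the built-in function alone. A small sketch using only the existing base64_decode(); the variable names are illustrative, and none of the proposed Encoding\ functions are used here:

```php
<?php
// Behaviour of the existing built-in base64_decode().
$input = 'dG9===0bw??';

// Non-strict (default): misplaced padding and characters outside the
// alphabet are silently skipped.
$lenient = base64_decode($input);

// Strict: the same input is rejected and false is returned.
$strict = base64_decode($input, true);

// Strict still accepts input whose trailing padding is simply missing.
$unpadded = base64_decode('dG90bw', true);

var_dump($lenient, $strict, $unpadded);
// string(4) "toto", bool(false), string(4) "toto"
```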
On 1 July 2025 22:27:14 BST, ignace nyamagana butera nyamsprod@gmail.com wrote:
- The current base64_decode function operates in a lenient mode by default,
accepting characters outside the valid Base64 alphabet and ignoring
the padding character wherever it is in the string.
base64_decode('dG9===0bw??', false); // returns 'toto'
However, the newly proposed lenient mode aligns with the stricter
recommendations of RFC 4648, Section 12
https://www.rfc-editor.org/rfc/rfc4648.html#section-12 which advise
rejecting inputs containing invalid characters due to potential security
concerns.
That makes total sense, and I support both the choice of default and standard-compliant implementation. However, it feels like it will be hard to document why people should stop using the long-established functions, and exactly what the difference is. Putting off the problem until a later RFC is just inviting confusion until then.
Perhaps we should include an option in the new API to emulate the old behaviour, named as "legacy" or "unsafe" and immediately soft-deprecated with a note in the manual, similar to the MT_RAND_PHP mode in the Randomizer API https://www.php.net/manual/en/random-engine-mt19937.construct.php
Then the legacy base64_decode function could have a note like:
This function always uses Mode::LegacyUnsafe, and its use is discouraged; consider using the newer Encoding\base64_decode with Mode::Strict or Mode::Lenient instead.
And the main documentation for Encoding\base64_decode could explain all three modes side by side.
What do you think?
Rowan Tommins
[IMSoP]
Perhaps we should include an option in the new API to emulate the old
behaviour, named as "legacy" or "unsafe" and immediately soft-deprecated
with a note in the manual, similar to the MT_RAND_PHP mode in the
Randomizer API <https://www.php.net/manual/en/random-engine-mt19937.construct.php>
If I follow your reasoning, this would imply introducing a new case,
DecodingMode::Unsafe, in the DecodingMode enum. This mode would
replicate the current default behavior of base64_decode, but only
within Encoding\base64_decode.
echo base64_decode('dG9===0bw??'); // returns 'toto'
//would be portable to the new API using the following code
echo Encoding\base64_decode('dG9===0bw??', decodingMode:
Encoding\DecodingMode::Unsafe); // returns 'toto'
I would therefore propose that, for all other decoding functions, any
attempt to use DecodingMode::Unsafe must result in an
UnableToDecodeException being thrown.
Additionally, we should define the timeline for the eventual
deprecation of the current base64_encode(), base64_decode(),
hex2bin() and bin2hex() functions since the new option will be
automatically soft deprecated and removed at the same time as the
current API.
Should this deprecation take place during the PHP 8 cycle, with
removal targeted for PHP 9? Or would it be more appropriate to defer
the deprecation to the PHP 9 cycle, aiming for removal in PHP 10?
Alternatively, should a second vote be held to determine the
preferred deprecation timeline?
My intuition is that phasing out those functions during PHP 9 and
removing them in PHP 10 could help minimize disruption. However, I
don’t currently have data to support that assumption.
For completeness, the issue is less severe with hex2bin, where a
transparent migration path is possible:
echo hex2bin('48656c6c6f2c20576f726c6421');
echo Encoding\base16_decode('48656c6c6f2c20576f726c6421',
decodingMode: Encoding\DecodingMode::Lenient);
// both calls will output: Hello, World!
// whereas
echo Encoding\base16_decode('48656c6c6f2c20576f726c6421'); // will throw
Perhaps we should include an option in the new API to emulate the old behaviour, named as "legacy" or "unsafe" and immediately soft-deprecated with a note in the manual, similar to the MT_RAND_PHP mode in the Randomizer API https://www.php.net/manual/en/random-engine-mt19937.construct.php
If I follow your reasoning, this would imply introducing a new case,
DecodingMode::Unsafe, in the DecodingMode enum. This mode would
replicate the current default behavior of base64_decode, but only
within Encoding\base64_decode.
echo base64_decode('dG9===0bw??'); // returns 'toto'
//would be portable to the new API using the following code
echo Encoding\base64_decode('dG9===0bw??', decodingMode: Encoding\DecodingMode::Unsafe); // returns 'toto'
I would therefore propose that, for all other decoding functions, any
attempt to use DecodingMode::Unsafe must result in an
UnableToDecodeException being thrown.
I don't think it needs to be added to the enum, necessarily. Just make it a nullable argument to base64_decode.
function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false
That would leave the default behavior of the function intact, but also allows switching it over to either of the new modes (which would then just defer to the new implementations). And we wouldn't need to deal with "disallowed" modes on the new functions.
Should this deprecation take place during the PHP 8 cycle, with removal
targeted for PHP 9? Or would it be more appropriate to defer the
deprecation to the PHP 9 cycle, aiming for removal in PHP 10?
Alternatively, should a second vote be held to determine the
preferred deprecation timeline?
Since we don't know when PHP 9 will be yet (Grrr...), I'd lean toward a secondary vote or punting it to the usual mass-deprecation RFC that often happens. (Side note: This is why we need a regular schedule for major releases.)
--Larry Garfield
I don't think it needs to be added to the enum, necessarily. Just make it
a nullable argument to base64_decode.
function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false
That would leave the default behavior of the function intact, but also
allows switching it over to either of the new modes (which would then just
defer to the new implementations). And we wouldn't need to deal with
"disallowed" modes on the new functions.
Hi Larry,
The goal is not to change the signature of the existing base64_decode
function, but rather to preserve its current non-strict behavior within the
new API. This is intended to ensure a smoother transition from the existing
API to the proposed one. Therefore, we shouldn’t alter or retrofit the
existing function. Instead, the focus should be on providing a clear
migration path for users, which is why the addition of a
DecodingMode::Unsafe case is being proposed.
If I were to follow your suggestion, I would have proposed an alternative
signature like this:
base64_decode(string $string, bool|DecodingMode $strict = false);
Where:
- Encoding\DecodingMode::Strict is identical to $strict = true
- Encoding\DecodingMode::Unsafe would be identical to $strict = false
and the current function would then become an alias of
Encoding\base64_decode(string $encoded, decodingMode:
Encoding\DecodingMode::Unsafe);
// or
Encoding\base64_decode(string $encoded, decodingMode:
Encoding\DecodingMode::Strict);
The caveat is that, in the new API, errors will throw exceptions instead of
emitting an E_WARNING and returning false. Once the current API is
eventually removed, the Encoding\DecodingMode::Unsafe mode would also be
deprecated and removed accordingly. And documentation would rightly
highlight the danger of using such settings.
Keep in mind that this is in response to Rowan's comment and, depending on
feedback, I may not add Encoding\DecodingMode::Unsafe to the proposal.
I know I do not represent the majority but I tend to always use strict mode
when decoding base64 encoded data and when I forget PHPStan reminds me to
do so.
Best regards,
Ignace
Hi Larry,
The goal is not to change the signature of the existing base64_decode
function, but rather to preserve its current non-strict behavior within
the new API. This is intended to ensure a smoother transition from the
existing API to the proposed one. Therefore, we shouldn’t alter or
retrofit the existing function. Instead, the focus should be on
providing a clear migration path for users, which is why the addition
of a DecodingMode::Unsafe case is being proposed.
If I were to follow your suggestion, I would have proposed an
alternative signature like this:
base64_decode(string $string, bool|DecodingMode $strict = false);
That would work, too. My point is just trying to avoid DecodingMode::Unsafe as a thing that has to then be checked for and rejected by the new functions. That feels like clunkiness that we should be able to avoid. So with that signature, false would still use the existing "unsafe" mode; there's no enum case for "old unsafe logic", just for the new-correct modes.
--Larry Garfield
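The shape being discussed, where a bool keeps the legacy behaviour and an enum value selects the new one, can be sketched in userland. This is only an illustration under stated assumptions: the DecodingMode enum, new_api_base64_decode and compat_base64_decode below are hypothetical stand-ins, and the stand-in decoder simply reuses the built-in strict mode rather than implementing the proposed Strict and Lenient rules.

```php
<?php
// Hypothetical stand-in for the proposed Encoding\DecodingMode enum.
// Note there is deliberately no "Unsafe" case.
enum DecodingMode
{
    case Strict;
    case Lenient;
}

// Hypothetical placeholder for the proposed Encoding\base64_decode().
// A real implementation would treat each mode differently; this one only
// shows the "throw instead of returning false" contract.
function new_api_base64_decode(string $encoded, DecodingMode $mode): string
{
    $decoded = base64_decode($encoded, true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Unable to decode base64 input');
    }
    return $decoded;
}

// Sketch of the bool|DecodingMode signature: a bool keeps today's
// behaviour, an enum case defers to the new implementation.
function compat_base64_decode(string $string, bool|DecodingMode $strict = false): string|false
{
    if ($strict instanceof DecodingMode) {
        return new_api_base64_decode($string, $strict);
    }
    return base64_decode($string, $strict);
}
```

With this shape, compat_base64_decode('dG9===0bw??') still returns 'toto' as today, while passing DecodingMode::Strict for the same input throws, and the new functions never have to reject a legacy-only mode.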
Hi all,
I have updated the RFC to include a section outlining the migration path
https://wiki.php.net/rfc/data_encoding_api#migration_path. Since the
proposed migration strategy for base64_decode() may be considered
controversial, I plan to submit it as an optional vote, allowing
contributors to decide specifically on that aspect. If the optional vote
fails, I want to ensure that the rest of the proposal is not rejected
solely due to disagreements over the migration approach for this function.
Best regards,
Ignace