[RFC][DISCUSSION] Add RFC 4648 compliant data encoding API

4 months ago by ignace nyamagana butera

Hi internals,

I'd like to start the discussion for a new RFC about adding an RFC 4648
compliant data encoding API.

RFC proposal link: https://wiki.php.net/rfc/data_encoding_api
If passed, Tim Düsterhus has volunteered to do the implementation.

Thanks in advance for your remarks and comments.

Best regards,
Ignace Nyamagana Butera

4 months ago by Nicolas Grekas

Hi Ignace

Thanks for the RFC!

Here are my requests:

- please make base58 part of the RFC - it's already widely used and having
  it implemented in C would be great. See
  https://github.com/php/php-src/issues/15195
- it'd be great to default to url-safe base64. The RFC-compliant variant is
  a very common risk, it'd be great to be on the safe side by default
- why do we need to decide between constant-time and unprotected? Can't we
  always go for the constant-time behavior? If not, what about defaulting to
  constant-time, again, safe by default?
- about DecodingMode, shouldn't this be Lenient by default, following the
  robustness principle?
- (base85 looks great and would be nice to have also :) )

Cheers,
Nicolas

4 months ago by ignace nyamagana butera

> Thanks for the RFC!
>
> Here are my requests:
>
> please make base58 part of the RFC - it's already widely used and having
> it implemented in C would be great. See
> https://github.com/php/php-src/issues/15195

I see that there's already a PECL extension for base58. I will see what I
can do, because for the moment it was listed as future scope.

> it'd be great to default to url-safe base64. The RFC-compliant variant is
> a very common risk, it'd be great to be on the safe side by default

I went with the RFC recommendation to set up the default. In the case of
Base64, the URL-safe variant is not the default. While we support URL-safe
variants, there are plenty of applications which do not expect the URL-safe
variant; for instance, data URLs do not use it.

> why do we need to decide between constant-time and unprotected? Can't we
> always go for the constant-time behavior? If not, what about defaulting to
> constant-time, again, safe by default?

In an ideal world I would use the constant-time behavior every time, but
this will depend largely on the implementation and on whether it can be
applied to every scenario, which is why I went defensive on this option.

> about DecodingMode, shouldn't this be Lenient by default, following the
> robustness principle?

I went with strict by default for security reasons. The Lenient behavior
described is, for instance, more restrictive than the current "lenient" mode
used by the current base64_decode function. This is due to the security
considerations raised by RFC 4648.

Best regards,
Ignace

4 months ago by ignace nyamagana butera

Hi all,

I have updated the RFC (https://wiki.php.net/rfc/data_encoding_api) to
add base58 encoding and decoding functions to the proposal, with
arguments in favor of the addition.

Best regards,

Ignace

4 months ago by Larry Garfield

>> it'd be great to default to url-safe base64. The RFC-compliant
>> variant is a very common risk, it'd be great to be on the safe side by
>> default
>
> I went with the RFC recommendation to set up the default. In case of
> Base64 the URL Safe variant is not the default. While we support URL
> safe variants there are plenty of applications which do not expect the
> URL Safe variant, for instance, the data URLs do not use the URL Safe
> variant.

This should be included in the RFC, so it can be included in the future documentation.

>> why do we need to decide between constant-time and unprotected? Can't
>> we always go for the constant-time behavior? If not, what about
>> defaulting to constant-time, again, safe by default?
>
> In an ideal world I would use the constant-time behavior every time, but
> this will depend largely on the implementation and if it can be applied
> to every scenario hence why I went defensive on this option.

I don't follow. Every function listed allows a timing mode to be set, so I presume that means every function can use constant-time. The implementation is, well, this RFC. :-) So I don't see why we can't just force constant-time everywhere and be secure-by-default.

If there's a reason we cannot just blanket decide to use constant-time everywhere always, we need concrete examples of why that's a bad idea; and even then, I'd expect to be able to default to it.

For the long-names issue that Tim pointed out, perhaps drop "Variant" from the enum names? As they're namespaced, Base32::Ascii seems fairly self-explanatory.

I am overall in favor of this RFC, modulo notes above.

--Larry Garfield

4 months ago by tim@bastelstu.be

On 2025-07-01 16:18, Larry Garfield wrote:

> I don't follow. Every function listed allows a timing mode to be set,
> so I presume that means every function can use constant-time. The
> implementation is, well, this RFC. :-) So I don't see why we can't
> just force constant-time everywhere and be secure-by-default.

Please see the note in the “Implementation” section. I wanted Ignace and
the discussion to figure out the desired API from a “high level”
perspective first, before checking individually whether or not a
constant-time implementation is possible for each of the possible
combinations of options, since depending on the API that is agreed-on
certain combinations might not make it (allowing me to skip the effort
of finding out how to do it constant time).

> If there's a reason we cannot just blanket decide to use constant-time
> everywhere always, we need concrete examples of why that's a bad idea;
> and even then, I'd expect to be able to default to it.

A constant-time implementation is generally (measurably) slower than a
non-constant-time implementation, but also see above.
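
For readers unfamiliar with the term, here is a minimal userland sketch of what "constant-time" means for an encoder. It is only an illustration under stated assumptions: the function name hex_encode_constant_time() is hypothetical, this is not the implementation planned for the RFC, and PHP userland code cannot give hard timing guarantees. The point is that every output character is computed arithmetically, with no branch or table lookup indexed by the input data, which is also why such code tends to be slower than bin2hex().

```php
<?php

/**
 * Hypothetical helper (not part of the RFC): hex-encode without
 * data-dependent branches or lookup tables.
 */
function hex_encode_constant_time(string $bin): string
{
    $hex = '';
    $len = strlen($bin);
    for ($i = 0; $i < $len; ++$i) {
        $byte = ord($bin[$i]);
        foreach ([$byte >> 4, $byte & 0x0f] as $nibble) {
            // ($nibble - 10) >> 8 is -1 for 0..9 and 0 for 10..15, so the mask
            // subtracts 39 only for digits: 0..9 -> '0'..'9', 10..15 -> 'a'..'f'.
            $hex .= chr(87 + $nibble + ((($nibble - 10) >> 8) & ~38));
        }
    }
    return $hex;
}

// Same output as the built-in, computed without branching on the input bytes.
var_dump(hex_encode_constant_time("Hello") === bin2hex("Hello")); // bool(true)
```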

> For the long-names issue that Tim pointed out, perhaps drop "Variant"
> from the enum names? As they're namespaced, Base32::Ascii seems
> fairly self-explanatory.

You probably meant s/Tim/Rowan/.

Best regards
Tim Düsterhus

4 months ago by Larry Garfield

>> For the long-names issue that Tim pointed out, perhaps drop "Variant"
>> from the enum names? As they're namespaced, Base32::Ascii seems
>> fairly self-explanatory.
>
> You probably meant s/Tim/Rowan/.
>
> Best regards
> Tim Düsterhus

... I think that may be the second time I've confused you two. I have no idea why I keep confusing you and Rowan. Sorry again. :-/.

--Larry Garfield

4 months ago by ignace nyamagana butera

Hi Larry,

I have updated the wording of the RFC to give the reason for the default
selected variant for each function family. I have also dropped the Variant
suffix from the algorithm variant enum.

Hope this answers your remarks

4 months ago by Rowan Tommins [IMSoP]

> RFC proposal link: https://wiki.php.net/rfc/data_encoding_api

Thanks for working on this, I have often had to implement base64url and been frustrated it's not just a built-in option.

I like the look of the new API. Using namespaced enums is currently quite verbose, but that's something we could try to fix at the language level - e.g. Swift has some nice inference rules, so you can write the equivalent of base64_encode($string, ::UrlSafe).

One thing I think the RFC should mention is the future of the existing base64_encode/decode functions. Am I right in thinking that with one parameter, the new namespaced versions will be identical to the old? If so, we have the option to make the existing functions aliases for the new. Or, we can leave them as-is, but plan to deprecate them. What we probably don't want is to indefinitely have two versions with such similar names but different signatures.

Rowan Tommins
[IMSoP]

4 months ago by ignace nyamagana butera

Hi Rowan,

Currently the RFC does not address deprecating the current functions for
the following reasons:

The current base64_decode function operates in a lenient mode by default,
accepting characters outside the valid Base64 alphabet and ignoring
the padding character wherever it is in the string.

base64_decode('dG9===0bw??', false); // returns 'toto'

However, the newly proposed lenient mode aligns with the stricter
recommendations of RFC 4648, Section 12
https://www.rfc-editor.org/rfc/rfc4648.html#section-12, which advise
rejecting inputs containing invalid characters due to potential security
concerns. Consequently, the behavior differs significantly: while the
current implementation tolerates non-alphabet characters and accepts
padding characters in positions other than at the end of the encoded
string, the proposed version enforces strict validation to enhance security
and compliance with the standard.

Encoding\base64_decode('dG90bw??', DecodingMode::Lenient);
// throws, per the RFC 4648 security recommendation: character outside of the base64 alphabet
Encoding\base64_decode('dG9===0bw', DecodingMode::Lenient);
// throws, per the RFC 4648 security recommendation: padding character not located at the end of the string
Encoding\base64_decode('dG90bw', DecodingMode::Lenient); // returns 'toto'
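
The contrast can be checked against today's built-in function. The sketch below is only an approximation under stated assumptions: Encoding\base64_decode does not exist yet, so the hypothetical helper base64_decode_or_throw() leans on the existing $strict flag of base64_decode() and throws a stock UnexpectedValueException where the RFC would throw its own exception. For the three sample inputs above, the existing strict flag happens to accept and reject the same strings; it is not claimed to match the proposed modes in general.

```php
<?php

/**
 * Hypothetical helper: decode with the existing strict flag and throw
 * instead of returning false. The name and exception type are assumptions.
 */
function base64_decode_or_throw(string $encoded): string
{
    $decoded = base64_decode($encoded, true);
    if ($decoded === false) {
        throw new UnexpectedValueException('Invalid base64 input: ' . $encoded);
    }
    return $decoded;
}

// Legacy non-strict behaviour: stray padding and '?' are silently skipped.
var_dump(base64_decode('dG9===0bw??')); // string(4) "toto"

foreach (['dG90bw??', 'dG9===0bw', 'dG90bw'] as $input) {
    try {
        echo $input, ' => ', base64_decode_or_throw($input), PHP_EOL;
    } catch (UnexpectedValueException $e) {
        echo $input, ' => rejected', PHP_EOL;
    }
}
```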

hex2bin always operates in a lenient mode—it does not support strict
validation. It could be replaced by the new base16_decode function when
configured with appropriate options. However, it's important to note that
the default behavior differs: unlike hex2bin, base16_decode defaults to
strict mode, rejecting invalid input by design, consistent with all newly
proposed decoding functions.

For those reasons, I believe a clear deprecation and removal strategy for
the current functions warrants its own dedicated RFC, as certain features
cannot be easily migrated to the new API.

4 months ago by Rowan Tommins [IMSoP]

On 1 July 2025 22:27:14 BST, ignace nyamagana butera nyamsprod@gmail.com wrote:

> The current base64_decode function operates in a lenient mode by default,
> accepting characters outside the valid Base64 alphabet and ignoring
> the padding character wherever it is in the string.
>
> base64_decode('dG9===0bw??', false); // returns 'toto'
>
> However, the newly proposed lenient mode aligns with the stricter
> recommendations of RFC 4648, Section 12
> https://www.rfc-editor.org/rfc/rfc4648.html#section-12 which advise
> rejecting inputs containing invalid characters due to potential security
> concerns.

That makes total sense, and I support both the choice of default and standard-compliant implementation. However, it feels like it will be hard to document why people should stop using the long-established functions, and exactly what the difference is. Putting off the problem until a later RFC is just inviting confusion until then.

Perhaps we should include an option in the new API to emulate the old behaviour, named as "legacy" or "unsafe" and immediately soft-deprecated with a note in the manual, similar to the MT_RAND_PHP mode in the Randomizer API https://www.php.net/manual/en/random-engine-mt19937.construct.php

Then the legacy base64_decode function could have a note like:

This function always uses Mode::LegacyUnsafe, and its use is discouraged; consider using the newer Encoding\base64_decode with Mode::Strict or Mode::Lenient instead.

And the main documentation for Encoding\base64_decode could explain all three modes side by side.

What do you think?
Rowan Tommins
[IMSoP]

4 months ago by ignace nyamagana butera

> Perhaps we should include an option in the new API to emulate the old
> behaviour, named as "legacy" or "unsafe" and immediately soft-deprecated
> with a note in the manual, similar to the MT_RAND_PHP mode in the
> Randomizer API
> https://www.php.net/manual/en/random-engine-mt19937.construct.php

If I follow your reasoning, this would imply introducing a new case,
DecodingMode::Unsafe, in the DecodingMode enum. This mode would
replicate the current default behavior of base64_decode, but only
within Encoding\base64_decode.

echo base64_decode('dG9===0bw??'); // returns 'toto'
// would be portable to the new API using the following code
echo Encoding\base64_decode('dG9===0bw??', decodingMode: Encoding\DecodingMode::Unsafe); // returns 'toto'

I would therefore propose that, for all other decoding functions, any
attempt to use DecodingMode::Unsafe must result in an
UnableToDecodeException being thrown.

Additionally, we should define the timeline for the eventual
deprecation of the current base64_encode(), base64_decode(),
hex2bin() and bin2hex() functions since the new option will be
automatically soft deprecated and removed at the same time as the
current API.

Should this deprecation take place during the PHP 8 cycle, with
removal targeted for PHP 9? Or would it be more appropriate to defer
the deprecation to the PHP 9 cycle, aiming for removal in PHP 10?
Alternatively, should a second vote be held to determine the
preferred deprecation timeline?

My intuition is that phasing out those functions during PHP 9 and
removing them in PHP 10 could help minimize disruption. However, I
don’t currently have data to support that assumption.

For completeness, the issue is less severe with hex2bin, where a
transparent migration path is possible:

echo hex2bin('48656c6c6f2c20576f726c6421');
echo Encoding\base16_decode('48656c6c6f2c20576f726c6421', decodingMode: Encoding\DecodingMode::Lenient);
// both calls output: Hello, World!
// whereas
echo Encoding\base16_decode('48656c6c6f2c20576f726c6421'); // will throw
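
One way to picture the difference is a userland check in front of hex2bin(). This is a sketch under assumptions: base16_decode_uppercase_only() is a hypothetical name, and it assumes that strict mode accepts only the uppercase alphabet that RFC 4648 defines for base16 (0-9, A-F), which would explain why the lowercase input above throws by default. The RFC text remains authoritative on the actual rules.

```php
<?php

/**
 * Hypothetical strict base16 decoder: accepts only complete pairs of
 * characters from the RFC 4648 base16 alphabet (0-9, A-F).
 */
function base16_decode_uppercase_only(string $encoded): string
{
    if (preg_match('/\A(?:[0-9A-F]{2})*\z/', $encoded) !== 1) {
        throw new UnexpectedValueException('Input is not strict RFC 4648 base16.');
    }
    return hex2bin($encoded);
}

echo base16_decode_uppercase_only('48656C6C6F2C20576F726C6421'), PHP_EOL; // Hello, World!

try {
    base16_decode_uppercase_only('48656c6c6f2c20576f726c6421'); // lowercase digits
} catch (UnexpectedValueException $e) {
    echo 'rejected: ', $e->getMessage(), PHP_EOL;
}
```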

4 months ago by Larry Garfield

>> Perhaps we should include an option in the new API to emulate the old behaviour, named as "legacy" or "unsafe" and immediately soft-deprecated with a note in the manual, similar to the MT_RAND_PHP mode in the Randomizer API https://www.php.net/manual/en/random-engine-mt19937.construct.php
>
> If I follow your reasoning, this would imply introducing a new case,
> DecodingMode::Unsafe, in the DecodingMode enum. This mode would
> replicate the current default behavior of base64_decode, but only
> within Encoding\base64_decode.
>
> echo base64_decode('dG9===0bw??'); // returns 'toto'
> // would be portable to the new API using the following code
> echo Encoding\base64_decode('dG9===0bw??', decodingMode: Encoding\DecodingMode::Unsafe); // returns 'toto'
>
> I would therefore propose that, for all other decoding functions, any
> attempt to use DecodingMode::Unsafe must result in an
> UnableToDecodeException being thrown.

I don't think it needs to be added to the enum, necessarily. Just make it a nullable argument to base64_decode.

function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false

That would leave the default behavior of the function intact, but also allows switching it over to either of the new modes (which would then just defer to the new implementations). And we wouldn't need to deal with "disallowed" modes on the new functions.

> Should this deprecation take place during the PHP 8 cycle, with removal
> targeted for PHP 9? Or would it be more appropriate to defer the
> deprecation to the PHP 9 cycle, aiming for removal in PHP 10?
> Alternatively, should a second vote be held to determine the
> preferred deprecation timeline?

Since we don't know when PHP 9 will be yet (Grrr...), I'd lean toward a secondary vote or punting it to the usual mass-deprecation RFC that often happens. (Side note: This is why we need a regular schedule for major releases.)

--Larry Garfield

4 months ago by ignace nyamagana butera

> I don't think it needs to be added to the enum, necessarily. Just make it
> a nullable argument to base64_decode.
>
> function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false
>
> That would leave the default behavior of the function intact, but also
> allows switching it over to either of the new modes (which would then just
> defer to the new implementations). And we wouldn't need to deal with
> "disallowed" modes on the new functions.

Hi Larry,

The goal is not to change the signature of the existing base64_decode
function, but rather to preserve its current non-strict behavior within the
new API. This is intended to ensure a smoother transition from the existing
API to the proposed one. Therefore, we shouldn’t alter or retrofit the
existing function. Instead, the focus should be on providing a clear
migration path for users, which is why the addition of a
DecodingMode::Unsafe case is being proposed.

If I were to follow your suggestion, I would have proposed an alternative
signature like this:

base64_decode(string $string, bool|DecodingMode $strict = false);

Where:

Encoding\DecodingMode::Strict is identical to $strict = true
Encoding\DecodingMode::Unsafe would be identical to $strict = false

and the current function would then become an alias of

Encoding\base64_decode($encoded, decodingMode: Encoding\DecodingMode::Unsafe);
// or
Encoding\base64_decode($encoded, decodingMode: Encoding\DecodingMode::Strict);

The caveat is that, in the new API, errors will throw exceptions instead of
emitting an E_WARNING and returning false. Once the current API is
eventually removed, the Encoding\DecodingMode::Unsafe mode would also be
deprecated and removed accordingly. And documentation would rightly
highlight the danger of using such settings.
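
To make the mapping above concrete, here is a rough userland sketch. Everything in it is hypothetical: the DecodingMode enum is a stand-in with only the two cases discussed here, and legacy_base64_decode() is an invented name so that the code can run on current PHP (8.1 or later) without clashing with the built-in. It keeps the legacy contract of returning false on failure rather than throwing.

```php
<?php

// Stand-in for the proposed enum; only the two cases discussed here.
enum DecodingMode
{
    case Strict;
    case Unsafe;
}

/**
 * Hypothetical wrapper showing how bool|DecodingMode could map onto
 * the existing $strict flag of base64_decode().
 */
function legacy_base64_decode(string $string, bool|DecodingMode $strict = false): string|false
{
    if ($strict instanceof DecodingMode) {
        // Strict behaves like $strict = true, Unsafe like $strict = false.
        $strict = ($strict === DecodingMode::Strict);
    }
    return base64_decode($string, $strict);
}

var_dump(legacy_base64_decode('dG9===0bw??'));                       // string(4) "toto"
var_dump(legacy_base64_decode('dG9===0bw??', DecodingMode::Unsafe)); // string(4) "toto"
var_dump(legacy_base64_decode('dG9===0bw??', DecodingMode::Strict)); // bool(false)
var_dump(legacy_base64_decode('dG9===0bw??', true));                 // bool(false)
```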

Keep in mind that this is in response to Rowan's comment and, depending on
feedback, I may not add Encoding\DecodingMode::Unsafe to the proposal.
I know I do not represent the majority, but I tend to always use strict mode
when decoding base64 encoded data, and when I forget, PHPStan reminds me to
do so.

Best regards,
Ignace

4 months ago by Larry Garfield

>> I don't think it needs to be added to the enum, necessarily. Just make it a nullable argument to base64_decode.
>>
>> function base64_decode(string $string, bool $strict = false, ?DecodingMode $decodingMode = null): string|false
>>
>> That would leave the default behavior of the function intact, but also allows switching it over to either of the new modes (which would then just defer to the new implementations). And we wouldn't need to deal with "disallowed" modes on the new functions.
>
> Hi Larry,
>
> The goal is not to change the signature of the existing base64_decode
> function, but rather to preserve its current non-strict behavior within
> the new API. This is intended to ensure a smoother transition from the
> existing API to the proposed one. Therefore, we shouldn’t alter or
> retrofit the existing function. Instead, the focus should be on
> providing a clear migration path for users, which is why the addition
> of a DecodingMode::Unsafe case is being proposed.
>
> If I were to follow your suggestion, I would have proposed an
> alternative signature like this:
>
> base64_decode(string $string, bool|DecodingMode $strict = false);

That would work, too. My point is just trying to avoid DecodingMode::Unsafe as a thing that has to then be checked for and rejected by the new functions. That feels like clunkiness that we should be able to avoid. So with that signature, false would still use the existing "unsafe" mode; there's no enum case for "old unsafe logic", just for the new-correct modes.

--Larry Garfield

4 months ago by ignace nyamagana butera

Hi all,

I have updated the RFC to include a section outlining the migration path
https://wiki.php.net/rfc/data_encoding_api#migration_path. Since the
proposed migration strategy for base64_decode() may be considered
controversial, I plan to submit it as an optional vote—allowing
contributors to decide specifically on that aspect. If the optional vote
fails, I want to ensure that the rest of the proposal is not rejected
solely due to disagreements over the migration approach for this function.

Best regards,
Ignace

3 months ago by Andrey Andreev

Hi all,

I have a few suggestions, starting with naming improvements:

- Forgiving instead of Lenient (align with
  https://infra.spec.whatwg.org/#forgiving-base64)
- Shorten the option names; one example would be Variable/Constant instead
  of Unprotected/ConstantTime, but I think most could be rethought
- $input or $data instead of $decoded (could actually do the same instead
  of $encoded, but that one doesn't feel as wrong)
- Not strictly about naming, but it similarly feels wrong that
  UnableToDecodeException extends EncodingException (which seems to have no
  purpose)

However, I'm not a fan of how these simple functions have so many option
flags ... it feels forced, trying to accommodate too much at once. I'd
rather have discrete functions, like base64_*() and base64url_*() - I chose
this example because base64 and base64url also have arguably different
desirable defaults for padding; almost all pad-stripping I've seen in the
wild has been for the purposes of converting to base64url.

On a semi-related note, I'm not sure if including the IMAP variant isn't
complicating things for no good reason (it is extra-niche, and we have
imap_binary/base64() already).

Also, the RFC doesn't specify whether DecodingMode::Strict would cause an
error in case of missing padding?

That being said, I'm very glad to see this!

Cheers,
Andrey.

3 months ago by ignace nyamagana butera

Hi Andrey,

Forgiving instead of Lenient (align with
https://infra.spec.whatwg.org/#forgiving-base64)

I will adapt the text and use Forgiving instead

Shorten the option names; one example would be Variable/Constant
instead of Unprotected/ConstantTime, but I think most could be
rethinked

I will adapt the text and use Variable/Constant instead, thanks for
the suggestions,

$input or $data instead of $decoded (could actually do the same
instead of $encoded, but that one doesn't feel as wrong)

Usage of $encoded and $decoded as parameter names is done to
emphasize the state of the data, rather than its format. This is
helpful as it avoids ambiguity ($data is generic) and makes data
flow more explicit.

Not strictly about naming, but it similarly feels wrong that
UnableToDecodeException extends EncodingException (which seems to have
no purpose)

This follows the RFC guidelines regarding the introduction of new
exceptions to the language, particularly within extensions. Each
exception should reference its own exception marker (in this proposal,
EncodingException). Additionally, we introduce a more specific
exception to handle errors that occur during the decoding of encoded
data.
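
A minimal sketch of how such a hierarchy behaves for callers, assuming both types are plain classes as the "extends" wording suggests. The declarations below are local stand-ins for illustration, and sketch_decode_or_throw() is a hypothetical helper built on the existing base64_decode().

```php
<?php
// Illustrative stand-ins only; the real classes would come from the extension.
class EncodingException extends Exception {}
class UnableToDecodeException extends EncodingException {}

// Hypothetical helper that turns a failed decode into the specific exception.
function sketch_decode_or_throw(string $encoded): string
{
    $decoded = base64_decode($encoded, true);
    if ($decoded === false) {
        throw new UnableToDecodeException('Input is not valid base64.');
    }

    return $decoded;
}

try {
    sketch_decode_or_throw('not*base64');
} catch (EncodingException $e) {
    // Catching the marker type also catches the more specific exception.
    $caughtClass = get_class($e);
}
```

Code that only cares about "something in the encoding API failed" can catch the marker, while code that needs to react to bad input can catch the specific type.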

On a semi-related note, I'm not sure if including the IMAP variant
isn't complicating things for no good reason (it is extra-niche, and
we have imap_binary/base64() already).

The ext/imap extension that provides those functions has been
unbundled from PHP.

I chose this example because base64 and base64url also have arguably
different desirable defaults for padding; almost all pad-stripping
I've seen in the wild has been for the purposes of converting to
base64url.

Base64 and Base64url vary on their alphabet and on the presence or
absence of the padding string. With the proposed API it would mean
doing the following:

\Encoding::base64_encode('Hello world!'); // base64 standard encoding
\Encoding::base64_encode('Hello world!', variant: \Encoding\Base64::UrlSafe); // base64 URL-safe encoding

Padding is by default controlled by the variant. Since UrlSafe does
not need padding, no padding will be used. You should not even need to
specify the presence or absence of padding, unless you want to do
something really specific for your use case, in which case being
explicit about what you want to achieve is always a good design choice.

The default values for the options are chosen to cover the most common
use cases, so in many situations you won’t need to specify them at
all—making the API easier to use than it might initially appear.

Also, the RFC doesn't specify whether DecodingMode::Strict would cause
an error in case of missing padding?

Strict decoding behavior depends on the variant. For example, in the
case of Base64url, padding is considered optional. Therefore, under
DecodingMode::Strict, the absence of = padding characters will not
trigger an exception, as this behavior is compliant with the relevant
RFC.

In contrast, for Base64::Standard, omitting the padding character
in strict mode will result in an exception, since padding is
mandatory where applicable with such a variant. For clarity, I will
revise the RFC to explicitly state the behavior of each encoding
variant during strict mode decoding.
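
The variant-dependent padding rule described above can be sketched in userland PHP (8.1 or later, for enums). Everything named Sketch* or sketch_* is hypothetical and not the RFC's API, and InvalidArgumentException stands in for the proposed decoding exception.

```php
<?php
// Hypothetical stand-in for the proposed variant enum.
enum SketchVariant
{
    case Standard;
    case UrlSafe;
}

function sketch_strict_decode(string $encoded, SketchVariant $variant): string
{
    // Standard: padding is mandatory. UrlSafe: padding is optional.
    $pattern = match ($variant) {
        SketchVariant::Standard =>
            '~\A(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?\z~',
        SketchVariant::UrlSafe =>
            '~\A(?:[A-Za-z0-9_-]{4})*(?:[A-Za-z0-9_-]{2}(?:==)?|[A-Za-z0-9_-]{3}=?)?\z~',
    };

    if (preg_match($pattern, $encoded) !== 1) {
        throw new InvalidArgumentException('Malformed input for this variant.');
    }

    // Normalize to the standard alphabet with full padding before decoding.
    $standard = rtrim(strtr($encoded, '-_', '+/'), '=');
    $standard .= str_repeat('=', (4 - strlen($standard) % 4) % 4);

    $decoded = base64_decode($standard, true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Malformed input for this variant.');
    }

    return $decoded;
}
```

With this sketch, 'YQ' is rejected for the Standard variant but accepted for UrlSafe, while 'YQ==' is accepted by both. It does not check for non-canonical trailing bits, which a real strict mode may also want to reject.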

Best regards,

Ignace Nyamagana Butera

3 months ago by Andrey Andreev

Hello Ignace,

$input or $data instead of $decoded (could actually do the same
instead of $encoded, but that one doesn't feel as wrong)

Usage of $encoded and $decoded as parameter names is done to emphasize the state of the data, rather than its format. This is helpful as it avoids ambiguity ($data is generic) and makes data flow more explicit.

Yes, I know where you're coming from, but I don't see the ambiguity when
calling a *_decode() function, while the name $decoded is not semantically
correct. Admittedly, this is a bit of bikeshedding, but ...
For something to be "decoded", it has to have been encoded first. There's
no reason to think that this would be the case, and arguably more often
than not it won't be.
Similarly, there's no guarantee that the parameter isn't already encoded in
some other format, or even the same format (i.e. would be performing double
encoding).

Not strictly about naming, but it similarly feels wrong that UnableToDecodeException extends EncodingException (which seems to have no purpose)

This follows the RFC guidelines regarding the introduction of new exceptions to the language, particularly within extensions. Each exception should reference its own exception marker (in this proposal, EncodingException). Additionally, we introduce a more specific exception to handle errors that occur during the decoding of encoded data.

Sorry, I've been out of the loop for quite a while and may have missed
something. Can you point me to the guideline in question?

On a semi-related note, I'm not sure if including the IMAP variant isn't complicating things for no good reason (it is extra-niche, and we have imap_binary/base64() already).

The ext/imap extension that provides those functions has been unbundled from PHP.

Fair enough. I do still believe it is too niche though.

Base64 and Base64url vary on their alphabet and on the presence or absence of the padding string. With the proposed API it would mean doing the following:
\Encoding::base64_encode('Hello world!'); // base64 standard encoding
\Encoding::base64_encode('Hello world!', variant: \Encoding\Base64::UrlSafe); // base64 URL-safe encoding
Padding is by default controlled by the variant. Since UrlSafe does not need padding, no padding will be used. You should not even need to specify the presence or absence of padding, unless you want to do something really specific for your use case, in which case being explicit about what you want to achieve is always a good design choice.

The default values for the options are chosen to cover the most common use cases, so in many situations you won’t need to specify them at all—making the API easier to use than it might initially appear.

Is it though? Sure it is easy for the single most common use case, but it
creates other subtle problems and violates the Principle Of Least
Astonishment:

- To use base64url, one needs to write a line of code twice as long (just
  the enum name itself is longer than the function name).
- The API encourages that the Variant parameter be the default judge of
  padding behavior, despite the function having a Padding behavior parameter.
- Variant-dependent behavior is harder to both document and explain to users.
- RFC 4648 section 5 actually makes a big deal out of the base64 vs
  base64url naming; they are not the same thing, yet the proposed API tries
  to put them under a single "base64" function umbrella.

API design is hard. :)

Also, the RFC doesn't specify whether DecodingMode::Strict would cause
an error in case of missing padding?

Strict decoding behavior depends on the variant. For example, in the case of Base64url, padding is considered optional. Therefore, under DecodingMode::Strict, the absence of = padding characters will not trigger an exception, as this behavior is compliant with the relevant RFC.

In contrast, for Base64::Standard, omitting the padding character in strict mode will result in an exception, since padding is mandatory where applicable with such a variant. For clarity, I will revise the RFC to explicitly state the behavior of each encoding variant during strict mode decoding.

Yes, please! Padding in the default base64 variant often has security
implications, that's why I asked.

Cheers,
Andrey.