From: nyamsprod@gmail.com (ignace nyamagana butera)
Date: Tue, 15 Jul 2025 15:21:25 +0200
Subject: Re: [PHP-DEV] [RFC][DISCUSSION] Add RFC 4648 compliant data encoding API
To: Andrey Andreev, PHP Internals List

On Mon, Jul 14, 2025 at 11:26 PM Andrey Andreev <narf@devilix.net> wrote:

> Hi all,
>
> I have a few suggestions, starting with naming improvements:
> - Forgiving instead of Lenient (align with
> https://infra.spec.whatwg.org/#forgiving-base64)
> - Shorten the option names; one example would be Variable/Constant instead
> of Unprotected/ConstantTime, but I think most could be rethinked
> - $input or $data instead of $decoded (could actually do the same instead
> of $encoded, but that one doesn't feel as wrong)
> - Not strictly about naming, but it similarly feels wrong that
> UnableToDecodeException extends EncodingException (which seems to have no
> purpose)
>
> However, I'm not a fan of how these simple functions have so many option
> flags ... it feels forced, trying to accomodate too much at once. I'd
> rather have discrete functions, like base64_*() and base64url_*() - I chose
> this example because base64 and base64url also have arguably different
> desirable defaults for padding; almost all pad-stripping I've seen in the
> wild has been for the purposes of converting to base64url.
> On a semi-related note, I'm not sure if including the IMAP variant isn't
> complicating things for no good reason (it is extra-niche, and we have
> imap_binary/base64() already).
>
> Also, the RFC doesn't specify whether DecodingMode::Strict would cause an
> error in case of missing padding?
>
> That being said, I'm very glad to see this!
>
> Cheers,
> Andrey.

Hi Andrey,

> Forgiving instead of Lenient (align with
> https://infra.spec.whatwg.org/#forgiving-base64)

I will adapt the text and use `Forgiving` instead.

> Shorten the option names; one example would be Variable/Constant instead
> of Unprotected/ConstantTime, but I think most could be rethinked

I will adapt the text and use `Variable/Constant` instead. Thanks for the suggestions.

> $input or $data instead of $decoded (could actually do the same instead
> of $encoded, but that one doesn't feel as wrong)

The parameter names `$encoded` and `$decoded` were chosen to emphasize the *state of the data*, rather than its format. This avoids ambiguity (`$data` is generic) and makes the data flow more explicit.

> Not strictly about naming, but it similarly feels wrong that
> UnableToDecodeException extends EncodingException (which seems to have no
> purpose)

This follows the RFC guidelines regarding the introduction of new exceptions to the language, particularly within extensions. Each extension's exceptions should extend its own marker exception (in this proposal, `EncodingException`). On top of that, we introduce a more specific exception to handle errors that occur while decoding encoded data.

> On a semi-related note, I'm not sure if including the IMAP variant isn't
> complicating things for no good reason (it is extra-niche, and we have
> imap_binary/base64() already).

The `ext/imap` extension that provides those functions [has been unbundled from PHP](https://wiki.php.net/rfc/unbundle_imap_pspell_oci8).

> I chose this example because base64 and base64url also have arguably
> different desirable defaults for padding; almost all pad-stripping I've
> seen in the wild has been for the purposes of converting to base64url.

Base64 and Base64url differ in their alphabet and in whether the padding string is present. With the proposed API, that means doing the following:

```php
\Encoding::base64_encode('Hello world!'); // base64 standard encoding
\Encoding::base64_encode('Hello world!', variant: \Encoding\Base64::UrlSafe); // base64 URL Safe encoding
```

By default, padding is controlled by the variant. Since UrlSafe does not need padding, no padding will be used. You should not even need to specify the presence or absence of padding unless you want to do something really specific for your use case, in which case being explicit about what you want to achieve is always a good design choice. The default values for the options are chosen to cover the most common use cases, so in many situations you won't need to specify them at all, which makes the API easier to use than it might initially appear.

> Also, the RFC doesn't specify whether DecodingMode::Strict would cause an
> error in case of missing padding?

Strict decoding behavior depends on the variant. For example, in the case of Base64url, padding is considered optional. Therefore, under `DecodingMode::Strict`, the absence of `=` padding characters will not trigger an exception, as this behavior is compliant with the relevant RFC. In contrast, for `Base64::Standard`, omitting the padding character *in strict mode* will result in an exception, since padding is mandatory where applicable with that variant. For clarity, I will revise the RFC to explicitly state the behavior of each encoding variant during strict mode decoding.
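
To make the padding rules above concrete, here is a minimal userland sketch built only on PHP's existing `base64_encode()` and `base64_decode()`. The helper names `base64url_encode_sketch()` and `base64_decode_strict_sketch()` are hypothetical and are not part of the proposed API; the implementation behind the RFC may well validate input differently.

```php
<?php
// Hypothetical helpers for illustration only; not the API proposed in the RFC.

function base64url_encode_sketch(string $decoded): string
{
    // Swap the two alphabet characters that differ, then drop the optional '=' padding.
    return rtrim(strtr(base64_encode($decoded), '+/', '-_'), '=');
}

function base64_decode_strict_sketch(string $encoded, bool $urlSafe = false): string
{
    if ($urlSafe) {
        // base64url: URL-safe alphabet only, padding optional.
        if (preg_match('/^[A-Za-z0-9_-]*={0,2}$/D', $encoded) !== 1) {
            throw new InvalidArgumentException('Invalid base64url input');
        }
        // Map back to the standard alphabet and restore the padding.
        $encoded = strtr(rtrim($encoded, '='), '-_', '+/');
        $encoded .= str_repeat('=', (4 - strlen($encoded) % 4) % 4);
    }

    // Standard base64: complete 4-character groups, with mandatory padding on the last one.
    $strict = '/^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/D';
    if (preg_match($strict, $encoded) !== 1) {
        throw new InvalidArgumentException('Invalid or unpadded base64 input');
    }

    $decoded = base64_decode($encoded, true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Invalid base64 input');
    }

    return $decoded;
}
```

With these helpers, `base64url_encode_sketch('Hi')` gives `SGk`, which decodes fine in URL-safe mode, while the standard strict path rejects `SGk` and only accepts the padded `SGk=`.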

Best regards,
Ignace Nyamagana Butera