Newsgroups: php.internals
From: narf@devilix.net (Andrey Andreev)
Date: Wed, 16 Jul 2025 00:00:43 +0300
Subject: Re: [PHP-DEV] [RFC][DISCUSSION] Add RFC 4648 compliant data encoding API
To: ignace nyamagana butera
Cc: PHP Internals List

Hello Ignace,

>> $input or $data instead of $decoded (could actually do the same instead of $encoded, but that one doesn't feel as wrong)
>
> Usage of `$encoded` and `$decoded` as parameter names is done to emphasize the state of the data, rather than its format. This is helpful as it avoids ambiguity (`$data` is generic) and makes data flow more explicit.

Yes, I know where you're coming from, but I don't see the ambiguity when calling a *_decode() function, while the name $decoded is not semantically correct. Admittedly, this is a bit of bikeshedding, but ...
For something to be "decoded", it has to have been encoded first. There's no reason to think that this would be the case, and arguably more often than not it won't be.
Similarly, there's no guarantee that the parameter isn't already encoded in some other format, or even the same format (i.e. the call would be performing double encoding).

>> Not strictly about naming, but it similarly feels wrong that UnableToDecodeException extends EncodingException (which seems to have no purpose)
>
> This follows the RFC guidelines regarding the introduction of new exceptions to the language, particularly within extensions. Each exception should reference its own exception marker (in this proposal, `EncodingException`). Additionally, we introduce a more specific exception to handle errors that occur during the decoding of encoded data.

Sorry, I've been out of the loop for quite a while and may've missed something. Can you point me to the guideline in question?

>> On a semi-related note, I'm not sure if including the IMAP variant isn't complicating things for no good reason (it is extra-niche, and we have imap_binary/base64() already).
>
> The `ext/imap` extension from which those functions are coming [has been unbundled from PHP](https://wiki.php.net/rfc/unbundle_imap_pspell_oci8)

Fair enough. I do still believe it is too niche though.

>> I chose this example because base64 and base64url also have arguably different desirable defaults for padding; almost all pad-stripping I've seen in the wild has been for the purposes of converting to base64url.
>
> Base64 and Base64url vary on their alphabet and on the presence or absence of the padding string. With the proposed API it would mean doing the following:

> ```php
> \Encoding::base64_encode('Hello world!'); // base64 standard encoding
> \Encoding::base64_encode('Hello world!', variant: \Encoding\Base64::UrlSafe); // base64 URL-safe encoding
> ```
> Padding is by default controlled by the variant. Since UrlSafe does not need padding, no padding will be used. You should not even need to specify the presence or absence of padding, unless you want to do something really specific for your use case, in which case being explicit about what you want to achieve is always a good design choice.
>
> The default values for the options are chosen to cover the most common use cases, so in many situations you won't need to specify them at all, making the API easier to use than it might initially appear.
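
To make the variant-dependent padding default concrete, here is a minimal userland sketch built only on PHP's existing `base64_encode()`, `strtr()`, and `rtrim()`. The function names `encode_base64_standard()` and `encode_base64_urlsafe()` and the `$padding` parameter are hypothetical and are not the API proposed in the RFC; the sketch only mirrors the defaults described above (padding kept for the standard variant, stripped for the URL-safe one).

```php
<?php

// Hypothetical helpers for illustration; not the API proposed in the RFC.

/** RFC 4648 section 4: standard alphabet, "=" padding kept. */
function encode_base64_standard(string $data): string
{
    return base64_encode($data);
}

/** RFC 4648 section 5: "-" and "_" replace "+" and "/"; padding stripped unless requested. */
function encode_base64_urlsafe(string $data, bool $padding = false): string
{
    $encoded = strtr(base64_encode($data), '+/', '-_');

    return $padding ? $encoded : rtrim($encoded, '=');
}

// Two bytes chosen so that both the alphabet and the padding differ.
$bytes = "\xfb\xff";

echo encode_base64_standard($bytes), "\n";      // +/8=
echo encode_base64_urlsafe($bytes), "\n";       // -_8
echo encode_base64_urlsafe($bytes, true), "\n"; // -_8=
```

Note that `'Hello world!'` from the quoted example is 12 bytes long, so it never produces padding in either variant; an input whose length is not a multiple of three is needed to see the difference.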

Is it though? Sure, it is easy for the single most common use case, but it creates other subtle problems and violates the Principle of Least Astonishment:
- To use base64url, one needs to write a line of code twice as long (just the enum name itself is longer than the function name)
- The API encourages treating the Variant parameter as the default judge of padding behavior, despite the function having a Padding behavior parameter.
- Variant-dependent behavior is harder to both document and explain to users
- RFC 4648 section 5 actually makes a big deal out of the base64 vs base64url naming; they are not the same thing, yet the proposed API tries to put them under a single "base64" function umbrella

API design is hard. :)

>> Also, the RFC doesn't specify whether DecodingMode::Strict would cause an error in case of missing padding?
>
> Strict decoding behavior depends on the variant. For example, in the case of Base64url, padding is considered optional. Therefore, under `DecodingMode::Strict`, the absence of `=` padding characters will not trigger an exception, as this behavior is compliant with the relevant RFC.
>
> In contrast, for `Base64::Standard`, omitting the padding character in strict mode will result in an exception, since padding is mandatory where applicable with such a variant. For clarity, I will revise the RFC to explicitly state the behavior of each encoding variant during strict mode decoding.

Yes, please! Padding in the default base64 variant often has security implications, that's why I asked.
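
The strict-mode rules described in the quoted reply can be sketched in userland on top of PHP's existing `base64_decode()` with its `strict` argument. This is only an illustration under stated assumptions: `decode_base64_strict()`, its `$urlSafe` flag, and the choice of `UnexpectedValueException` are hypothetical and do not come from the RFC, which has yet to spell out the per-variant behavior.

```php
<?php

// Hypothetical helper: name, signature, and exception type are assumptions,
// not the behavior of the proposed API.
function decode_base64_strict(string $encoded, bool $urlSafe = false): string
{
    if ($urlSafe) {
        // Characters of the standard alphabet are not valid base64url input.
        if (strpbrk($encoded, '+/') !== false) {
            throw new UnexpectedValueException('Standard-alphabet character in base64url input');
        }
        // Map the URL-safe alphabet back to the standard one.
        $encoded = strtr($encoded, '-_', '+/');
    }

    $remainder = strlen($encoded) % 4;

    if ($urlSafe && $remainder > 1 && !str_contains($encoded, '=')) {
        // base64url: padding is optional, so restore it before decoding.
        $encoded .= str_repeat('=', 4 - $remainder);
    } elseif ($remainder !== 0) {
        // Standard variant, partial padding, or an impossible length.
        throw new UnexpectedValueException('Invalid length or missing padding');
    }

    // strict = true: returns false on characters outside the base64 alphabet.
    $decoded = base64_decode($encoded, true);
    if ($decoded === false) {
        throw new UnexpectedValueException('Invalid base64 input');
    }

    return $decoded;
}

// Unpadded base64url input is accepted.
var_dump(decode_base64_strict('-_8', true) === "\xfb\xff"); // bool(true)

// The same payload without padding is rejected for the standard variant.
try {
    decode_base64_strict('+/8');
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), "\n"; // Invalid length or missing padding
}
```

The explicit length check is done before calling `base64_decode()` so that the padding rule does not depend on how lenient the built-in decoder is about missing `=` characters.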

Cheers,
Andrey.