Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:126907
X-Original-To: internals@lists.php.net
Delivered-To: internals@lists.php.net
Message-ID: <36e83b76-6bab-42a3-9c5d-07a35d43851f@rwec.co.uk>
Date: Sat, 22 Mar 2025 15:20:15 +0000
Precedence: bulk
list-help: <mailto:internals+help@lists.php.net>
list-unsubscribe: <mailto:internals+unsubscribe@lists.php.net>
list-post: <mailto:internals@lists.php.net>
List-Id: internals.lists.php.net
x-ms-reactions: disallow
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PHP-DEV] Potential RFC: mb_rawurlencode() ?
To: internals@lists.php.net
References: <C47E5955-E75E-49A7-88D3-6957669057F0@pmjones.io>
 <CAEPPVa1p9tiTZLs69Mjo05KTyLi2RwLJ+Z4SSWKEwCZdR9RnOw@mail.gmail.com>
 <CAEPPVa38tgc=K5Hgcf+-d7JNfi+v97fjFQpwA0J7cOMCz1FxNw@mail.gmail.com>
 <EEC27D2D-BDBE-41F4-95F1-45EBD0B328BC@pmjones.io>
 <c0b86fc1b3a5eb1d144f80c931febeaa@bastelstu.be>
Content-Language: en-GB
In-Reply-To: <c0b86fc1b3a5eb1d144f80c931febeaa@bastelstu.be>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
From: imsop.php@rwec.co.uk ("Rowan Tommins [IMSoP]")

On 21/03/2025 11:17, Tim Düsterhus wrote:
>
> I am not sure if that signature makes sense and if the proposed 
> functionality fits into mbstring for that reason. IRIs are defined as 
> UTF-8, any other encoding results in invalid output / results that are 
> not interoperable.


This confirms a nagging feeling I had when I first saw the thread: the 
name "mb_rawurlencode" implies "do the same things as rawurlencode, but 
for multi-byte strings", but that's not what is being proposed.


Notably, a similar feature is actually slated for removal; to quote 
https://www.php.net/manual/en/migration82.deprecated.php#migration82.deprecated.mbstring

> Usage of the QPrint, Base64, Uuencode, and HTML-ENTITIES 'text
> encodings' is deprecated for all MBString functions. Unlike all the
> other text encodings supported by MBString, these do not encode a
> sequence of Unicode codepoints, but rather a sequence of raw bytes. It
> is not clear what the correct return values for most MBString functions
> should be when one of these non-encodings is specified.

The same applies here: if you write mb_rawurlencode($my_string, 
'SHIFT-JIS'), does that mean convert what you can to ASCII, and 
percent-encode the rest to produce a URI; or does it mean convert to 
UTF-8, and percent-encode as necessary to produce an IRI? If the input 
contains sequences which are not valid SHIFT-JIS, are those bytes 
treated as unencodable (producing errors or substitution characters), 
or are they directly percent-encoded?
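
To make the ambiguity concrete, here is a rough sketch of two of the
possible readings, using only functions that exist today (it assumes
ext/mbstring is loaded; the sample text and variable names are mine,
not from the proposal):

```php
<?php
// The same two characters (U+65E5 U+672C) in two encodings.
$utf8 = "\u{65E5}\u{672C}";   // UTF-8 bytes: E6 97 A5 E6 9C AC
$sjis = "\x93\xFA\x96\x7B";   // the same text as Shift-JIS bytes

// Reading 1: treat the input as opaque bytes, which is what
// rawurlencode() already does with any string it is given.
$asBytes = rawurlencode($sjis);

// Reading 2: decode as Shift-JIS, re-encode as UTF-8, then percent-encode.
$asUtf8 = rawurlencode(mb_convert_encoding($sjis, 'UTF-8', 'SJIS'));

echo $asBytes, PHP_EOL; // %93%FA%96%7B
echo $asUtf8, PHP_EOL;  // %E6%97%A5%E6%9C%AC
```

Both results are plausible answers to "URL-encode this Shift-JIS
string", and a receiver cannot tell which one it has been given.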


> The correct solution to me is to build a proper thought-through API as 
> part of the proposed new Uri namespace and not adding new standalone 
> functions without a clear vision.


I completely agree.

For instance, the IRI standard does include an algorithm for converting 
a non-Unicode IRI representation to a URI - but it requires a Unicode 
Normalization step, which is a complex algorithm not included in 
ext/standard or ext/mbstring, only ext/intl. However, a function in the 
URI namespace that only handled the UTF-8 input case might still be useful.
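
As a minimal sketch of what that UTF-8-only case might look like: the
function name below is hypothetical, not an existing or proposed API,
and encoding every non-ASCII byte is a simplification of the character
classes the IRI standard actually lists.

```php
<?php
/**
 * Hypothetical helper: map an IRI that is already valid UTF-8 to a URI
 * by percent-encoding every non-ASCII byte. It deliberately skips the
 * non-Unicode input case, which would need Unicode normalization
 * (available as Normalizer::normalize() in ext/intl).
 */
function iri_to_uri_utf8(string $iri): string
{
    // With the "u" modifier, PCRE refuses malformed UTF-8 and
    // preg_match() returns false instead of 1.
    if (preg_match('//u', $iri) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Without "u" the pattern matches single bytes, so each byte >= 0x80
    // becomes one %XX triplet; ASCII, including delimiters, is left alone.
    return preg_replace_callback(
        '/[\x80-\xFF]/',
        static fn (array $m): string => sprintf('%%%02X', ord($m[0])),
        $iri
    );
}

echo iri_to_uri_utf8("https://example.org/caf\u{E9}?q=\u{65E5}"), PHP_EOL;
// https://example.org/caf%C3%A9?q=%E6%97%A5
```

Nothing here needs an encoding parameter, which is rather the point.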


> Along those lines, I think there might need to be two additional changes/additions to help with encoding for RFC 3987 and WHATWG-URL component values:
>
> - `http_build_query()` would need PHP_QUERY_3987 and PHP_QUERY_WHATWG flags and corresponding logic (or entirely new functions); and
> - `parse_str()` would need a corresponding `mb_parse_str()`.


I haven't followed the other URI thread at all, but isn't replacing the 
scattered standard library functions with a consistent API the whole 
point of that effort?

parse_str() in particular has a non-descriptive name, and a weird 
function signature because it used to directly overwrite variables by name.
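
For anyone who hasn't used it recently, this is how the current
signature behaves (the query string is just an illustrative value):

```php
<?php
// parse_str() returns nothing; the parsed values arrive through the
// by-reference second argument, which is mandatory since PHP 8.0.
parse_str('first.name=Ada&tags[]=x&tags[]=y', $result);

// Keys are still mangled as if they had to be valid variable names:
// the "." in "first.name" becomes "_".
var_export($result);
// $result is ['first_name' => 'Ada', 'tags' => ['x', 'y']]
```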

As a comparison, we didn't extend the shuffle() function with an 
algorithm parameter, we added a shuffleArray() method to the new 
Randomizer class.
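
Side by side, the two look roughly like this (requires PHP 8.2 or
later; the engine and the seed are arbitrary choices for the example):

```php
<?php
// Legacy function: shuffles in place, by reference, and returns true.
$legacy = [1, 2, 3, 4, 5];
shuffle($legacy);

// New API: the engine (and so the algorithm) is chosen on the object,
// and the method returns a new array instead of modifying its argument.
$input = [1, 2, 3, 4, 5];
$randomizer = new \Random\Randomizer(
    new \Random\Engine\Xoshiro256StarStar(42)
);
$shuffled = $randomizer->shuffleArray($input);
```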


-- 
Rowan Tommins
[IMSoP]