Newsgroups: php.internals
Message-ID: <36e83b76-6bab-42a3-9c5d-07a35d43851f@rwec.co.uk>
Date: Sat, 22 Mar 2025 15:20:15 +0000
Subject: Re: [PHP-DEV] Potential RFC: mb_rawurlencode() ?
To: internals@lists.php.net
From: imsop.php@rwec.co.uk ("Rowan Tommins [IMSoP]")
In-Reply-To: <c0b86fc1b3a5eb1d144f80c931febeaa@bastelstu.be>

On 21/03/2025 11:17, Tim Düsterhus wrote:
> I am not sure if that signature makes sense and if the proposed
> functionality fits into mbstring for that reason. IRIs are defined as
> UTF-8, any other encoding results in invalid output / results that are
> not interoperable.

This confirms a nagging feeling I had when I first saw the thread: the name "mb_rawurlencode" implies "do the same things as rawurlencode, but for multi-byte strings", but that's not what is being proposed.

Notably, a similar feature is actually slated for removal; to quote https://www.php.net/manual/en/migration82.deprecated.php#migration82.deprecated.mbstring

> Usage of the QPrint, Base64, Uuencode, and HTML-ENTITIES 'text encodings' is deprecated for all MBString functions. Unlike all the other text encodings supported by MBString, these do not encode a sequence of Unicode codepoints, but rather a sequence of raw bytes. It is not clear what the correct return values for most MBString functions should be when one of these non-encodings is specified.

The same applies here: if you write mb_rawurlencode($my_string, 'SHIFT-JIS'), does that mean convert what you can to ASCII, and percent encode the rest for a URI; or does it mean convert to UTF-8, and percent encode as necessary for an IRI?

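
To make the two readings concrete, here is a rough sketch using only existing functions. The helper names are hypothetical and exist purely for illustration; neither is the proposed API, and I'm assuming mbstring's 'SJIS' encoding name and a UTF-8 source file.

```php
<?php
// Hypothetical helper: percent-encode the legacy-encoded bytes as they are.
function percent_encode_raw_bytes(string $input): string
{
    return rawurlencode($input);
}

// Hypothetical helper: interpret $input as text in $from, convert it to
// UTF-8, and only then percent-encode it.
function percent_encode_via_utf8(string $input, string $from): string
{
    return rawurlencode(mb_convert_encoding($input, 'UTF-8', $from));
}

// "日本" as Shift-JIS bytes: 0x93 0xFA 0x96 0x7B
$sjis = mb_convert_encoding('日本', 'SJIS', 'UTF-8');

echo percent_encode_raw_bytes($sjis), "\n";        // %93%FA%96%7B
echo percent_encode_via_utf8($sjis, 'SJIS'), "\n"; // %E6%97%A5%E6%9C%AC
```

Both outputs are plausible answers to "URL-encode this SHIFT-JIS string", but they are not interchangeable: a receiver that assumes UTF-8 will only decode the second one correctly.
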
If the input contains sequences which are not valid SHIFT-JIS, are those bytes treated as unencodable (producing errors or substitution characters), or are they directly percent encoded?

> The correct solution to me is to build a proper thought-through API as
> part of the proposed new Uri namespace and not adding new standalone
> functions without a clear vision.

I completely agree. For instance, the IRI standard does include an algorithm for converting a non-Unicode IRI representation to a URI - but it requires a Unicode Normalization step, which is a complex algorithm not included in ext/standard or ext/mbstring, only ext/intl. However, a function in the URI namespace that only handled the UTF-8 input case might still be useful.

> Along those lines, I think there might need to be two additional changes/additions to help with encoding for RFC 3987 and WHATWG-URL component values:
>
> - `http_build_query()` would need PHP_QUERY_3987 and PHP_QUERY_WHATWG flags and corresponding logic (or entirely new functions); and
> - `parse_str()` would need a corresponding `mb_parse_str()`.

I haven't followed the other URI thread at all, but isn't replacing the scattered standard library functions with a consistent API the whole point of that effort? parse_str() in particular has a non-descriptive name, and a weird function signature because it used to directly overwrite variables by name.

As a comparison, we didn't extend the shuffle() function with an algorithm parameter, we added a shuffleArray() method to the new Randomizer class.

-- 
Rowan Tommins [IMSoP]