Newsgroups: php.internals
From: pmjones@pmjones.io ("Paul M. Jones")
To: PHP Internals List
Subject: [PHP-DEV] Potential RFC: mb_rawurlencode() ?
Date: Tue, 18 Mar 2025 12:48:36 -0500

Hi all,

The discussion around WHATWG-URL on this list, as well as my work coordinating Uri-Interop, leads me to think PHP needs a multibyte equivalent of rawurlencode().

Broadly speaking, as far as I can tell:

- For an RFC 3986 URI, delimiters need to be percent-encoded, as well as non-ASCII characters.

- For an RFC 3987 IRI, delimiters need to be percent-encoded, but UCS characters do not.

(There are other details but I think you get the idea.)

The rawurlencode() function does fine for URIs, but not for IRIs. Using rawurlencode() for an IRI will encode multibyte characters when it should leave them alone. For example:

```
$val = 'fü bar';

$uriPath = '/heads/' . rawurlencode($val) . '/tails/';
assert($uriPath === '/heads/f%C3%BC%20bar/tails/'); // true

$iriPath = '/heads/' . rawurlencode($val) . '/tails/';
assert($iriPath === '/heads/fü bar/tails/'); // false
```

(This might apply to WHATWG-URL component construction as well.)

Have I missed something, either in the specs or in PHP itself?

If not, how do we feel about an RFC for mb_rawurlencode()? A naive userland implementation might look something like the code below.

Thoughts?

* * *

```php
function mb_rawurlencode(string $string) : string
{
    $encoded = '';

    foreach (mb_str_split($string) as $char) {
        $encoded .= match ($char) {
            chr(0) => "%00",
            chr(1) => "%01",
            chr(2) => "%02",
            chr(3) => "%03",
            chr(4) => "%04",
            chr(5) => "%05",
            chr(6) => "%06",
            chr(7) => "%07",
            chr(8) => "%08",
            chr(9) => "%09",
            chr(10) => "%0A",
            chr(11) => "%0B",
            chr(12) => "%0C",
            chr(13) => "%0D",
            chr(14) => "%0E",
            chr(15) => "%0F",
            chr(16) => "%10",
            chr(17) => "%11",
            chr(18) => "%12",
            chr(19) => "%13",
            chr(20) => "%14",
            chr(21) => "%15",
            chr(22) => "%16",
            chr(23) => "%17",
            chr(24) => "%18",
            chr(25) => "%19",
            chr(26) => "%1A",
            chr(27) => "%1B",
            chr(28) => "%1C",
            chr(29) => "%1D",
            chr(30) => "%1E",
            chr(31) => "%1F",
            chr(127) => "%7F",
            "!" => '%21',
            "#" => '%23',
            "$" => '%24',
            "%" => '%25',
            "&" => '%26',
            "'" => '%27',
            "(" => '%28',
            ")" => '%29',
            "*" => '%2A',
            "+" => '%2B',
            "," => '%2C',
            "/" => '%2F',
            ":" => '%3A',
            ";" => '%3B',
            "=" => '%3D',
            "?" => '%3F',
            "[" => '%5B',
            "]" => '%5D',
            default => $char,
        };
    }

    return $encoded;
}
```

* * *

-- pmj
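
For comparison, the same naive idea can be sketched more compactly with a byte-oriented regular expression. This is only an illustration, not a proposed implementation: the name `mb_rawurlencode_regex()` is hypothetical, the input is assumed to be UTF-8, and the encoded set simply mirrors the control characters and delimiters listed in the `match` arms above (so, like that version, it leaves spaces and `@` untouched). Because every byte of a multibyte UTF-8 sequence is 0x80 or higher, the pattern never matches inside one, and no mbstring call is needed.

```php
<?php

// Hypothetical helper: percent-encode ASCII control characters and the
// delimiter set from the match-based version, leaving all other bytes
// (including UTF-8 multibyte sequences) as they are.
function mb_rawurlencode_regex(string $string): string
{
    return preg_replace_callback(
        // No /u modifier: the pattern works byte by byte, and only
        // bytes below 0x80 appear in the character class.
        '~[\x00-\x1F\x7F!#$%&\'()*+,/:;=?\[\]]~',
        // Turn the single matched byte into %XX with uppercase hex.
        fn (array $m): string => sprintf('%%%02X', ord($m[0])),
        $string
    );
}

// Usage: the multibyte "ü" survives, while the "/" delimiter is encoded.
$iriPath = '/heads/' . mb_rawurlencode_regex('fü/bar') . '/tails/';
// $iriPath is '/heads/fü%2Fbar/tails/'
```

Whether spaces and the remaining characters that RFC 3987 does not allow unencoded should be added to the set is left open here, as in the original function.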