Newsgroups: php.internals
Date: Thu, 20 Mar 2025 15:31:38 +0900
Subject: Fwd: [PHP-DEV] Potential RFC: mb_rawurlencode() ?
To: php internals
From: youkidearitai@gmail.com (youkidearitai)

---------- Forwarded message ---------
From: youkidearitai
Date: Thu, 20 Mar 2025 14:41
Subject: Re: [PHP-DEV] Potential RFC: mb_rawurlencode() ?
To: Paul M. Jones

On Wed, 19 Mar 2025 2:52, Paul M. Jones wrote:
>
> Hi all,
>
> The discussion around WHATWG-URL on this list, as well as my work coordinating Uri-Interop, led me to think PHP needs a multibyte equivalent of rawurlencode().
>
> Broadly speaking, as far as I can tell:
>
> - For an RFC 3986 URI, delimiters need to be percent-encoded, as well as non-ASCII characters.
> - For an RFC 3987 IRI, delimiters need to be percent-encoded, but UCS characters do not.
>
> (There are other details but I think you get the idea.)
>
> The rawurlencode() function does fine for URIs, but not for IRIs. Using rawurlencode() for an IRI will encode multibyte characters when it should leave them alone. For example:
>
> ```
> $val = 'fü bar';
>
> $uriPath = '/heads/' . rawurlencode($val) . '/tails/';
> assert($uriPath === '/heads/f%C3%BC%20bar/tails/'); // true
>
> $iriPath = '/heads/' . rawurlencode($val) . '/tails/';
> assert($iriPath === '/heads/fü bar/tails/'); // false
> ```
>
> (This might apply to WHATWG-URL component construction as well.)
>
> Have I missed something, either in the specs or in PHP itself?
>
> If not, how do we feel about an RFC for mb_rawurlencode()? A naive userland implementation might look something like the code below.
>
> Thoughts?
>
> * * *
>
> ```php
> function mb_rawurlencode(string $string) : string
> {
>     $encoded = '';
>
>     foreach (mb_str_split($string) as $char) {
>         $encoded .= match ($char) {
>             chr(0) => "%00",
>             chr(1) => "%01",
>             chr(2) => "%02",
>             chr(3) => "%03",
>             chr(4) => "%04",
>             chr(5) => "%05",
>             chr(6) => "%06",
>             chr(7) => "%07",
>             chr(8) => "%08",
>             chr(9) => "%09",
>             chr(10) => "%0A",
>             chr(11) => "%0B",
>             chr(12) => "%0C",
>             chr(13) => "%0D",
>             chr(14) => "%0E",
>             chr(15) => "%0F",
>             chr(16) => "%10",
>             chr(17) => "%11",
>             chr(18) => "%12",
>             chr(19) => "%13",
>             chr(20) => "%14",
>             chr(21) => "%15",
>             chr(22) => "%16",
>             chr(23) => "%17",
>             chr(24) => "%18",
>             chr(25) => "%19",
>             chr(26) => "%1A",
>             chr(27) => "%1B",
>             chr(28) => "%1C",
>             chr(29) => "%1D",
>             chr(30) => "%1E",
>             chr(31) => "%1F",
>             chr(127) => "%7F",
>             "!" => '%21',
>             "#" => '%23',
>             "$" => '%24',
>             "%" => '%25',
>             "&" => '%26',
>             "'" => '%27',
>             "(" => '%28',
>             ")" => '%29',
>             "*" => '%2A',
>             "+" => '%2B',
>             "," => '%2C',
>             "/" => '%2F',
>             ":" => '%3A',
>             ";" => '%3B',
>             "=" => '%3D',
>             "?" => '%3F',
>             "[" => '%5B',
>             "]" => '%5D',
>             default => $char,
>         };
>     }
>
>     return $encoded;
> }
> ```
>
> * * *
>
>
> -- pmj

Hi, Paul.

I think the signature should be:

```php
function mb_rawurlencode(string $string, string $encode): string {}
```

This is because mbstring functions also support encodings other than Unicode (ISO-8859-1 through ISO-8859-16, Shift_JIS, EUC-*, etc.).

Beyond that, I don't know yet.

Oops, I forgot to send this to internals. Sorry, resending it now.

Yuya

--
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
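To illustrate how an encoding parameter like the one Yuya suggests could work, here is a minimal userland sketch. It is not an existing PHP function: the name `mb_rawurlencode_sketch()` and the nullable `$encoding` parameter (called `$encode` in Yuya's message) are hypothetical. It delegates character splitting to `mb_str_split()` in the given encoding, so the bytes of a multibyte Shift_JIS or EUC-* character are never compared one at a time against the ASCII delimiters. It assumes PHP 8.0+ with mbstring and an ASCII-compatible encoding, such as those Yuya lists; in UTF-16 or UTF-32, ASCII characters are not single bytes, so the check below would not apply. It also adds `@`, an RFC 3986 gen-delim that the naive list omits.

```php
<?php
// Hypothetical sketch only: mb_rawurlencode_sketch() is not a PHP function.
// Assumes PHP 8.0+ with the mbstring extension, and an ASCII-compatible
// encoding (UTF-8, ISO-8859-*, Shift_JIS, EUC-*), where every ASCII
// character occupies exactly one byte.
function mb_rawurlencode_sketch(string $string, ?string $encoding = null): string
{
    // Delimiters from the naive version, plus "@" (an RFC 3986 gen-delim).
    $reserved = '!#$%&\'()*+,/:;=?@[]';

    $encoded = '';

    // Split by character in the given encoding (null means the mbstring
    // internal encoding), so the bytes of a multibyte character are never
    // compared one at a time against the ASCII delimiters.
    foreach (mb_str_split($string, 1, $encoding) as $char) {
        $byte = ord($char);

        $mustEncode = strlen($char) === 1
            && ($byte < 0x20 || $byte === 0x7F || str_contains($reserved, $char));

        // Percent-encode control characters and delimiters; pass every
        // other character, including multibyte ones, through unchanged.
        $encoded .= $mustEncode ? sprintf('%%%02X', $byte) : $char;
    }

    return $encoded;
}
```

Called as `mb_rawurlencode_sketch('fü bar/', 'UTF-8')`, it returns `'fü bar%2F'`: the multibyte `ü` and the space pass through, as in the naive version, and only the slash is encoded.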