Newsgroups: php.internals
To: David Walker, PHP internals
Date: Thu, 6 Oct 2016 19:19:50 +0200
Subject: Re: [RFC] Bug #72811 - Replacing parse_url()
From: cmbecker69@gmx.de ("Christoph M. Becker")

On 04.10.2016 at 20:14, David Walker wrote:

> A couple weeks back I took a look at 72811[1]. The bug being that
> parse_url() didn't accept IPv6 addresses without a scheme, like it did for
> IPv4 addresses. I attempted to patch the specific bug within the scope of
> how parse_url() was processing URIs. After opening a PR for the
> resolution, Yasuo and Christoph both chimed in that perhaps replacing the
> implementation with an re2c based parser would be better. We found a
> parser[2] that did almost everything necessary. I took it and made it more
> strictly adhere to RFC3986[3].
>
> I have updated my original PR[4] and created an RFC[5] that aims to replace
> the parsing of parse_url() to be more strict to RFC3986. This will provide
> a BC break, as explained in the RFC that at very least warrants some
> discussion. We had kicked around the idea on the PR of deprecating
> parse_url, and creating a new function with the more-compliant parser, but
> opted against it.
>
> I'm looking for discussion on if a total replacement is the preferred way
> to go about this, and if we should be making parse_url() more standards
> strict. Since it today has many breaks with RFC3986 that provide
> semi-reasonable parsing patterns.
>
> [1] - https://bugs.php.net/bug.php?id=72811
> [2] - https://github.com/staskobzar/url_parser_re2c
> [3] - https://tools.ietf.org/html/rfc3986
> [4] - https://github.com/php/php-src/pull/2079
> [5] - https://wiki.php.net/rfc/replace_parse_url

Thanks for the RFC, Dave!

I'm all for having a properly implementable URI parser that exactly
follows a specific standard. However, I don't think we can replace
parse_url() with such a parser for BC reasons before PHP 8 (at least).
The parse_url() man page explicitly states:

| Partial URLs are also accepted, parse_url() tries its best to parse
| them correctly.

I'm quite sure that a lot of code relies on this behavior.

So, I basically see two options:

* wait until PHP 8 (whenever that'll be released) and switch the
  implementation of parse_url() then, which might delay the adoption
  of PHP 8

* add a new function in PHP 7.2 (maybe called parse_uri()), and perhaps
  deprecate parse_url() at the same time

-- 
Christoph M. Becker