Newsgroups: php.internals
To: internals@lists.php.net
Date: Thu, 13 Oct 2016 11:32:12 -0500
Subject: Re: [PHP-DEV] [RFC] Bug #72811 - Replacing parse_url()
From: larry@garfieldtech.com (Larry Garfield)

On 10/13/2016 10:00 AM, David Walker wrote:
> On Mon, Oct 10, 2016 at 1:22 PM Larry Garfield wrote:
>
>> Be aware that a user-space definition for a URL object already exists as
>> part of PSR-7:
>>
>> http://www.php-fig.org/psr/psr-7/#3-5-psr-http-message-uriinterface
>>
>> A core-provided mutable and incompatible object would be problematic.
>>
>> What would be useful would be to have a C-level function (parse_url() or
>> otherwise) that can generate a very well-known and standardized array
>> structure (i.e., better than parse_url()'s now) that a UriInterface
>> implementation could trivially wrap. Basically, a way to simplify this
>> existing code:
>>
>> https://github.com/zendframework/zend-diactoros/blob/master/src/Uri.php#L435
>>
>> And move the conditionals and filter*() sub-calls to C. (Right now they
>> play games with regexes and hope.)
>>
> Hi Larry,
>
> I guess I'm not sure why having a RFC/WHATWG compliant parser would be
> problematic with regard to PSR-7. It would be the application developer's
> responsibility to take a standardized output and populate their object that
> implements UriInterface. WHATWG does seem to mitigate the need of some of
> the filter*() calls, but certain ones would still desire to be
> application-specific.
>
> Although WHATWG does not specify that the URL object has a getAll()-esque
> method, it could be beneficial to have something that returns a structure
> similar to what parse_url() does today.
> It could also be beneficial to just have URL implement ArrayAccess so
> you wouldn't have to bother with getting a specific array back, and can
> just access what you need.
>
> --
> Dave

It's not that having an RFC-compliant parser in C is problematic. Quite the opposite. The issue is the representation it hands back to user-land code. Namely, right now the most common PSR-7 implementation uses parse_url() internally, which as noted is somewhat buggy and incomplete. If PHP natively provided a better parser that a PSR-7 implementation could use, that would be good for everyone.

What would not be helpful is for PHP to natively provide what is essentially a competitor to PSR-7's Uri object. The raw data parsing can and should live in C, while the main user-space representation is defined in user-space.

That's the same point that was made about HTTP headers a while back: PHP already has the ability in C to read a stream and parse it into headers, a GET array, a POST array, etc. It uses that for the super-globals. Exposing that capability to user-space would allow for more efficient and flexible implementations of PSR-7 or similar.

I fully expect that in a few years PSR-7 will be updated and supplanted by something that leverages newer PHP features, and we would want to make that transition as smooth as possible. That means having a clear stack of complementary functionality, not competing "polished" functionality that would then have to be mapped back and forth in a clumsy fashion.

--Larry Garfield
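
A minimal sketch of the layering described above, under stated assumptions: the core parser returns a plain array and a user-space, immutable value object wraps it. Today's parse_url() stands in for the proposed C-level parser, and the class name SketchUri and its fixed key set are hypothetical, not part of PSR-7 or of any RFC. A real implementation would implement Psr\Http\Message\UriInterface, which is omitted here so the example runs on its own.

```php
<?php
// Hypothetical user-space value object wrapping the array that a core
// parser returns. Not a PSR-7 implementation, only an illustration.
final class SketchUri
{
    private $parts;

    private function __construct(array $parts)
    {
        $this->parts = $parts;
    }

    // parse_url() stands in for the proposed, stricter C-level parser.
    public static function fromString(string $uri): self
    {
        $parts = parse_url($uri);
        if ($parts === false) {
            throw new InvalidArgumentException("Unparseable URI: $uri");
        }
        // Assumed "well-known" structure: every key always present,
        // so the wrapper needs no isset() checks.
        return new self($parts + [
            'scheme' => '', 'host' => '', 'port' => null,
            'path' => '', 'query' => '', 'fragment' => '',
        ]);
    }

    public function getHost(): string
    {
        // Host names are case-insensitive; PSR-7 asks for lowercase.
        return strtolower($this->parts['host']);
    }

    public function getPort()
    {
        return $this->parts['port'];
    }

    // PSR-7 style "wither": returns a modified copy, never mutates.
    public function withHost(string $host): self
    {
        $copy = clone $this;
        $copy->parts['host'] = $host;
        return $copy;
    }
}

$uri   = SketchUri::fromString('https://Example.com:8080/a/b?x=1');
$other = $uri->withHost('php.net');
```

The point of the split is that the parser owns correctness of the raw components, while policy such as immutability and normalization stays in user-space, where it can change without a core release.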