Newsgroups: php.internals
Date: Mon, 21 Mar 2022 12:09:44 -0500
From: larry@garfieldtech.com ("Larry Garfield")
To: "php internals"
Subject: Re: [PHP-DEV] Discussion: String streams

On Mon, Mar 21, 2022, at 10:23 AM, Sara Golemon wrote:
> TL;DR - Yeah, PHP, but what if C++? Feel free to tell me I'm wrong and
> should feel bad. THIS IS ONLY IDLE MUSINGS.
>
> I was reading the arbitrary string interpolation thread (which I have mixed
> feelings on, but am generally okay with), and it got me thinking
> about motivations for it and other ways we might address that.
>
> I spend most of my time in C++ these days and that's going to show in this
> proposal, and the answer is probably "PHP isn't C++" and that's fine, but I
> want you to read to the end, because XSS is perennially on my mind and this
> might be one extra tool, maybe.
>
> PHP internal classes have the ability to handle operator overloads, and one
> use for overloads I quite like from C++ is streaming interfaces. Imagine
> the following:
>
> // Don't get hung up on the name, we're a long way from bikeshedding yet.
> $foo = (new \ostringstream) << "Your query returned " << $result->count()
>     << " rows. The first row has ID: " << $result->peekRow()['id'];
>
> At each << operator, the RHS is "shifted" into the string builder, and the
> object instance is returned. At the end, $foo is still that object, but
> when it's echoed or cast to string it becomes the entire combined string.
> As implementation details, we could keep the string as a list of segments
> or materialize completely; that could also be optimized to not materialize
> if we're in an output context, since the intermediate complete string is
> unnecessary. Don't worry about this for now though.
>
> That by itself is... curious as an option, but not terribly interesting as
> we DO have proper interpolation and it works just fine, right?
>
> The reason I'm bothering to introduce this is that we could also build
> contextual awareness into this. During instantiation we could identify the
> context like:
>
> $forOutput = new \ostringstream\html << "You entered: " <<
>     $_POST['textarea'];
> $forURIs = new \stringstream\uri << BASE_URI << '?';
> foreach ($_GET as $k => $v) {
>     $forURIs << $k << '=' << $v << '&';
> }
>
> These specializations could perform automatic sanitization during the
> materialization phase, and this could even be customizable:
>
> $custom = new \ostringstream\user( landonize(...) );
>
> We wouldn't be giving arbitrary operator overloading to the user, only
> arbitrary sanitization.
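Since userland PHP cannot overload <<, the contextual builder quoted above can only be approximated with a fluent append() call, as Sara notes further down. What follows is one such approximation, assuming PHP 8.1+; `StringStream`, `html()` and `append()` are hypothetical names, not an existing or proposed API, and `strtoupper` merely stands in for the hypothetical `landonize()`. It keeps raw segments and runs the sanitizer only when the object is cast to string, mirroring the "sanitize at materialization" idea. Note that it escapes every segment, literals included; telling trusted literals apart from user input would need something extra.

```php
<?php
declare(strict_types=1);

// Hypothetical userspace stand-in for the proposed \ostringstream family.
// PHP has no userland operator overloading, so append() plays the role of <<.
final class StringStream implements Stringable
{
    /** @var list<string> Raw segments, kept as-is until materialization. */
    private array $segments = [];

    /** Sanitizer applied to every segment when the stream is cast to string. */
    private Closure $sanitize;

    public function __construct(?callable $sanitize = null)
    {
        // Without a sanitizer this is just a plain string builder.
        $this->sanitize = $sanitize !== null
            ? Closure::fromCallable($sanitize)
            : static fn (string $s): string => $s;
    }

    // Hypothetical counterpart of \ostringstream\html: HTML-escapes on output.
    public static function html(): self
    {
        return new self(
            static fn (string $s): string => htmlspecialchars($s, ENT_QUOTES | ENT_HTML5, 'UTF-8')
        );
    }

    // Equivalent of "$stream << $value": record the segment, return the same object.
    public function append(string|int|float|Stringable $value): static
    {
        $this->segments[] = (string) $value;
        return $this;
    }

    // Materialization point: sanitization happens here, not at append time.
    public function __toString(): string
    {
        return implode('', array_map($this->sanitize, $this->segments));
    }
}

// Usage mirroring the quoted examples (strtoupper stands in for landonize).
$forOutput = StringStream::html()->append('You entered: ')->append('<b>hi</b>');
$custom = (new StringStream(strtoupper(...)))->append('quiet ')->append('words');
echo $forOutput, "\n", $custom, "\n";
```

Deferring the sanitizer to __toString() keeps the segment list intact, so the same object could in principle be materialized for a different context later, which is the variant Sara raises next with `$stream->html()`.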
>
> Alternatively (or in addition), the point of materialization could be where
> we make this decision:
>
> echo $stream->html();
>
> ------
>
> I'd build this in userspace, but of course we don't have operator
> overloading, so the API would be a somewhat uglier function call:
>
> $stream->append("This feels ")->append(FEELING::Sad);
>
> Maybe the right answer is open the door on user-defined operator overloads,
> but my flame retardant suit is in the shop and I don't really need to open
> that mixed metaphor.
>
> -Sara

What you're proposing here is:

1. An overloadable operator on objects (via a new magic method or whatever)
   that takes one argument and returns a new instance of the same class, with
   the argument included in it along with whatever the object's existing
   data/context is.
2. Using that for string manipulation.

If you spell the operator >>=, then point 1 is adding a monadic bind. This
has my full support. Using it for string manipulation is fine, although
there are a bazillion other things we could do with it that I would also
very much support, and that can be done in user space. Whether or not it
makes sense for some of these operations to be done in C instead is up for
debate.

Once an arbitrary object can have a socket, that plus monads can push most
stream operations to user space. Building a "stream wrapper"-like thing, or
a filter, then becomes some mix of object composition and binding:

$s = new StripTagsStream(new ZlibCompress(new FileStream($fileName))) >>= $htmlString;

Which... feels kinda Java clunky. It would be better if we could chain out
the wrapping levels, which could potentially be done in the implementation
itself, by letting a stream on the RHS mean "wrap yourself around me":

public function __bind(Stream|string $s) {
    if ($s instanceof Stream) {
        return $s->wrapAround($this);
    }
    // Whatever this object does with a string.
}

$s = new FileStream($fileName) >>= new ZlibCompress() >>= new StripTagsStream() >>= $htmlString;

I'm...
not sure which direction we'd want them to go in. Just spitballing. Some
way to automate that pattern would likely be good.

But yeah, a native bind operator has my support. :-)

--Larry Garfield
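The `__bind()` idea in the message above can be approximated in today's PHP with an ordinary method in place of the operator. This is a sketch under stated assumptions, for PHP 8.0+: `Stream`, `FilterStream`, `bind()`, `wrapAround()`, `MemoryStream` and `UpperCaseStream` are hypothetical names, and `MemoryStream` and `UpperCaseStream` are swapped in for `FileStream` and `ZlibCompress` so that it runs without a file or ext/zlib.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: bind() stands in for the proposed ">>=" operator,
// since userland PHP cannot overload operators.
abstract class Stream
{
    // The layer this one writes into (null for a terminal sink).
    protected ?Stream $inner = null;

    // "$this >>= $rhs": a Stream on the RHS wraps around $this;
    // a string on the RHS is written through the chain.
    public function bind(Stream|string $rhs): Stream
    {
        if ($rhs instanceof Stream) {
            return $rhs->wrapAround($this);
        }
        $this->write($rhs);
        return $this;
    }

    public function wrapAround(Stream $inner): static
    {
        $this->inner = $inner;
        return $this;
    }

    // Whatever this particular layer does with a string.
    abstract public function write(string $data): void;
}

// A layer that transforms data, then hands it to the layer it wraps.
abstract class FilterStream extends Stream
{
    public function write(string $data): void
    {
        if ($this->inner === null) {
            throw new LogicException(static::class . ' is not wrapped around anything');
        }
        $this->inner->write($this->transform($data));
    }

    abstract protected function transform(string $data): string;
}

// Stand-in for FileStream: collects output in memory so the sketch is testable.
final class MemoryStream extends Stream
{
    public string $contents = '';

    public function write(string $data): void
    {
        $this->contents .= $data;
    }
}

final class StripTagsStream extends FilterStream
{
    protected function transform(string $data): string
    {
        return strip_tags($data);
    }
}

// Stand-in for ZlibCompress, chosen only to avoid depending on ext/zlib.
final class UpperCaseStream extends FilterStream
{
    protected function transform(string $data): string
    {
        return strtoupper($data);
    }
}

// Reads like "$sink >>= new UpperCaseStream() >>= new StripTagsStream() >>= $html":
// the string passes through StripTagsStream, then UpperCaseStream, then the sink.
$sink = new MemoryStream();
$sink->bind(new UpperCaseStream())->bind(new StripTagsStream())->bind('<p>hello</p> world');
echo $sink->contents, "\n";
```

Here a Stream on the right-hand side wraps around the left-hand side, so a chain is written sink-first and data flows back through the layers in reverse order of binding. That is one possible answer to the "which direction" question, not the only one.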