Newsgroups: php.internals
From: mls@pooteeweet.org (Lukas Kahwe Smith)
To: Gregory Beaver, Wez Furlong, Sara Golemon
Cc: internals Mailing List
Date: Tue, 4 Nov 2008 22:51:45 +0100
Subject: Re: [PHP-DEV] question on how to solve major stream filter design flaw
In-Reply-To: <48F0E625.9050505@chiaraquartet.net>

Hi,

Sorry about the top post; I am CC'ing Wez and Sara again in the tiny
hope of a reaction. It seems the streams layer is effectively
unmaintained. Greg seems to have gotten his fingers into it a bit, but
for such a critical piece I guess it would be good to have at least 1-2
more people familiar with it. I guess reviewing this issue is a good
start.

Here is hoping ..

regards,
Lukas

On 11.10.2008, at 19:45, Gregory Beaver wrote:

> Hi,
>
> I'm grappling with a design flaw I just uncovered in stream filters,
> and need some advice on how best to fix it. The problem has existed
> since the introduction of stream filters, and has 3 parts. 2 of them
> can probably be fixed safely in PHP 5.2+, but I think the third may
> require an internal redesign of stream filters, and so would probably
> have to be PHP 5.3+, even though it is a clear bugfix (Ilia, your
> opinion appreciated on this).
>
> The first part of the bug that I encountered is best described here:
> http://bugs.php.net/bug.php?id=46026. However, it is a deeper problem
> than this, as any attempt to cache data is dangerous whenever a stream
> filter is attached to a stream. I should also note that the patch in
> this bug contains feature additions that would have to wait for PHP
> 5.3.
>
> I ran into this problem because I was trying to use stream filters to
> read in a bz2-compressed file within a zip archive in the phar
> extension. This was failing, and I first tracked the problem down to
> an attempt by php_stream_filter_append to read in a bunch of data and
> cache it, which caused more stuff to be passed into the bz2 decompress
> filter than it could handle, making it barf.
> After fixing this problem, I ran into the problem described in the
> bug above because php_stream_fill_read_buffer did the same thing when
> I tried to read the data: I requested that it return 176 decompressed
> bytes, and so php_stream_read passed 176 bytes in to the decompress
> filter. Only 144 of those bytes were actually bz2-compressed data, and
> so the filter barfed upon trying to decompress the remaining data
> (same as bug #46026, found differently).
>
> You can probably tell from my explanation that this is an
> extraordinarily complex problem. There are 3 inter-related problems
> here:
>
> 1) bz2 (and zlib) stream filter should stop trying to decompress when
>    it reaches the stream end regardless of how many bytes it is told
>    to decompress (easy to fix)
> 2) it is never safe to cache read data when a read stream filter is
>    appended, as there is no safe way to determine in advance how much
>    of the stream can be safely filtered. (would be easy to fix if it
>    weren't for #3)
> 3) there is no clear way to request that a certain number of filtered
>    bytes be returned from a stream, versus how many unfiltered bytes
>    should be passed into the stream. (very hard to fix without design
>    change)
>
> I need some advice on #3 from the original designers of stream filters
> and streams, as well as any experts who have dealt with this kind of
> problem in other contexts. In this situation, should we expect stream
> filters to always stop filtering if they reach the end of valid input?
> Even then, there is potential that less data is requested than the
> filter produces. A clear example would be if we requested only 170
> bytes. 144 bytes would be passed in as the complete compressed data,
> and bz2.decompress would decompress all of it to 176 bytes. 170 of
> those bytes would be returned from php_stream_read, and 6 would have
> to be placed in a cache for future reads.
> Thus, there would need to be some way of marking the cache as valid
> because of this logic path:
>
> <?php
> $a = fopen('blah.zip', 'rb');
> fseek($a, 132); // fills read buffer with unfiltered data
> stream_filter_append($a, 'bzip2.decompress'); // clears read buffer cache
> $b = fread($a, 170); // fills read buffer cache with 6 bytes
> // this should seek within the filtered data read buffer cache
> fseek($a, 3, SEEK_CUR);
> stream_filter_append($a, 'zlib.inflate');
> ?>
>
> The question is what should happen when we append the second filter
> 'zlib.inflate' to filter the filtered data? If we clear the read
> buffer as we did in the first case, it will result in lost data. So,
> let's assume we preserve the read buffer. Then, if we perform:
>
> <?php
> $c = fread($a, 7);
> ?>
>
> and assume the remaining 3 bytes expand to 8 bytes, how should the
> read buffer cache be handled? Should the first 3 bytes still be the
> filtered bzip2 decompressed data, and the last 3 replaced with the 8
> bytes of decompressed zlib data?
>
> Basically, I am wondering if perhaps we need to implement a read
> buffer cache for each stream filter. This could solve our problem, I
> think. The data would be stored like so:
>
> stream: 170 bytes of unfiltered data, and a pointer to byte 145 as
>   the next byte for php_stream_read()
> bzip2.decompress filter: 176 bytes of decompressed bzip2 data, and a
>   pointer to byte 171 as the next byte for php_stream_read()
> zlib.inflate filter: 8 bytes of decompressed zlib data, and a pointer
>   to byte 8 as the next byte for php_stream_read()
>
> This way, we would essentially have a stack of stream data. If the
> zlib filter were then removed, we could "back up" to the bzip2 filter
> and so on. This will allow proper read cache filling, and remove the
> weird ambiguities that are apparent in a filtered stream.
> I don't think we would need to worry about backwards compatibility
> here, as the most common use case would be unaffected by this change,
> and the use case it would fix has never actually worked.
>
> I haven't got a patch for this yet, but it would be easy to do if the
> logic is sound.
>
> Thanks,
> Greg

Lukas Kahwe Smith
mls@pooteeweet.org
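
To make the "read buffer cache for each stream filter" idea from the
quoted mail a bit more concrete, here is a minimal userland sketch. It
is only a model under stated assumptions, not the internal streams API:
the Layer and LayeredStream classes and their methods are hypothetical
names, the filters are toy callables rather than bzip2.decompress or
zlib.inflate, and input is pulled from the layer below one byte at a
time purely to keep the example short. It needs PHP 7.4 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical model, not PHP's internal API: the raw stream and every
// appended filter each own a buffer and a read position of their own.
final class Layer
{
    public string $buffer;
    public int $pos = 0;
    /** @var callable|null transforms bytes pulled from the layer below */
    public $filter;

    public function __construct(string $buffer = '', ?callable $filter = null)
    {
        $this->buffer = $buffer;
        $this->filter = $filter;
    }

    public function unread(): int
    {
        return strlen($this->buffer) - $this->pos;
    }
}

final class LayeredStream
{
    /** @var Layer[] index 0 is the raw stream, the last entry is the top filter */
    private array $layers;

    public function __construct(string $rawBytes)
    {
        $this->layers = [new Layer($rawBytes)];
    }

    public function appendFilter(callable $filter): void
    {
        // Nothing is cleared: layers below keep their buffers and positions.
        $this->layers[] = new Layer('', $filter);
    }

    public function removeFilter(): void
    {
        // "Back up" to the layer below; its unread bytes are still there.
        if (count($this->layers) > 1) {
            array_pop($this->layers);
        }
    }

    // $length counts filtered bytes as seen at the top of the stack.
    public function read(int $length): string
    {
        return $this->readFrom(count($this->layers) - 1, $length);
    }

    private function readFrom(int $index, int $length): string
    {
        $layer = $this->layers[$index];
        // Refill from the layer below until enough filtered bytes exist
        // or the layer below has nothing left.
        while ($index > 0 && $layer->unread() < $length) {
            $input = $this->readFrom($index - 1, 1);
            if ($input === '') {
                break;
            }
            $layer->buffer .= ($layer->filter)($input);
        }
        $chunk = substr($layer->buffer, $layer->pos, $length);
        $layer->pos += strlen($chunk);
        return $chunk;
    }
}

// Toy "decompressor": every input byte expands to two output bytes.
$double = static fn(string $bytes): string => preg_replace('/./s', '$0$0', $bytes);

$s = new LayeredStream('abcdef');
$s->appendFilter($double);
$a = $s->read(3);               // 'aab', one filtered 'b' stays buffered
$s->appendFilter('strtoupper'); // second filter sits on top of the first
$b = $s->read(2);               // 'BC', the buffered 'b' was not lost
$s->removeFilter();             // back to the doubling layer
$c = $s->read(2);               // 'cd'

echo $a, ' ', $b, ' ', $c, PHP_EOL; // prints "aab BC cd"
```

Two consequences of this particular sketch are worth noting. Appending
the second filter does not touch the leftover byte in the first filter's
buffer, which is the case the quoted fseek/fread example worries about.
On the other hand, removing a filter simply drops whatever that filter
had produced but not yet handed out; whether that is acceptable is a
design decision the proposal would still have to make.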