Newsgroups: php.internals
From: wez@netevil.org (Wez Furlong)
To: Lukas Kahwe Smith
Cc: Gregory Beaver, Sara Golemon, internals Mailing List
Subject: Re: [PHP-DEV] question on how to solve major stream filter design flaw
Date: Tue, 4 Nov 2008 17:01:06 -0500
Message-ID: <94ED8EF4-3199-4020-B63E-6D8220DD4550@netevil.org>
References: <48F0E625.9050505@chiaraquartet.net>

I'm too busy to review anything anytime soon; got lots on for the next
couple of months here at work.

--Wez.

On Nov 4, 2008, at 4:51 PM, Lukas Kahwe Smith wrote:

> Hi,
>
> Sorry about the top post, since I am CC'ing Wez and Sara again with
> the tiny hope of a reaction.
> It seems the streams layer is effectively unmaintained. Greg seems
> to have gotten his fingers a bit in there, but for such a critical
> piece I guess it would be good to have at least 1-2 more people
> familiar. I guess reviewing this issue is a good start.
>
> Here is hoping ..
>
> regards,
> Lukas
>
> On 11.10.2008, at 19:45, Gregory Beaver wrote:
>
>> Hi,
>>
>> I'm grappling with a design flaw I just uncovered in stream filters,
>> and need some advice on how best to fix it. The problem has existed
>> since the introduction of stream filters, and has 3 parts. 2 of them
>> can probably be fixed safely in PHP 5.2+, but I think the third may
>> require an internal redesign of stream filters, and so would probably
>> have to be PHP 5.3+, even though it is a clear bugfix (Ilia, your
>> opinion appreciated on this).
>>
>> The first part of the bug that I encountered is best described here:
>> http://bugs.php.net/bug.php?id=46026. However, it is a deeper problem
>> than this, as attempting to cache data is dangerous any time a stream
>> filter is attached to a stream. I should also note that the patch in
>> this bug contains feature additions that would have to wait for PHP
>> 5.3.
>>
>> I ran into this problem because I was trying to use stream filters to
>> read in a bz2-compressed file within a zip archive in the phar
>> extension.
>> This was failing, and I first tracked the problem down to an
>> attempt by php_stream_filter_append to read in a bunch of data and
>> cache it, which caused more stuff to be passed into the bz2
>> decompress filter than it could handle, making it barf. After fixing
>> this problem, I ran into the problem described in the bug above
>> because of php_stream_fill_read_buffer doing the same thing when I
>> tried to read the data, because I requested it return 176
>> decompressed bytes, and so php_stream_read passed in 176 bytes to the
>> decompress filter. Only 144 of those bytes were actually
>> bz2-compressed data, and so the filter barfed upon trying to
>> decompress the remaining data (same as bug #46026, found
>> differently).
>>
>> You can probably tell from my explanation that this is an
>> extraordinarily complex problem. There are 3 inter-related problems
>> here:
>>
>> 1) bz2 (and zlib) stream filter should stop trying to decompress
>>    when it reaches the stream end regardless of how many bytes it is
>>    told to decompress (easy to fix)
>> 2) it is never safe to cache read data when a read stream filter is
>>    appended, as there is no safe way to determine in advance how much
>>    of the stream can be safely filtered. (would be easy to fix if it
>>    weren't for #3)
>> 3) there is no clear way to request that a certain number of
>>    filtered bytes be returned from a stream, versus how many
>>    unfiltered bytes should be passed into the stream. (very hard to
>>    fix without design change)
>>
>> I need some advice on #3 from the original designers of stream
>> filters and streams, as well as any experts who have dealt with this
>> kind of problem in other contexts. In this situation, should we
>> expect stream filters to always stop filtering if they reach the end
>> of valid input? Even in this situation, there is potential that less
>> data is available than passed in. A clear example would be if we
>> requested only 170 bytes.
>> 144 of those bytes would be passed in as the complete compressed
>> data, and bzip2.decompress would decompress all of it to 176 bytes.
>> 170 of those bytes would be returned from php_stream_read, and 6
>> would have to be placed in a cache for future reads. Thus, there
>> would need to be some way of marking the cache as valid because of
>> this logic path:
>>
>> <?php
>> $a = fopen('blah.zip', 'rb');
>> fseek($a, 132); // fills read buffer with unfiltered data
>> stream_filter_append($a, 'bzip2.decompress'); // clears read buffer cache
>> $b = fread($a, 170); // fills read buffer cache with 6 bytes
>> fseek($a, 3, SEEK_CUR); // this should seek within the filtered data read buffer cache
>> stream_filter_append($a, 'zlib.inflate');
>> ?>
>>
>> The question is what should happen when we append the second filter
>> 'zlib.inflate' to filter the filtered data? If we clear the read
>> buffer as we did in the first case, it will result in lost data. So,
>> let's assume we preserve the read buffer. Then, if we perform:
>>
>> <?php
>> $c = fread($a, 7);
>> ?>
>>
>> and assume the remaining 3 bytes expand to 8 bytes, how should the
>> read buffer cache be handled? Should the first 3 bytes still be the
>> filtered bzip2 decompressed data, and the last 3 replaced with the 8
>> bytes of decompressed zlib data?
>>
>> Basically, I am wondering if perhaps we need to implement a read
>> buffer cache for each stream filter. This could solve our problem, I
>> think. The data would be stored like so:
>>
>> stream: 170 bytes of unfiltered data, and a pointer to byte 145 as
>> the next byte for php_stream_read()
>> bzip2.decompress filter: 176 bytes of decompressed bzip2 data, and a
>> pointer to byte 171 as the next byte for php_stream_read()
>> zlib.inflate filter: 8 bytes of decompressed zlib data, and a
>> pointer to byte 8 as the next byte for php_stream_read()
>>
>> This way, we would essentially have a stack of stream data.
>> If the zlib filter were then removed, we could "back up" to the
>> bzip2 filter and so on. This will allow proper read cache filling,
>> and remove the weird ambiguities that are apparent in a filtered
>> stream. I don't think we would need to worry about backwards
>> compatibility here, as the most common use case would be unaffected
>> by this change, and the use case it would fix has never actually
>> worked.
>>
>> I haven't got a patch for this yet, but it would be easy to do if the
>> logic is sound.
>>
>> Thanks,
>> Greg
>>
>> --
>> PHP Internals - PHP Runtime Development Mailing List
>> To unsubscribe, visit: http://www.php.net/unsub.php
>
> Lukas Kahwe Smith
> mls@pooteeweet.org
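
The "read buffer cache for each stream filter" idea in Greg's message can be modelled with a small, standalone C sketch. This is only a toy under stated assumptions, not PHP's streams code: the names filtered_stream, layer, filter_fn, stream_init, stream_append_filter, stream_remove_filter and stream_read are all hypothetical, the whole "file" lives in memory, and double_bytes is a stand-in for a real decompressor (each input byte becomes two output bytes). The point it tries to show is that each layer keeps its own cache and read position, so leftover filtered bytes survive when another filter is appended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LAYER_CAP 256   /* arbitrary capacity for this sketch */
#define MAX_LAYERS 4    /* raw stream plus up to three filters */

/* Hypothetical filter callback: consumes in_len bytes, writes the result to
 * out, and returns the number of bytes written. */
typedef size_t (*filter_fn)(const unsigned char *in, size_t in_len,
                            unsigned char *out, size_t out_cap);

typedef struct {
    unsigned char buf[LAYER_CAP]; /* bytes this layer has already produced */
    size_t len;                   /* number of valid bytes in buf */
    size_t pos;                   /* next byte to hand to the layer above */
    filter_fn filter;             /* NULL for the raw stream (layer 0) */
} layer;

typedef struct {
    layer layers[MAX_LAYERS];
    size_t depth;                 /* number of layers in use */
} filtered_stream;

/* Toy stand-in for a decompressor: every input byte becomes two bytes. */
size_t double_bytes(const unsigned char *in, size_t in_len,
                    unsigned char *out, size_t out_cap)
{
    size_t i, n = 0;
    for (i = 0; i < in_len && n + 2 <= out_cap; i++) {
        out[n++] = in[i];
        out[n++] = in[i];
    }
    return n;
}

/* Layer 0 simply holds the unfiltered data. */
void stream_init(filtered_stream *fs, const unsigned char *data, size_t len)
{
    assert(len <= LAYER_CAP);
    memset(fs, 0, sizeof(*fs));
    memcpy(fs->layers[0].buf, data, len);
    fs->layers[0].len = len;
    fs->depth = 1;
}

/* Push a filter with its own empty cache; lower caches are left untouched. */
int stream_append_filter(filtered_stream *fs, filter_fn fn)
{
    if (fs->depth == MAX_LAYERS) return -1;
    memset(&fs->layers[fs->depth], 0, sizeof(layer));
    fs->layers[fs->depth].filter = fn;
    fs->depth++;
    return 0;
}

/* Pop the top filter and "back up" to the layer below. Bytes still cached
 * in the popped layer are dropped in this sketch. */
int stream_remove_filter(filtered_stream *fs)
{
    if (fs->depth <= 1) return -1;
    fs->depth--;
    return 0;
}

static size_t layer_read(filtered_stream *fs, size_t idx,
                         unsigned char *out, size_t want)
{
    layer *l = &fs->layers[idx];
    size_t got = 0;
    while (got < want) {
        if (l->pos == l->len) {            /* this layer's cache is drained */
            unsigned char byte;
            if (idx == 0) break;           /* raw data exhausted */
            /* Pull one byte at a time so the layer below is never
             * over-consumed; leftovers stay in this layer's cache. */
            if (layer_read(fs, idx - 1, &byte, 1) == 0) break;
            l->len = l->filter(&byte, 1, l->buf, LAYER_CAP);
            l->pos = 0;
            continue;
        }
        out[got++] = l->buf[l->pos++];
    }
    return got;
}

/* Read up to want filtered bytes from the top of the stack. */
size_t stream_read(filtered_stream *fs, unsigned char *out, size_t want)
{
    return layer_read(fs, fs->depth - 1, out, want);
}
```

With the raw data "abcd" and one double_bytes filter appended, asking stream_read for 3 bytes yields "aab" and leaves one 'b' in that filter's cache. Appending a second double_bytes filter and reading 3 more bytes yields "bbc", because the cached 'b' is fed upward rather than discarded. Removing the second filter and reading 2 bytes then yields "cd" from the first filter's own cache. Pulling a single byte at a time is a simplification to avoid over-reading; a real implementation would need a smarter policy, which is exactly problem #3 in the message.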