Newsgroups: php.internals
From: ilia@prohost.org (Ilia Alshanetsky)
To: Gregory Beaver
Cc: internals Mailing List
Date: Sat, 11 Oct 2008 23:22:50 -0400
Subject: Re: [PHP-DEV] question on how to solve major stream filter design flaw
In-Reply-To: <48F0E625.9050505@chiaraquartet.net>
References: <48F0E625.9050505@chiaraquartet.net>

Greg,

First off, great job on the analysis of the problem. As far as I can
tell (correct me if I am wrong), the problem lies in extraordinarily
complex use of stream filters, with more than 2 filters stacked, such as
phar://zip://bz2:// (for example). Since I'd wager that this is not a
common use case, I would prefer to refrain from any major changes for
this in the 5.2 release. Whatever simple solutions we can implement,
such as the fix for #1 (which appears to have already been committed),
are more than welcome; however, my call is that #2 and #3 wait for 5.3,
unless a simple and reliable solution is found.

On 11-Oct-08, at 1:45 PM, Gregory Beaver wrote:

> Hi,
>
> I'm grappling with a design flaw I just uncovered in stream filters,
> and need some advice on how best to fix it. The problem has existed
> since the introduction of stream filters, and has 3 parts. 2 of them
> can probably be fixed safely in PHP 5.2+, but I think the third may
> require an internal redesign of stream filters, and so would probably
> have to be PHP 5.3+, even though it is a clear bugfix (Ilia, your
> opinion appreciated on this).
>
> The first part of the bug that I encountered is best described here:
> http://bugs.php.net/bug.php?id=46026.
> However, it is a deeper problem than this, as attempting to cache
> data is dangerous any time a stream filter is attached to a stream.
> I should also note that the patch in this bug contains feature
> additions that would have to wait for PHP 5.3.
>
> I ran into this problem because I was trying to use stream filters to
> read in a bz2-compressed file within a zip archive in the phar
> extension. This was failing, and I first tracked the problem down to
> an attempt by php_stream_filter_append to read in a bunch of data and
> cache it, which caused more stuff to be passed into the bz2 decompress
> filter than it could handle, making it barf. After fixing this
> problem, I ran into the problem described in the bug above because of
> php_stream_fill_read_buffer doing the same thing when I tried to read
> the data, because I requested it return 176 decompressed bytes, and so
> php_stream_read passed in 176 bytes to the decompress filter. Only 144
> of those bytes were actually bz2-compressed data, and so the filter
> barfed upon trying to decompress the remaining data (same as bug
> #46026, found differently).
>
> You can probably tell from my explanation that this is an
> extraordinarily complex problem. There are 3 inter-related problems
> here:
>
> 1) bz2 (and zlib) stream filter should stop trying to decompress when
> it reaches the stream end regardless of how many bytes it is told to
> decompress (easy to fix)
> 2) it is never safe to cache read data when a read stream filter is
> appended, as there is no safe way to determine in advance how much of
> the stream can be safely filtered. (would be easy to fix if it weren't
> for #3)
> 3) there is no clear way to request that a certain number of filtered
> bytes be returned from a stream, versus how many unfiltered bytes
> should be passed into the stream.
> (very hard to fix without design change)
>
> I need some advice on #3 from the original designers of stream filters
> and streams, as well as any experts who have dealt with this kind of
> problem in other contexts. In this situation, should we expect stream
> filters to always stop filtering if they reach the end of valid input?
> Even in this situation, there is potential that less data is available
> than passed in. A clear example would be if we requested only 170
> bytes. 144 of those bytes would be passed in as the complete
> compressed data, and bz2.decompress would decompress all of it to 176
> bytes. 170 of those bytes would be returned from php_stream_read, and
> 6 would have to be placed in a cache for future reads. Thus, there
> would need to be some way of marking the cache as valid because of
> this logic path:
>
> <?php
> $a = fopen('blah.zip', 'rb');
> fseek($a, 132); // fills read buffer with unfiltered data
> stream_filter_append($a, 'bzip2.decompress'); // clears read buffer cache
> $b = fread($a, 170); // fills read buffer cache with 6 bytes
> fseek($a, 3, SEEK_CUR); // this should seek within the filtered data read buffer cache
> stream_filter_append($a, 'zlib.inflate');
> ?>
>
> The question is what should happen when we append the second filter
> 'zlib.inflate' to filter the filtered data? If we clear the read
> buffer as we did in the first case, it will result in lost data. So,
> let's assume we preserve the read buffer. Then, if we perform:
>
> <?php
> $c = fread($a, 7);
> ?>
>
> and assume the remaining 3 bytes expand to 8 bytes, how should the
> read buffer cache be handled? Should the first 3 bytes still be the
> filtered bzip2 decompressed data, and the last 3 replaced with the 8
> bytes of decompressed zlib data?
>
> Basically, I am wondering if perhaps we need to implement a read
> buffer cache for each stream filter. This could solve our problem, I
> think.
> The data would be stored like so:
>
> stream: 170 bytes of unfiltered data, and a pointer to byte 145 as the
> next byte for php_stream_read()
> bzip2.decompress filter: 176 bytes of decompressed bzip2 data, and a
> pointer to byte 171 as the next byte for php_stream_read()
> zlib.inflate filter: 8 bytes of decompressed zlib data, and a pointer
> to byte 8 as the next byte for php_stream_read()
>
> This way, we would essentially have a stack of stream data. If the
> zlib filter were then removed, we could "back up" to the bzip2 filter
> and so on. This will allow proper read cache filling, and remove the
> weird ambiguities that are apparent in a filtered stream. I don't
> think we would need to worry about backwards compatibility here, as
> the most common use case would be unaffected by this change, and the
> use case it would fix has never actually worked.
>
> I haven't got a patch for this yet, but it would be easy to do if the
> logic is sound.
>
> Thanks,
> Greg
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php

Ilia Alshanetsky
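
For reference, the "read buffer cache for each stream filter" idea from
the quoted message can be modelled in a few lines of userland PHP. This
is only a rough sketch under stated assumptions, not how the streams
layer works or would have to work: the Layer and LayeredStream classes
are hypothetical names, the filters are plain callables standing in for
bzip2.decompress and zlib.inflate, and I assume that appending a filter
leaves the lower layer's read position untouched so that removing the
filter "backs up" to exactly where the reader left off.

```php
<?php
// Hypothetical model only: none of these names exist in PHP's streams API.

final class Layer
{
    public string $name;
    public string $data;  // bytes available at this layer
    public int $pos = 0;  // 0-based offset of the next unread byte

    public function __construct(string $name, string $data)
    {
        $this->name = $name;
        $this->data = $data;
    }

    // Return up to $n bytes from this layer's own buffer and advance.
    public function read(int $n): string
    {
        $chunk = (string) substr($this->data, $this->pos, $n);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    // Unread bytes of this layer, without moving the position.
    public function unread(): string
    {
        return (string) substr($this->data, $this->pos);
    }
}

final class LayeredStream
{
    /** @var Layer[] index 0 is the raw stream, the last entry is the top */
    private array $layers = [];

    public function __construct(string $raw)
    {
        $this->layers[] = new Layer('stream', $raw);
    }

    // Filter the unread bytes of the current top layer into a new layer.
    // Assumption: the lower layer keeps its position, so nothing is lost.
    public function appendFilter(string $name, callable $filter): void
    {
        $input = end($this->layers)->unread();
        $this->layers[] = new Layer($name, (string) $filter($input));
    }

    // Drop the top filter and fall back to the layer below it.
    public function removeFilter(): void
    {
        if (count($this->layers) > 1) {
            array_pop($this->layers);
        }
    }

    // Reads are always served from the buffer of the top layer.
    public function read(int $n): string
    {
        return end($this->layers)->read($n);
    }
}

// Toy usage: strtoupper and a doubling closure stand in for real filters.
$s = new LayeredStream('abcdef');
echo $s->read(2), "\n";                        // ab
$s->appendFilter('upper', 'strtoupper');
echo $s->read(3), "\n";                        // CDE
$s->appendFilter('double', fn (string $x): string => $x . $x);
echo $s->read(2), "\n";                        // FF
$s->removeFilter();                            // back up to 'upper'
echo $s->read(1), "\n";                        // F
```

Whether the lower layer's position should instead advance past the bytes
the upper filter consumed is exactly the kind of question the real
design would have to settle; the sketch just picks one answer.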