Newsgroups: php.internals
Date: Tue, 9 Oct 2012 05:23:57 -0400
To: Tjerk Anne Meesters
Cc: Nicolai Scheer
, internals@lists.php.net
Subject: Re: [PHP-DEV] stream_get_line behaviour Bug #63240
From: theanomaly.is@gmail.com (Sherif Ramadan)

On Tue, Oct 9, 2012 at 5:10 AM, Tjerk Anne Meesters wrote:
> Hi,
>
> I've managed to pinpoint the issue inside the code itself and attached a
> patch for 5.4.4 (I can make one for trunk as well, but at the time of
> writing I worked with what I had).
>
> The bug manifests itself when delimiter size > 1 AND the file pointer falls
> in between a delimiter after filling the read buffer with
> php_stream_fill_read_buffer().
>
> When this happens, the part of the delimiter that falls on the left side of
> the file pointer is skipped at the next iteration because it was examined
> before; however, that only makes sense for single character delimiters.
>
> My patch will decrement the skip length (if non-zero) by at most <delimiter
> length - 1> bytes before performing the search. This will make sure any
> buffered characters are taken into consideration (again).
>

Yup, that makes perfect sense. I had narrowed it down to somewhere
within _php_stream_search_delim, but I couldn't actually think of a
reasonable fix without potentially breaking something else. This looks
quite reasonable and I think it should work fine.

> On Tue, Oct 9, 2012 at 4:33 PM, Sherif Ramadan wrote:
>>
>> On Tue, Oct 9, 2012 at 12:59 AM, Tjerk Anne Meesters wrote:
>> > On Tue, Oct 9, 2012 at 12:14 AM, Nicolai Scheer wrote:
>> >
>> >> Hi!
>> >>
>> >> We switched from php 5.3.10 to 5.3.17 this weekend and stumbled upon a
>> >> behaviour of stream_get_line that is most likely a bug and breaks a
>> >> lot of our file processing code.
>> >>
>> >> The issue seems to have been introduced from 5.3.10 to 5.3.11.
>> >>
>> >> I opened a bug report: #63240.
>> >>
>> >
>> > I've managed to reduce the code to this; it's very specific:
>> >
>> > $file = __DIR__ .
>> >     '/input_dummy.txt';
>> > $delimiter = 'MM';
>> > file_put_contents($file, str_repeat('.', 8189) . $delimiter .
>> >     $delimiter);
>> >
>> > $fh = fopen($file, "rb");
>> >
>> > stream_get_line($fh, 8192, $delimiter);
>> > var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>> >
>> > fclose($fh);
>> > unlink($file);
>> >
>> > If the internal buffer length is 8192, after the first call to
>> > stream_get_line() the read position (x) and physical file pointer (y)
>> > should be positioned like so:
>> >
>> > .......MM(x)M(y)M
>> >
>> > The fact that (y) is in between the delimiter seems to cause an issue.
>> >
>>
>> I'm not sure why this bug exists, and I haven't exactly been able to
>> pinpoint where the bug manifests itself, but something I find
>> incredibly unusual here is the fact that the size of the stream being
>> exactly 8193 bytes long is the reason the bug exists.
>>
>> It has nothing to do with the file pointer's position since all we have
>> to do here is increase or decrease the size of the file by exactly 1
>> byte and the bug will never show its face.
>>
>> Test case 1: (we decrease the file size from 8193 bytes to 8192 bytes)
>>
>> $file = __DIR__ . '/input_dummy.txt';
>> $delimiter = 'MM';
>> file_put_contents($file, str_repeat('.', 8188) . $delimiter . $delimiter);
>>
>> $fh = fopen($file, "rb");
>>
>> stream_get_line($fh, 8192, $delimiter);
>> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>>
>> fclose($fh);
>> unlink($file);
>>
>> /* bool(false) */
>>
>> ---------------------------------------
>>
>> Test case 2: (we increase the file size from 8193 bytes to 8194 bytes)
>>
>> $file = __DIR__ . '/input_dummy.txt';
>> $delimiter = 'MM';
>> file_put_contents($file, str_repeat('.', 8190) . $delimiter .
>>     $delimiter);
>>
>> $fh = fopen($file, "rb");
>>
>> stream_get_line($fh, 8192, $delimiter);
>> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>>
>> fclose($fh);
>> unlink($file);
>>
>> /* bool(false) */
>>
>> ---------------------------------------
>>
>> As long as the file size is not exactly equal to 8193 bytes you don't
>> get this issue. In fact, you can test it with any multiple of 8192 + 1
>> and the same issue appears. However, the bigger anomaly is that it
>> also requires the length of the delimiter to be larger than 1 before
>> the bug manifests itself.
>>
>> I suspect this has something to do with the way PHP streams are
>> buffered internally. The internal stream is read up to a certain
>> length and buffered in memory using the internal API functions, while
>> your calls to PHP-facing functions like stream_get_line() read
>> directly from the buffer instead. So it's possible somewhere in this
>> function (line 1026 of main/streams/streams.c
>> http://lxr.php.net/xref/PHP_5_4/main/streams/streams.c#1026) lies the
>> bug.
>>
>> >> The issue seems to be related to #44607, but that one got fixed years
>> >> ago.
>> >>
>> >> Is anybody able to confirm this behaviour or has stumbled upon this?
>> >>
>> >> Furthermore the behaviour of stream_get_line on an empty file seems to
>> >> have changed between php 5.3.10 and php 5.3.11:
>> >>
>> >> <?php
>> >>
>> >> $file = __DIR__ . '/empty.txt';
>> >> file_put_contents( $file, '' );
>> >> $fh = fopen( $file, 'rb' );
>> >> $data = stream_get_line( $fh, 4096 );
>> >> var_dump( $data );
>> >>
>> >> result in
>> >>
>> >> string(0) ""
>> >>
>> >> for php 5.3.10
>> >>
>> >> and in
>> >>
>> >> bool(false)
>> >>
>> >> for php > 5.3.10.
>> >>
>> >> I don't know if this should be considered a bug, but as far as I know
>> >> such a behaviour should not change during minor releases...
>> >>
>> >> Any insight is appreciated!
>> >>
>> >> Greetings
>> >>
>> >> Nico
>> >>
>> >
>> > --
>> > Tjerk
>
> --
> Tjerk
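
For anyone following along, here is a rough C sketch of the skip-length idea Tjerk describes above. It is not the attached patch and not the actual code in main/streams/streams.c; find_delim() and rewind_skip() are hypothetical names made up for illustration. The assumption is only what the thread states: bytes already examined are skipped on the next search, and the fix backs the skip up by at most (delimiter length - 1) bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a php-src function): look for delim in
 * buf[0..len), starting the scan at offset `skip`.
 * Returns the offset of the first match, or -1 if there is none. */
long find_delim(const char *buf, size_t len, size_t skip,
                const char *delim, size_t delim_len)
{
    assert(delim_len > 0);
    if (len < delim_len) {
        return -1;
    }
    for (size_t i = skip; i + delim_len <= len; i++) {
        if (memcmp(buf + i, delim, delim_len) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Hypothetical helper: back the skip offset up by at most
 * (delim_len - 1) bytes, never going below zero. For a one-byte
 * delimiter this leaves the skip untouched. */
size_t rewind_skip(size_t skip, size_t delim_len)
{
    size_t back = delim_len - 1;
    return skip > back ? skip - back : 0;
}
```

With the situation from the reduced test case (one 'M' already examined, then the second 'M' arrives in the buffer), find_delim("MM", 2, 1, "MM", 2) misses the delimiter, while passing rewind_skip(1, 2) as the skip finds it at offset 0.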