Newsgroups: php.internals
Xref: news.php.net php.internals:63297
Date: Tue, 9 Oct 2012 04:33:18 -0400
From: theanomaly.is@gmail.com (Sherif Ramadan)
To: Tjerk Anne Meesters
Cc: Nicolai Scheer, internals@lists.php.net
Subject: Re: [PHP-DEV] stream_get_line behaviour Bug #63240

On Tue, Oct 9, 2012 at 12:59 AM, Tjerk Anne Meesters wrote:

> On Tue, Oct 9, 2012 at 12:14 AM, Nicolai Scheer wrote:
>
>> Hi!
>>
>> We switched from php 5.3.10 to 5.3.17 this weekend and stumbled upon a
>> behaviour of stream_get_line that is most likely a bug and breaks a
>> lot of our file processing code.
>>
>> The issue seems to have been introduced from 5.3.10 to 5.3.11.
>>
>> I opened a bug report: #63240.
>>
>
> I've managed to reduce the code to this; it's very specific:
>
> $file = __DIR__ . '/input_dummy.txt';
> $delimiter = 'MM';
> file_put_contents($file, str_repeat('.', 8189) . $delimiter . $delimiter);
>
> $fh = fopen($file, "rb");
>
> stream_get_line($fh, 8192, $delimiter);
> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>
> fclose($fh);
> unlink($file);
>
> If the internal buffer length is 8192, after the first call to
> stream_get_line() the read position (x) and physical file pointer (y)
> should be positioned like so:
>
> .......MM(x)M(y)M
>
> The fact that (y) is in between the delimiter seems to cause an issue.
>

I'm not sure why this bug exists, and I haven't been able to pinpoint
where it manifests itself, but what I find incredibly unusual is that
it only appears when the stream is exactly 8193 bytes long. It has
nothing to do with the file pointer's position, since all we have to do
is increase or decrease the size of the file by exactly 1 byte and the
bug never shows its face.

Test case 1: (we decrease the file size from 8193 bytes to 8192 bytes)

$file = __DIR__ . '/input_dummy.txt';
$delimiter = 'MM';
file_put_contents($file, str_repeat('.', 8188) . $delimiter .
    $delimiter);

$fh = fopen($file, "rb");

stream_get_line($fh, 8192, $delimiter);
var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));

fclose($fh);
unlink($file);

/* bool(false) */

---------------------------------------

Test case 2: (we increase the file size from 8193 bytes to 8194 bytes)

$file = __DIR__ . '/input_dummy.txt';
$delimiter = 'MM';
file_put_contents($file, str_repeat('.', 8190) . $delimiter . $delimiter);

$fh = fopen($file, "rb");

stream_get_line($fh, 8192, $delimiter);
var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));

fclose($fh);
unlink($file);

/* bool(false) */

----------------------

As long as the file size is not exactly 8193 bytes you don't get this
issue. In fact, you can test it with any file size of the form
(n * 8192) + 1 and the same issue appears. However, the bigger anomaly
is that the delimiter also has to be longer than 1 byte before the bug
manifests itself.

I suspect this has something to do with the way PHP streams are
buffered internally. The underlying stream is read up to a certain
length and buffered in memory by the internal API functions, while
calls to PHP-facing functions like stream_get_line() read from that
buffer instead. So it's possible the bug lies somewhere in this
function (line 1026 of main/streams/streams.c
http://lxr.php.net/xref/PHP_5_4/main/streams/streams.c#1026).

>> The issue seems to be related to #44607, but that one got fixed years ago.
>>
>> Is anybody able to confirm this behaviour or has stumbled upon this?
>>
>> Furthermore the behaviour of stream_get_line on an empty file seems to
>> have changed between php 5.3.10 and php 5.3.11:
>>
>> $file = __DIR__ . '/empty.txt';
>> file_put_contents( $file, '' );
>> $fh = fopen( $file, 'rb' );
>> $data = stream_get_line( $fh, 4096 );
>> var_dump( $data );
>>
>> result in
>>
>> string(0) ""
>>
>> for php 5.3.10
>>
>> and in
>>
>> bool(false)
>>
>> for php > 5.3.10.
>>
>> I don't know if this should be considered a bug, but as far as I know
>> such a behaviour should not change during minor releases...
>>
>> Any insight is appreciated!
>>
>> Greetings
>>
>> Nico
>>
>> --
>> PHP Internals - PHP Runtime Development Mailing List
>> To unsubscribe, visit: http://www.php.net/unsub.php
>>
>
> --
> --
> Tjerk
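
The three snippets in this thread differ only in the number of filler bytes, so the size dependence can be probed in one loop. The following is a minimal sketch rather than a verified reproduction: probe_stream_get_line() is a hypothetical helper name, and tempnam() replaces the fixed __DIR__ path used above so that nothing is overwritten. The syntax is kept compatible with PHP 5.3. No expected output is shown because the result depends on the PHP build being tested.

```php
<?php
// Hypothetical helper (not part of the thread): writes $dots filler bytes
// followed by the delimiter twice, then reports whether the second
// stream_get_line() call hands back the delimiter itself.
function probe_stream_get_line($dots, $delimiter = 'MM')
{
    $data = str_repeat('.', $dots) . $delimiter . $delimiter;
    $file = tempnam(sys_get_temp_dir(), 'sgl');
    file_put_contents($file, $data);

    $fh = fopen($file, 'rb');
    stream_get_line($fh, 8192, $delimiter);           // consumes the filler
    $second = stream_get_line($fh, 8192, $delimiter); // the call under test
    fclose($fh);
    unlink($file);

    return array(
        'size'               => strlen($data),
        'returned_delimiter' => ($second === $delimiter),
    );
}

// 8188, 8189 and 8190 filler bytes give 8192, 8193 and 8194 byte files.
foreach (array(8188, 8189, 8190) as $dots) {
    $r = probe_stream_get_line($dots);
    printf("%d bytes: %s\n", $r['size'], var_export($r['returned_delimiter'], true));
}
```

Going by the reports above, an affected build would be expected to print true only for the 8193-byte file, and a build without the bug would print false for all three sizes.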