Newsgroups: php.internals
From: datibbaw@php.net (Tjerk Anne Meesters)
To: Sherif Ramadan
Cc: Nicolai Scheer, internals@lists.php.net
Date: Tue, 9 Oct 2012 17:10:39 +0800
Subject: Re: [PHP-DEV] stream_get_line behaviour Bug #63240

Hi,

I've managed to pinpoint the issue in the code itself and attached a patch
for 5.4.4 (I can make one for trunk as well, but at the time of writing I
worked with what I had).

The bug manifests itself when the delimiter is longer than one byte AND the
file pointer falls in the middle of a delimiter after the read buffer has
been filled by php_stream_fill_read_buffer().

When this happens, the part of the delimiter that lies to the left of the
file pointer is skipped in the next iteration because it was already
examined. That shortcut is only safe for single-character delimiters: a
one-byte delimiter can never be split across the old end of the buffer,
but a longer one can.

My patch decrements the skip length (if non-zero) by at most <delimiter
length - 1> bytes before performing the search. This makes sure that any
buffered characters which may form the start of a delimiter are taken into
consideration (again).


On Tue, Oct 9, 2012 at 4:33 PM, Sherif Ramadan <theanomaly.is@gmail.com> wrote:

> On Tue, Oct 9, 2012 at 12:59 AM, Tjerk Anne Meesters <datibbaw@php.net> wrote:
> > On Tue, Oct 9, 2012 at 12:14 AM, Nicolai Scheer <scope@planetavent.de> wrote:
> >
> >> Hi!
> >>
> >> We switched from php 5.3.10 to 5.3.17 this weekend and stumbled upon a
> >> behaviour of stream_get_line that is most likely a bug and breaks a
> >> lot of our file processing code.
> >>
> >> The issue seems to have been introduced from 5.3.10 to 5.3.11.
> >>
> >> I opened a bug report: #63240.
> >>
> >
> > I've managed to reduce the code to this; it's very specific:
> >
> > $file = __DIR__ . '/input_dummy.txt';
> > $delimiter = 'MM';
> > file_put_contents($file, str_repeat('.', 8189) . $delimiter . $delimiter);
> >
> > $fh = fopen($file, "rb");
> >
> > stream_get_line($fh, 8192, $delimiter);
> > var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
> >
> > fclose($fh);
> > unlink($file);
> >
> > If the internal buffer length is 8192, after the first call to
> > stream_get_line() the read position (x) and physical file pointer (y)
> > should be positioned like so:
> >
> > .......MM(x)M(y)M
> >
> > The fact that (y) is in between the delimiter seems to cause an issue.
> >
>
> I'm not sure why this bug exists, and I haven't exactly been able to
> pinpoint where the bug manifests itself, but something I find
> incredibly unusual here is the fact that the size of the stream being
> exactly 8193 bytes long is the reason the bug exists.
>
> It has nothing to do with the file pointer's position since all we have
> to do here is increase or decrease the size of the file by exactly 1
> byte and the bug will never show its face.
>
> Test case 1: (we decrease the file size from 8193 bytes to 8192 bytes)
>
> $file = __DIR__ . '/input_dummy.txt';
> $delimiter = 'MM';
> file_put_contents($file, str_repeat('.', 8188) . $delimiter . $delimiter);
>
> $fh = fopen($file, "rb");
>
> stream_get_line($fh, 8192, $delimiter);
> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>
> fclose($fh);
> unlink($file);
>
> /* bool(false) */
>
> ---------------------------------------
>
> Test 2: (we increase the file size from 8193 bytes to 8194 bytes)
>
> $file = __DIR__ . '/input_dummy.txt';
> $delimiter = 'MM';
> file_put_contents($file, str_repeat('.', 8190) . $delimiter . $delimiter);
>
> $fh = fopen($file, "rb");
>
> stream_get_line($fh, 8192, $delimiter);
> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>
> fclose($fh);
> unlink($file);
>
> /* bool(false) */
>
> ----------------------
>
> As long as the file size is not exactly equal to 8193 bytes you don't
> get this issue.
> In fact, you can test it with any multiple of 8192 + 1
> and the same issue appears. However, the bigger anomaly is that it
> also requires the length of the delimiter to be larger than 1 before
> the bug manifests itself.
>
> I suspect this has something to do with the way PHP streams are
> buffered internally. The internal stream is read up to a certain
> length and buffered in memory using the internal API functions, while
> your calls to PHP-facing functions like stream_get_line() read
> directly from the buffer instead. So it's possible that the bug lies
> somewhere in this function (line 1026 of main/streams/streams.c,
> http://lxr.php.net/xref/PHP_5_4/main/streams/streams.c#1026).
>
> >
> >> The issue seems to be related to #44607, but that one got fixed years
> >> ago.
> >>
> >> Is anybody able to confirm this behaviour, or has anyone stumbled upon it?
> >>
> >> Furthermore, the behaviour of stream_get_line on an empty file seems to
> >> have changed between php 5.3.10 and php 5.3.11:
> >>
> >> $file = __DIR__ . '/empty.txt';
> >> file_put_contents( $file, '' );
> >> $fh = fopen( $file, 'rb' );
> >> $data = stream_get_line( $fh, 4096 );
> >> var_dump( $data );
> >>
> >> results in
> >>
> >> string(0) ""
> >>
> >> for php 5.3.10
> >>
> >> and in
> >>
> >> bool(false)
> >>
> >> for php > 5.3.10.
> >>
> >> I don't know if this should be considered a bug, but as far as I know
> >> such a behaviour should not change during minor releases...
> >>
> >> Any insight is appreciated!
> >>
> >> Greetings
> >>
> >> Nico
> >>
> >> --
> >> PHP Internals - PHP Runtime Development Mailing List
> >> To unsubscribe, visit: http://www.php.net/unsub.php
> >>
> >
> > --
> > --
> > Tjerk
>

--
--
Tjerk
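The disagreement over whether file size or buffer position matters can be checked with a small calculation. What follows is a minimal C sketch of a hypothetical model, not PHP source. It assumes the test file is `dots` filler bytes followed by the delimiter written twice. It also assumes the first stream_get_line() call is satisfied by a single buffer fill of `chunk` bytes (8192 in the tests in this thread). The bug is taken to require the end of that fill (y in the .......MM(x)M(y)M diagram) to land strictly inside the second delimiter. The function name `boundary_splits_delimiter` is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model (not PHP source): the file holds `dots` filler bytes
 * followed by the delimiter twice. After the first stream_get_line() call:
 *   x = read position, just past the first delimiter
 *   y = physical file position, i.e. the end of the first buffer fill
 * Assumes the first delimiter lies entirely inside that first fill.
 */
static bool boundary_splits_delimiter(size_t dots, size_t delim_len, size_t chunk)
{
    size_t total = dots + 2 * delim_len;       /* file size in bytes */
    size_t x = dots + delim_len;               /* just past the first delimiter */
    size_t y = total < chunk ? total : chunk;  /* end of the first fill */

    assert(delim_len > 0);
    assert(x <= chunk); /* the model only covers a single initial fill */

    /* y strictly inside the second delimiter: part of it is buffered,
       the rest is still in the file. */
    return x < y && y < x + delim_len;
}
```

Under these assumptions, with the two-byte delimiter 'MM' only 8189 dots (an 8193-byte file) puts y between the two bytes of the second delimiter. With 8188 dots the whole file fits in the buffer, and with 8190 dots the second delimiter is entirely unread (y == x). Both of those cases match the bool(false) results of Sherif's two test cases. With a one-byte delimiter the function can never return true, because there is no integer position strictly between x and x + 1.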
Attachment: streams-getrecord-bug.patch.txt

*** main/streams/streams.c      2012-06-13 12:54:23.000000000 +0800
--- mystreams.c 2012-10-09 17:00:12.000000000 +0800
***************
*** 1017,1022 ****
--- 1017,1027 ----
                return memchr(&stream->readbuf[stream->readpos + skiplen],
                        delim[0], seek_len - skiplen);
        } else {
+               if (skiplen) {
+                       /* left part of the delimiter may still remain in the buffer,
+                       rewind up to <delim_len - 1>*/
+                       skiplen -= MIN(skiplen, delim_len - 1);
+               }
                return php_memnstr((char*)&stream->readbuf[stream->readpos + skiplen],
                                delim, delim_len,
                                (char*)&stream->readbuf[stream->readpos + seek_len]);
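To show what the rewind in the patch changes, here is a standalone C sketch of the multi-byte branch. It is a model under stated assumptions, not the streams.c code. `find_delim()` is a hypothetical naive stand-in for the PHP-internal php_memnstr(), and `search_after_refill()` is a made-up name. `skiplen` is assumed to be the number of bytes that were already buffered (and already searched) before the refill. The scenario mirrors the second stream_get_line() call from the test: one 'M' is buffered, the refill appends the other, and the search would otherwise resume at offset 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Naive search for delim in buf[start, end); returns the offset or -1.
   Hypothetical stand-in for PHP's internal php_memnstr(). */
static long find_delim(const char *buf, size_t start, size_t end,
                       const char *delim, size_t delim_len)
{
    for (size_t i = start; i + delim_len <= end; i++) {
        if (memcmp(buf + i, delim, delim_len) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Search the `avail` buffered bytes for delim, skipping the first `skiplen`
   bytes that an earlier pass already examined. With `rewind` set, first step
   back by at most delim_len - 1 bytes, as the attached patch does, so that a
   delimiter straddling the old end of the buffer can still match. */
static long search_after_refill(const char *buf, size_t avail, size_t skiplen,
                                const char *delim, size_t delim_len, bool rewind)
{
    assert(delim_len > 0);
    if (avail <= skiplen) {
        return -1; /* no new bytes arrived, so nothing new can match */
    }
    if (rewind && skiplen) {
        /* MIN() keeps the unsigned skiplen from wrapping below zero */
        skiplen -= MIN(skiplen, delim_len - 1);
    }
    return find_delim(buf, skiplen, avail, delim, delim_len);
}
```

Without the rewind, search_after_refill("MM", 2, 1, "MM", 2, false) starts at offset 1, sees a single 'M' and returns -1, so no delimiter is found. With the rewind, skiplen drops to 0 and the 'MM' at offset 0 is found. For a one-byte delimiter MIN(skiplen, 0) is 0, so the rewind is a no-op. This is consistent with the patch touching only the php_memnstr() branch and leaving the memchr() path alone.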