Hi!
We switched from php 5.3.10 to 5.3.17 this weekend and stumbled upon a
behaviour of stream_get_line that is most likely a bug and breaks a
lot of our file processing code.
The issue seems to have been introduced from 5.3.10 to 5.3.11.
I opened a bug report: #63240.
The issue seems to be related to #44607, but that one got fixed years ago.
Is anybody able to confirm this behaviour or has stumbled upon this?
Furthermore the behaviour of stream_get_line on an empty file seems to
have changed between php 5.3.10 and php 5.3.11:
<?php
// __DIR__ has no trailing slash, so add the separator explicitly.
$file = __DIR__ . '/empty.txt';
file_put_contents( $file, '' );
$fh = fopen( $file, 'rb' );
$data = stream_get_line( $fh, 4096 );
var_dump( $data );
results in
string(0) ""
for php 5.3.10
and in
bool(false)
for php > 5.3.10.
I don't know if this should be considered a bug, but as far as I know
such a behaviour should not change during minor releases...
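Until this is settled, calling code can be written so that it does not care which of the two values comes back. A minimal sketch, assuming a hypothetical helper named read_chunk_or_null (not a PHP built-in) and assuming that an empty record never carries meaning for the caller:

```php
<?php
// Hypothetical helper (not a PHP built-in): hides the difference between
// string(0) "" (php 5.3.10) and bool(false) (php > 5.3.10) on an empty file.
function read_chunk_or_null($fh, $length, $delimiter = '')
{
    $data = stream_get_line($fh, $length, $delimiter);
    // Assumption: an empty record is never meaningful to the caller.
    if ($data === false || $data === '') {
        return null;
    }
    return $data;
}

// Usage: tempnam() creates an empty file, so nothing can be read from it.
$file = tempnam(sys_get_temp_dir(), 'sgl');
$fh = fopen($file, 'rb');
$empty = read_chunk_or_null($fh, 4096);
fclose($fh);
unlink($file);
var_dump($empty);
```

Note that this deliberately conflates "empty record" and "nothing left to read", so it is not suitable when empty records between two delimiters matter.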
Any insight is appreciated!
Greetings
Nico
On Tue, Oct 9, 2012 at 12:14 AM, Nicolai Scheer scope@planetavent.de wrote:
[...]
I've managed to reduce the code to this; it's very specific:
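$file = __DIR__ . '/input_dummy.txt';
$delimiter = 'MM';
// 8189 dots plus two delimiters: 8193 bytes in total.
file_put_contents($file, str_repeat('.', 8189) . $delimiter . $delimiter);
$fh = fopen($file, "rb");
// The first call consumes the dots and the first delimiter.
stream_get_line($fh, 8192, $delimiter);
// bool(true) here means the second delimiter was returned as data.
var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
fclose($fh);
unlink($file);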
If the internal buffer length is 8192, after the first call to
stream_get_line() the read position (x) and physical file pointer (y)
should be positioned like so:
.......MM(x)M(y)M
The fact that (y) is in between the delimiter seems to cause an issue.
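The positions in the diagram can be checked with plain string arithmetic. This is only a sketch: it assumes the stream is filled in chunks of 8192 bytes and does not touch the stream layer at all; the variable names are made up for illustration.

```php
<?php
// Assumption: the stream is filled in chunks of 8192 bytes.
$chunkSize = 8192;
$delimiter = 'MM';
$payload   = str_repeat('.', 8189) . $delimiter . $delimiter;

// Read position (x): just past the first delimiter.
$readPos = strpos($payload, $delimiter) + strlen($delimiter);
// Physical file pointer (y): the end of the first chunk.
$filePos = min($chunkSize, strlen($payload));
// Start of the second delimiter, searched from the read position.
$second = strpos($payload, $delimiter, $readPos);
// True when (y) lies strictly inside the second delimiter.
$straddles = $second < $filePos && $filePos < $second + strlen($delimiter);

printf("x=%d y=%d straddles=%s\n", $readPos, $filePos, var_export($straddles, true));
// prints: x=8191 y=8192 straddles=true
```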
--
--
Tjerk
I'm not sure why this bug exists, and I haven't been able to pinpoint
exactly where it manifests itself, but what I find incredibly unusual
is that it only appears when the stream is exactly 8193 bytes long.
It has nothing to do with the file pointer's position, since all we
have to do here is increase or decrease the size of the file by exactly
1 byte and the bug never shows its face.
Test case 1: (we decrease the file size from 8193 bytes to 8192 bytes)
$file = __DIR__ . '/input_dummy.txt';
$delimiter = 'MM';
file_put_contents($file, str_repeat('.', 8188) . $delimiter . $delimiter);
$fh = fopen($file, "rb");
stream_get_line($fh, 8192, $delimiter);
var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
fclose($fh);
unlink($file);
/* bool(false) */
Test case 2: (we increase the file size from 8193 bytes to 8194 bytes)
$file = __DIR__ . '/input_dummy.txt';
$delimiter = 'MM';
file_put_contents($file, str_repeat('.', 8190) . $delimiter . $delimiter);
$fh = fopen($file, "rb");
stream_get_line($fh, 8192, $delimiter);
var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
fclose($fh);
unlink($file);
/* bool(false) */
As long as the file size is not exactly 8193 bytes you don't get this
issue. In fact, you can test it with any multiple of 8192, plus 1, and
the same issue appears. However, the bigger anomaly is that the
delimiter also has to be longer than 1 character before the bug
manifests itself.
I suspect this has something to do with the way PHP streams are
buffered internally. The internal stream is read up to a certain
length and buffered in memory using the internal API functions, while
your calls to PHP-facing functions like stream_get_line() read
directly from the buffer instead. So it's possible somewhere in this
function (line 1026 of main/streams/streams.c
http://lxr.php.net/xref/PHP_5_4/main/streams/streams.c#1026) lies the
bug.
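The three file sizes can be compared in one run. A minimal sketch, assuming a writable system temp directory; second_read_is_delimiter is a hypothetical helper name used only here. On an affected build the 8193 entry is expected to be the odd one out; on a build without the bug all three should be false.

```php
<?php
// Hypothetical helper: returns true when the second read hands back the
// delimiter itself, which is the symptom described in this thread.
function second_read_is_delimiter($dots, $delimiter = 'MM')
{
    $file = tempnam(sys_get_temp_dir(), 'sgl');
    file_put_contents($file, str_repeat('.', $dots) . $delimiter . $delimiter);
    $fh = fopen($file, 'rb');
    stream_get_line($fh, 8192, $delimiter);           // consume the dots
    $second = stream_get_line($fh, 8192, $delimiter); // should not be "MM"
    fclose($fh);
    unlink($file);
    return $second === $delimiter;
}

// 8188, 8189 and 8190 dots give files of 8192, 8193 and 8194 bytes.
$results = array();
foreach (array(8188, 8189, 8190) as $dots) {
    $results[$dots + 4] = second_read_is_delimiter($dots);
}
var_dump($results);
```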
Hi,
I've managed to pinpoint the issue inside the code itself and attached a
patch for 5.4.4 (I can make one for trunk as well, but at the time of
writing I worked with what I had).
The bug manifests itself when delimiter size > 1 AND the file pointer falls
in between a delimiter after filling the read buffer with
php_stream_fill_read_buffer().
When this happens, the part of the delimiter that falls on the left side of
the file pointer is skipped at the next iteration because it was examined
before; however, that only makes sense for single character delimiters.
My patch will decrement the skip length (if non-zero) by at most <delimiter
length - 1> bytes before performing the search. This will make sure any
buffered characters are taken into consideration (again).
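The idea can be modelled in a few lines of PHP. This is a simplified sketch of the skip-length adjustment, not the C patch itself; find_delimiter and its parameters are hypothetical names for illustration.

```php
<?php
// Simplified PHP model of the idea, not the actual C patch: search $buffer
// for $delimiter, skipping bytes that an earlier pass already examined.
function find_delimiter($buffer, $delimiter, $skipLen, $backOff)
{
    if ($backOff && $skipLen > 0) {
        // Re-examine up to (delimiter length - 1) bytes, since a delimiter
        // may have started there without being complete yet.
        $skipLen = max(0, $skipLen - (strlen($delimiter) - 1));
    }
    return strpos($buffer, $delimiter, $skipLen);
}

// The first pass saw only "M" (1 byte, no match). The next fill appends the
// second "M", so the buffer is now "MM" and 1 byte counts as examined.
$withoutFix = find_delimiter('MM', 'MM', 1, false);
$withFix    = find_delimiter('MM', 'MM', 1, true);
var_dump($withoutFix, $withFix);
// prints: bool(false) and int(0)
```

For a single-character delimiter the adjustment is zero bytes, which matches the observation that skipping examined bytes is only safe in that case.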
On Tue, Oct 9, 2012 at 4:33 PM, Sherif Ramadan theanomaly.is@gmail.com wrote:
On Tue, Oct 9, 2012 at 12:59 AM, Tjerk Anne Meesters datibbaw@php.net wrote:
[...]
--
Tjerk
Yup, that makes perfect sense. I had narrowed it down to somewhere
within _php_stream_search_delim, but I couldn't actually think of a
reasonable fix without potentially breaking something else. This looks
quite reasonable and I think it should work fine.
Hi again!
Thanks for your help and comments on the issue.
cataphract commented on the stream_get_line behaviour (returning false
when used on an empty file) on the bug report page.
I do agree that reading on an empty file can be considered an error
thus returning false, because there's nothing to read.
Unfortunately, and that's why we stumbled upon this in the first
place, feof does not return true when opening an empty file.
I did not have a look at the internals, but my guess is that feof just
does not return true because no one did a read on the file handle yet.
To my mind it would be sensible to return true using feof on an empty
file, so that one does not actually try a read...
What do you think?
Greetings
Nico
That wouldn't be right. Technically the EOF should be discovered, not
deduced from other information like stat(). Also, some streams don't
support reporting an appropriate size.
On 2012-10-10 15:52, Tjerk Meesters wrote:
On 10 Oct, 2012, at 6:39 PM, Nicolai Scheer nicolai.scheer@gmail.com wrote:
[...]
I second what Mr. Meesters has said. The end-of-file indicator should
be discovered. That's how stdio works, which is what PHP's function is
modeled after. There's nothing wrong about the indicator being set in a
read call that returned no data, be it because the file is empty or
because the read before just happened to read all the data left.
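A small sketch of what "discovered" means in practice for a plain, empty file. It assumes a writable system temp directory; the variable names are only illustrative.

```php
<?php
// tempnam() creates an empty file, which is all this sketch needs.
$file = tempnam(sys_get_temp_dir(), 'eof');
$fh = fopen($file, 'rb');

$beforeRead = feof($fh);     // no read has been attempted yet
$data       = fread($fh, 1); // there is nothing to read
$afterRead  = feof($fh);     // the empty read discovered end-of-file

fclose($fh);
unlink($file);
var_dump($beforeRead, $afterRead);
```

The indicator only flips once a read has come back short, which is why checking feof() before the first read on an empty file does not help.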
Before some recent bug fixes, the successive return values would depend
somewhat on chance. bool(false) would only be returned after the
end-of-file had been discovered. So if the last read had read all the
data but had not found eof, then the next call would return an empty
string. But if it had (the most common scenario), it would return false.
This ambiguity, which is problematic mostly because stream_get_line()
strips off the delimiter, has been eliminated. The only one left is that
you cannot tell whether the input stream ends with the delimiter or not:
$ php -r '$fd = fopen("php://temp", "r+"); fwrite($fd, "aa");
rewind($fd); var_dump(stream_get_line($fd, 10, "MM"),
stream_get_line($fd, 10, "MM"));'
string(2) "aa"
bool(false)
$ php -r '$fd = fopen("php://temp", "r+"); fwrite($fd, "aaMM");
rewind($fd); var_dump(stream_get_line($fd, 10, "MM"),
stream_get_line($fd, 10, "MM"));'
string(2) "aa"
bool(false)
Finally, what I suggest is that you use stream_get_line() in the same
way that's recommended for fgets():
while (($buffer = stream_get_line($handle, 8192, "MM")) !== false) {
    if (strlen($buffer) == 8192) {
        //You may consider this an error
    }
    echo $buffer;
}
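The same loop can be tried end to end against an in-memory stream. A minimal sketch, assuming a PHP build that already contains the fixes discussed here; read_records is a hypothetical name used only for this example.

```php
<?php
// Collects every record from a stream using the !== false loop.
// read_records is a hypothetical name used for this sketch only.
function read_records($handle, $delimiter, $maxLength = 8192)
{
    $records = array();
    while (($buffer = stream_get_line($handle, $maxLength, $delimiter)) !== false) {
        $records[] = $buffer;
    }
    return $records;
}

$fd = fopen('php://temp', 'r+');
fwrite($fd, 'aaMMbbMM');
rewind($fd);
$records = read_records($fd, 'MM');
fclose($fd);
var_dump($records);
```

Comparing strictly against false, rather than testing the value for truthiness, keeps the loop from stopping early on a record such as "0" or an empty string.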
If you're using non-blocking sockets, stream_get_line() may return
false temporarily. In those cases, you may want to do something like:
//stream_get_line() will return false if the stream is temporarily out
//of data, even if there's some data buffered, as long as that buffered
//data is less than the maxsize you specify and it doesn't contain the
//delimiter
do {
    //call stream_select() here to wait for data
    while (($buffer = stream_get_line($handle, 8192, "MM")) !== false) {
        if (strlen($buffer) == 8192) {
            //You may consider this an error
        }
        echo $buffer;
    }
} while (!feof($handle));
--
Gustavo Lopes