Hi list,
As I was fiddling with CSV data reading and writing I noticed that fgetcsv() is inherently incompatible with fputcsv() when it comes to the enclosure escape character that’s used.
Example: http://3v4l.org/LHEZj
The above example code demonstrates how, by default, fputcsv() encodes a backslash as is but fgetcsv() will treat that same backslash as the enclosure escape character as well as the enclosure character itself; this is rather surprising behaviour and imho unnecessarily complicated.
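A minimal sketch of the kind of round trip described here, not a copy of the linked 3v4l script: it writes a row with fputcsv(), reads it back with fgetcsv() using the same settings, and reports whether the row survived. The helper name csv_round_trip() and the sample rows are assumptions made for illustration; delimiter, enclosure and escape character are passed explicitly so nothing depends on defaults.

```php
<?php
// Hypothetical helper: write one row with fputcsv(), read it back with fgetcsv().
function csv_round_trip(array $row, string $escape = "\\"): array
{
    $fh = fopen('php://memory', 'w+');
    fputcsv($fh, $row, ',', '"', $escape);        // writer
    rewind($fh);
    $line = stream_get_contents($fh);             // raw CSV text as written
    rewind($fh);
    $parsed = fgetcsv($fh, 0, ',', '"', $escape); // reader, same settings
    fclose($fh);

    return ['line' => $line, 'parsed' => $parsed];
}

$rows = [
    ['plain', 'text'],     // no special characters
    ['say "hi"', 'x'],     // embedded enclosure character
    ['ends with \\', 'x'], // field ending in the escape character
];

foreach ($rows as $row) {
    $result = csv_round_trip($row);
    printf(
        "%s => %s\n",
        rtrim($result['line'], "\n"),
        $result['parsed'] === $row ? 'round trip ok' : 'round trip differs'
    );
}
```

Which rows come back unchanged may depend on the PHP version; a field that ends in the escape character is the case worth watching.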
I would suggest changing the behaviour in such a way that:
a) the default enclosure escape character for fgetcsv() is a double quote.
b) fgetcsv() only treats the escape character as … an escape character.
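To make the two points concrete, here is a rough sketch of a doubling-only scheme in plain PHP, where the enclosure character is the only thing that is ever escaped and a backslash is just data. The functions csv_encode_row() and csv_decode_row() are hypothetical helpers written for this illustration; they are not the proposed patch and not how fputcsv() or fgetcsv() are implemented. The sketch assumes single-byte delimiter and enclosure characters and a single line of input.

```php
<?php
// Hypothetical helpers illustrating a doubling-only rule; not PHP's built-in behaviour.

// Enclose every field and double any embedded enclosure character.
function csv_encode_row(array $fields, string $delimiter = ',', string $enclosure = '"'): string
{
    $encoded = [];
    foreach ($fields as $field) {
        $encoded[] = $enclosure
            . str_replace($enclosure, $enclosure . $enclosure, (string) $field)
            . $enclosure;
    }
    return implode($delimiter, $encoded);
}

// Parse one line; inside an enclosure, a doubled enclosure is a literal one.
function csv_decode_row(string $line, string $delimiter = ',', string $enclosure = '"'): array
{
    $fields = [];
    $current = '';
    $inside = false;
    $length = strlen($line);

    for ($i = 0; $i < $length; $i++) {
        $char = $line[$i];
        if ($inside) {
            if ($char !== $enclosure) {
                $current .= $char;           // ordinary data, backslash included
            } elseif ($i + 1 < $length && $line[$i + 1] === $enclosure) {
                $current .= $enclosure;      // doubled enclosure -> literal
                $i++;
            } else {
                $inside = false;             // closing enclosure
            }
        } elseif ($char === $enclosure) {
            $inside = true;                  // opening enclosure
        } elseif ($char === $delimiter) {
            $fields[] = $current;            // field boundary
            $current = '';
        } else {
            $current .= $char;               // unenclosed data
        }
    }
    $fields[] = $current;
    return $fields;
}

$row = ['a"b', 'ends with \\', 'x,y'];
$line = csv_encode_row($row);
var_dump(csv_decode_row($line) === $row); // bool(true)
```

With only one special character in play, the writer and the reader cannot disagree about what a backslash means.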
Due to the kind of change BC can’t be maintained, so I’d propose this change for PHP 7.
If anyone has violent objections to the above, or thinks that an RFC should be drawn up, do let me know … otherwise I’ll commit the change into master by next week or so.
Best,
Tjerk
Tjerk Meesters wrote on 19/11/2014 12:20:
Hi list,
As I was fiddling with CSV data reading and writing I noticed that fgetcsv() is inherently incompatible with fputcsv() when it comes to the enclosure escape character that’s used.
Example: http://3v4l.org/LHEZj
The above example code demonstrates how, by default, fputcsv() encodes a backslash as is but fgetcsv() will treat that same backslash as the enclosure escape character as well as the enclosure character itself; this is rather surprising behaviour and imho unnecessarily complicated.
I would suggest changing the behaviour in such a way that:
a) the default enclosure escape character for fgetcsv() is a double quote.
b) fgetcsv() only treats the escape character as … an escape character.
Due to the kind of change BC can’t be maintained, so I’d propose this change for PHP 7.
If anyone has violent objections to the above, or thinks that an RFC should be drawn up, do let me know … otherwise I’ll commit the change into master by next week or so.
Best,
Tjerk
If I remember rightly, some of the escaping behaviour is actually
incompatible with MS Excel (I believe 2000 was the current version when
I tested) as well, which struck me as rather unfortunate.
However, I'm always a bit wary of subtle breaking changes like this, so
we need to be sure we definitely get it right this time, and that there
are documented options to both functions which emulate the old behaviour.
Regards,
Rowan Collins
[IMSoP]
Are you aware of https://bugs.php.net/bug.php?id=50686? It seems this
very inconsistency has been reported a few years ago, but has been
tagged as "Wont fix" back then.
https://bugs.php.net/bug.php?id=38929 also seems to deal with this
inconsistency, and had been tagged as "Not a bug".
So maybe an RFC is appropriate?
--
Christoph M. Becker
Are you aware of https://bugs.php.net/bug.php?id=50686? It seems this
very inconsistency has been reported a few years ago, but has been
tagged as "Wont fix" back then.
Actually that bug report seems to suggest that fputcsv() uses backslash to encode enclosure characters, but AFAICT it doesn’t.
And then there are bug reports like https://bugs.php.net/bug.php?id=67566 which were fixed but really just made the situation worse =(
https://bugs.php.net/bug.php?id=38929 also seems to deal with this
inconsistency, and had been tagged as "Not a bug".
So maybe an RFC is appropriate?
Yeah, I didn’t realise the can of worms until I opened it; I’ll round up all the bug reports and run them against whatever RFC I can get my hands on.
PS: Favourite quote from the semi-authoritative spec of Perl_CSV: http://rath.ca/Misc/Perl_CSV/CSV-2.0.html#csv:
Given that the essence of CSV files is simplicity, I have decided to reject all escape and escaped characters with the exception of quoation marks appearing within quotation marks …
Good times :)
Tjerk Meesters wrote:
Are you aware of https://bugs.php.net/bug.php?id=50686? It seems this
very inconsistency has been reported a few years ago, but has been
tagged as "Wont fix" back then.
Actually that bug report seems to suggest that fputcsv() uses backslash to encode enclosure characters, but AFAICT it doesn’t.
Apparently, there is a somewhat hidden bug, see http://3v4l.org/El5Xs
for a simplified test script. The expected result is
string(14) ""a""b","a""b""
or maybe
string(14) ""a"b","a\"b""
The actual result makes no sense to me, even though str_getcsv() parses it "correctly".
And then there are bug reports like https://bugs.php.net/bug.php?id=67566 which were fixed but really just made the situation worse =(
ACK.
https://bugs.php.net/bug.php?id=38929 also seems to deal with this
inconsistency, and had been tagged as "Not a bug".
So maybe an RFC is appropriate?
Yeah, I didn’t realise the can of worms until I opened it; I’ll round up all the bug reports and run them against whatever RFC I can get my hands on.
PS: Favourite quote from the semi-authoritative spec of Perl_CSV: http://rath.ca/Misc/Perl_CSV/CSV-2.0.html#csv:
Given that the essence of CSV files is simplicity, I have decided to reject all escape and escaped characters with the exception of quoation marks appearing within quotation marks …
Good times :)
One might argue that the essence of CSV files is being a data exchange
format, so applying Postel's law would be reasonable. :)
--
Christoph M. Becker
Tjerk Meesters wrote:
Are you aware of https://bugs.php.net/bug.php?id=50686? It seems this
very inconsistency has been reported a few years ago, but has been
tagged as "Wont fix" back then.
Actually that bug report seems to suggest that fputcsv() uses backslash to encode enclosure characters, but AFAICT it doesn’t.
Apparently, there is a somewhat hidden bug, see http://3v4l.org/El5Xs
for a simplified test script. The expected result is
string(14) ""a""b","a""b""
or maybe
string(14) ""a"b","a\"b""
The actual result makes no sense to me, even though str_getcsv() parses it "correctly".
That works exactly for the wrong reasons:
- upon seeing an escape character, fgetcsv() will print that and the following character
- fputcsv() actually accepts an escape character too (despite what the documentation says) but treats it in the wrong way by not escaping that and the following character
The expected output, based on the given code, should (imo) be:
string(15) ""a"b","a\"b""
Or, if the escape character is a double quote:
string(15) ""a""b","a""b""
Unfortunately I can’t satisfy all the related bug reports; some decision of “correctness” needs to be made in the form of an RFC.
On 21.11.2014 at 02:53, Tjerk Meesters wrote:
Apparently, there is a somewhat hidden bug, see http://3v4l.org/El5Xs
for a simplified test script. The expected result is
string(14) ""a""b","a""b""
or maybe
string(14) ""a"b","a\"b""
The actual result makes no sense to me, even though str_getcsv() parses it "correctly".
That works exactly for the wrong reasons:
- upon seeing an escape character, fgetcsv() will print that and the following character
- fputcsv() actually accepts an escape character too (despite what the documentation says) but treats it in the wrong way by not escaping that and the following character
Ah, I see, thanks for the explanation. Apparently[1], the $escape parameter to fputcsv() was added in PHP 5.5. I have made a corresponding edit at edit.php.net.
BTW: wouldn't it be reasonable to also add the $escape parameter to
SplFileObject::fputcsv()? For now one has to use
SplFileObject::setCsvControl().
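For reference, a small sketch of that workaround as I understand it: the control characters are set once on the object with setCsvControl(), and the object's fputcsv() and fgetcsv() then pick them up. The sample row is made up for illustration, and behaviour in other PHP versions may differ.

```php
<?php
// Set delimiter, enclosure and escape character once on the object.
$file = new SplTempFileObject();
$file->setCsvControl(',', '"', '\\');

// Write one row, then read it back with the same control characters.
$file->fputcsv(['id', 'first name']);
$file->rewind();
$row = $file->fgetcsv();

var_dump($row);
```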
The expected output, based on the given code should (imo) be:
string(15) ""a"b","a\"b""
Or: if the escape character is a double quote:
string(15) ""a""b","a""b""
ACK. (Even though that would be a BC break.)
Unfortunately I can’t satisfy all the related bug reports, some decision of “correctness” needs to be made in the form of an RFC.
ACK. Thanks for taking the time. :)
[1] http://lxr.php.net/xref/PHP_5_5/ext/standard/file.c#1810 vs
http://lxr.php.net/xref/PHP_5_4/ext/standard/file.c#1807
--
Christoph M. Becker