Hi!
I hereby put the “Kill proprietary CSV escaping mechanism” under discussion:
https://wiki.php.net/rfc/kill-csv-escaping
Any comments are welcome!
--
Christoph M. Becker
I hereby put the “Kill proprietary CSV escaping mechanism” under discussion:
https://wiki.php.net/rfc/kill-csv-escaping
Any comments are welcome!
Huge +1 from me! This would remove the need for a lot of nasty hacks and
workarounds when reading/writing CSVs.
From my perspective, the first part of the proposal (allow passing an empty
string as the $escape argument) hardly needs an RFC, since it's essentially
a bugfix and shouldn't break userland.
So it seems like the main question the RFC will decide is a deprecation
timeline for the $escape argument. I'm not sure it makes sense to have
separate deprecation phases for passing a non-empty $escape string, and
passing an explicit escape argument at all. This would require developers
to change their code twice: once to pass an empty string, and again to
remove the empty string.
Would it be acceptable to change the default value of $escape to an empty
string and deprecate passing an explicit $escape parameter in PHP 7.4?
This would be a minor BC break, but would allow developers to fix their
code once, so it will continue working when the escape parameter is removed
(possibly in PHP 8).
But in the end I don't care so much about the exact timeline, as long as
this issue can finally be fixed!
Thanks again for your work on this,
Theodore Brown
Hi!
I hereby put the “Kill proprietary CSV escaping mechanism” under discussion:
For fputcsv (and generally writing) this is probably OK. I am much more
concerned about reading - this may make files generated by PHP be
unreadable by php. And some use cases keep archives for a very long
time. Trying to find something in a backup and discovering that your
code is not longer able to read your data is no fun.
--
Stas Malyshev
smalyshev@gmail.com
Hi Stas
On Fri, Sep 28, 2018, 2:00 PM Stanislav Malyshev smalyshev@gmail.com
wrote:
Hi!
I hereby put the “Kill proprietary CSV escaping mechanism” under
discussion:For fputcsv (and generally writing) this is probably OK. I am much more
concerned about reading - this may make files generated by PHP be
unreadable by php. And some use cases keep archives for a very long
time. Trying to find something in a backup and discovering that your
code is not longer able to read your data is no fun.
This is a very good point. Removing it in read mode is going to break
things badly out there (I had a few projects using it. We need a sensible
decision here.
best,
Pierre
On Fri, Sep 28, 2018, 2:00 PM Stanislav Malyshev smalyshev@gmail.com
wrote:I hereby put the “Kill proprietary CSV escaping mechanism” under
discussion:For fputcsv (and generally writing) this is probably OK. I am much more
concerned about reading - this may make files generated by PHP be
unreadable by php. And some use cases keep archives for a very long
time. Trying to find something in a backup and discovering that your
code is not longer able to read your data is no fun.This is a very good point. Removing it in read mode is going to break
things badly out there (I had a few projects using it. We need a sensible
decision here.
If we cannot eventually get rid of the escaping for CSV reading (note
that I proposed to keep the possibility to use it until PHP 9), I
don't think that we should remove it for CSV writing either, since that
would make SplFileObject::setCsvControl() and ::getCsvControl() pretty
much confusing. Consequently, it wouldn't make sense to deprecate using
a non-empty string as $escape, and therefore we can't change the
default. So we're basically left with
https://github.com/php/php-src/pull/3515, which, in my opinion,
doesn't need an RFC at all.
Anybody who uses the escaping mechanism (deliberately or unwittingly)
should be aware of the roundtrip issues (see
https://bugs.php.net/74713 and https://bugs.php.net/76940), though.
--
Christoph M. Becker
Hi!
If we cannot eventually get rid of the escaping for CSV reading (note
that I proposed to keep the possibility to use it until PHP 9), I
don't think that we should remove it for CSV writing either, since that
would make SplFileObject::setCsvControl() and ::getCsvControl() pretty
much confusing. Consequently, it wouldn't make sense to deprecate using
SplFileObject is a bit of a problem since it has single setting for both
reading and writing. I'd say it's fine to deprecate using escape there,
with the idea that if somebody needs it for reading, they'd ignore the
deprecation message, or use fgetcsv. I guess using single set of setting
for both read and write is not that great an idea, in retrospect, but we
probably can't do too much about it.
I think allowing not having an escape and making it the default is a
good idea, though mismatch between read and write may be problematic. At
least the first part - allowing not having escape - I think should be
done and does not need RFC even, but the default change probably does
and I think this should be in a major version, since it's a BC break,
albeit not a major one. So, I'd do this:
- Enable no escape char at all ASAP
- Switch to no escape by default for read & write for PHP 8
- Deprecate non-empty escape for SplFileObject and fputcsv for PHP 8,
leave fgetcsv alone.
--
Stas Malyshev
smalyshev@gmail.com
If we cannot eventually get rid of the escaping for CSV reading (note
that I proposed to keep the possibility to use it until PHP 9), I
don't think that we should remove it for CSV writing either, since that
would make SplFileObject::setCsvControl() and ::getCsvControl() pretty
much confusing. Consequently, it wouldn't make sense to deprecate usingSplFileObject is a bit of a problem since it has single setting for both
reading and writing. I'd say it's fine to deprecate using escape there,
with the idea that if somebody needs it for reading, they'd ignore the
deprecation message, or use fgetcsv. I guess using single set of setting
for both read and write is not that great an idea, in retrospect, but we
probably can't do too much about it.I think allowing not having an escape and making it the default is a
good idea, though mismatch between read and write may be problematic. At
least the first part - allowing not having escape - I think should be
done and does not need RFC even, but the default change probably does
and I think this should be in a major version, since it's a BC break,
albeit not a major one. So, I'd do this:
- Enable no escape char at all ASAP
- Switch to no escape by default for read & write for PHP 8
- Deprecate non-empty escape for SplFileObject and fputcsv for PHP 8,
leave fgetcsv alone.
I don't think we're doing anyone a favor by introducing even more
inconsistencies between reading and writing CSV. Also, if we stick with
the legacy CSV reading, we can't simplify the overly complex
implementation. Therefore, I suggest to only add support for an empty
$escape parameter (PR #3515), and withdraw the RFC to deprecate and
eventually remove the proprietary escaping mechanism.
--
Christoph M. Becker
Hi!
implementation. Therefore, I suggest to only add support for an empty
$escape parameter (PR #3515),
Sounds good.
Stas Malyshev
smalyshev@gmail.com
On Thu, Sep 27, 2018 at 12:29 PM Christoph M. Becker cmbecker69@gmx.de
wrote:
Hi!
I hereby put the “Kill proprietary CSV escaping mechanism” under
discussion:https://wiki.php.net/rfc/kill-csv-escaping
Any comments are welcome!
Could you please add a description of how the escaping mechanism currently
works for read and write? My vague recollection is that the write and read
behavior actually have nothing to do with each other and the fputcsv
$escape parameter would be better described as the $corrupt parameter. We
may want to treat both cases in different ways.
Nikita
On Thu, Sep 27, 2018 at 12:29 PM Christoph M. Becker cmbecker69@gmx.de
wrote:I hereby put the “Kill proprietary CSV escaping mechanism” under
discussion:https://wiki.php.net/rfc/kill-csv-escaping
Any comments are welcome!
Could you please add a description of how the escaping mechanism currently
works for read and write? My vague recollection is that the write and read
behavior actually have nothing to do with each other and the fputcsv
$escape parameter would be better described as the $corrupt parameter. We
may want to treat both cases in different ways.
It's hard for me to describe something whose sense escapes me. How can
two competing escaping mechanisms work at the same time?
Anyhow, from looking at the implementation the write case (fputcsv)[1]
is pretty clear:
- if a field contains an escape character, it is enclosed with the
enclosure character - if an enclosure character is preceeded by an escape character in a
field, both are written verbatim (i.e. the enclosure character is not
doubled)
The implementation of the CSV reading[2] is way more complex, but
appears to be broken anyway. For instance:
$str = '"\\"a"';
print_r(str_getcsv($str));
outputs:
Array
(
[0] => \a"
)
Note that $str could have been produced by fputcsv($stream, ['\\"a']).
[1]
https://github.com/php/php-src/blob/php-7.3.0RC2/ext/standard/file.c#L1932-L1990
[2]
https://github.com/php/php-src/blob/php-7.3.0RC2/ext/standard/file.c#L2092-L2349
--
Christoph M. Becker