Newsgroups: php.internals
From: tjerk.meesters@gmail.com (Tjerk Meesters)
To: Christoph Becker
Cc: PHP Internals
Date: Fri, 21 Nov 2014 09:53:37 +0800
Subject: Re: fgetcsv incompatible with fputcsv

> On 21 Nov 2014, at 02:14, Christoph Becker wrote:
>
> Tjerk Meesters wrote:
>
>>> On 20 Nov 2014, at 00:26, Christoph Becker wrote:
>>>
>>> Are you aware of ? It seems this
>>> very inconsistency has been reported a few years ago, but has been
>>> tagged as "Wont fix" back then.
>>
>> Actually that bug report seems to suggest that fputcsv() uses backslash to encode enclosure characters, but AFAICT it doesn’t.
>
> Apparently, there is a somewhat hidden bug, see
> for a simplified test script.  The expected result is
>
>   string(14) ""a""b","a\""b""
>
> or maybe
>
>   string(14) ""a\"b","a\\"b""
>
> The actual result makes no sense to me, even though str_getcsv() parses
> it "correctly".

That works, but for exactly the wrong reasons:

1) upon seeing an escape character, fgetcsv() copies it and the following character to the output verbatim
2) fputcsv() actually accepts an escape character too (despite what the documentation says), but handles it incorrectly: it leaves that character and the one following it unescaped

The expected output, based on the given code, should (imo) be:

  string(15) ""a\"b","a\\\"b""

Or, if the escape character is a double quote:

  string(15) ""a""b","a\""b""

Unfortunately I can’t satisfy all the related bug reports; some decision about “correctness” needs to be made in the form of an RFC.

>
>> And then there are bug reports like https://bugs.php.net/bug.php?id=67566 which were fixed but really just made the situation worse =(
>
> ACK.
>
>>> also seems to deal with this
>>> inconsistency, and had been tagged as "Not a bug".
>>>
>>> So maybe an RFC is appropriate?
>>
>> Yeah, I didn’t realise the can of worms until I opened it; I’ll round up all the bug reports and run them against whatever RFC I can get my hands on.
>>
>> PS: Favourite quote from the semi-authoritative spec of Perl_CSV: http://rath.ca/Misc/Perl_CSV/CSV-2.0.html#csv:
>>
>>> Given that the essence of CSV files is simplicity, I have decided to reject all escape and escaped characters with the exception of quotation marks appearing within quotation marks …
>>
>> Good times :)
>
> One might argue that the essence of CSV files is being a data exchange
> format, so applying Postel's law would be reasonable. :)
>
> --
> Christoph M. Becker
>