Newsgroups: php.internals
To: Dan Ackroyd
Cc: PHP internals
References: <7cf5adb8-0738-259e-6d1e-f966722fdae2@gmx.de>
Date: Sat, 23 Sep 2017 00:09:23 +0200
Subject: Re: [PHP-DEV] fputcsv() and $escape character
From: cmbecker69@gmx.de ("Christoph M. Becker")

On 21.09.2017 at 23:08, Dan Ackroyd wrote:

> On 21 September 2017 at 12:43, Christoph M. Becker wrote:
>
>> There are several bug reports regarding "broken" fputcsv() behavior in
>> our tracker, namely, because the $escape parameter causes unexpected
>> results. For instance:
>
> I looked at fixing some of the CSV related bugs about a year or so
> ago. My conclusions were:
>
> i) There is no way to fix the problems that wouldn't cause horrible BC
> breaks for code that is only coincidentally working currently.
>
> ii) Handling strings in C is much more error prone than handling them in PHP.
>
> I'm reasonably certain that trying to fix the current functions is the
> wrong approach, and one of the following would be much better.
>
> Either, find a C library that has already been proven to handle CSV
> parsing/generating 'correctly' and bring that into PHP core under
> either new function names or namespace.
>
> Or, write the code in PHP (or just use
> https://github.com/thephpleague/csv) and find a way to make that fast
> enough for people to use.
>
> Touching the existing code is pretty certain to bring a lot of pain,
> without resulting in a fully compliant csv parser/generator.

I agree that php_fgetcsv() has serious issues, and it might not be
possible to fix it without causing severe BC breaks. php_fputcsv(), on
the other hand, is less of a problem.

Overall, the most pressing issue is that both functions try to respect
the current locale, but already fail at that in general, since several
parameters are declared as char, which can't work for (some) multibyte
encodings. For instance, it is impossible to generate proper UTF-16
encoded CSV files, or to read them. The problem extends further, because
several (mostly?) whitespace characters are hard-coded, assuming an
ASCII-compatible character encoding.

A minor issue is the hard-coded record terminator, which is currently
LF (RFC 4180 specifies CRLF). Apparently, this isn't a real issue
nowadays (besides the missing support for non-ASCII-compatible
character encodings).

Another issue concerns the escape character. Frankly, I don't have the
slightest clue how that is supposed to work, or why it was introduced
in the first place. Maybe it was introduced for compatibility with some
application requiring it; maybe it was introduced to support "DSV
style"[1]. If the latter, at least php_fputcsv() doesn't support it
(anymore). Unless it's clear what exactly the escape character is
supposed to do, we *cannot* even *hope* to fix the implementation.

Introducing new functions with clearly defined behavior would be nice,
but appears to me somewhat as pie in the sky (somebody would have to do
the actual work!). But even if we do so, at least the actual behavior of
the existing functions would have to be documented.

And frankly, I don't see why it would be a problem to allow using no
escape character for fputcsv(). That certainly wouldn't be a BC break,
since currently the function bails out if `escape_str_len < 1`.
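
A minimal sketch of what "no escape character" would mean in practice
may help here. It is an illustration only, not the php_fputcsv()
implementation: csv_line_no_escape() and fputcsv_to_string() are
hypothetical helpers made up for this example, and an ASCII-compatible
encoding such as UTF-8 is assumed. The first helper doubles every
enclosure and terminates the record with CRLF, as RFC 4180 describes;
the second one only captures what fputcsv() writes with an explicit
escape character.

```php
<?php
// Hypothetical helper (not part of PHP): formats one record without any
// escape character. Every enclosure inside a field is doubled and the
// record ends with CRLF. Assumes an ASCII-compatible encoding.
function csv_line_no_escape(array $fields, string $delimiter = ',', string $enclosure = '"'): string
{
    $out = [];
    foreach ($fields as $field) {
        $field = (string) $field;
        // Enclose only fields containing a delimiter, enclosure, CR or LF.
        if (strpbrk($field, $delimiter . $enclosure . "\r\n") !== false) {
            $field = $enclosure
                . str_replace($enclosure, $enclosure . $enclosure, $field)
                . $enclosure;
        }
        $out[] = $field;
    }
    return implode($delimiter, $out) . "\r\n";
}

// Hypothetical helper: returns what fputcsv() writes for a single record
// when called with the given escape character.
function fputcsv_to_string(array $fields, string $escape): string
{
    $fp = fopen('php://memory', 'w+');
    fputcsv($fp, $fields, ',', '"', $escape);
    rewind($fp);
    $line = stream_get_contents($fp);
    fclose($fp);
    return $line;
}

// The first field consists of the four bytes: a \ " b
$fields = ['a\\"b', 'c'];

// With "\" as escape character, the quote following the backslash is
// not doubled: "a\"b",c
echo fputcsv_to_string($fields, '\\');

// Without an escape character, every quote is doubled: "a\""b",c
echo csv_line_no_escape($fields);
```

A reader that follows RFC 4180 would see the first output as an
unbalanced field, which seems to be what several of the bug reports
boil down to.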
Of course, allowing an empty escape character wouldn't fix all issues,
but it appears to make the function work as expected for
ASCII-compatible character encodings (for other character encodings the
function appears to be broken anyway).

Ad league/csv: rather impressive! However, including this functionality
in ext/standard is totally over the top, in my opinion. I guess that
fgetcsv() and fputcsv() are mostly used for importing from and exporting
to CSV, respectively, but not as a replacement for an SQL database
engine.

[1]

-- 
Christoph M. Becker