Newsgroups: php.internals
Xref: news.php.net php.internals:6375
From: moriyoshi@at.wakwak.com (Moriyoshi Koizumi)
To: ilia@prohost.org
Cc: PHP Internals
Date: Sat, 13 Dec 2003 06:18:50 +0900
Subject: Re: [PHP-DEV] Re: Regarding the latest patch on fgetcsv() (stable branch)
In-Reply-To: <200312121551.25399.ilia@prohost.org>
References: <25BBBBC2-2CD2-11D8-8FCC-000A95CE0C62@at.wakwak.com> <200312121509.19291.ilia@prohost.org> <200312121551.25399.ilia@prohost.org>

On 2003/12/13, at 5:51, Ilia Alshanetsky wrote:

> How about we add mb_fgetcsv(), which would have full multi-byte support
> (including delimiters). I'd imagine for people who need to parse multi-byte
> csv files, full functionality is more important than speed. As for the
> fgetcsv() in ext/standard/, we can port the 4.3.X code (copy & paste really)
> and let PHP 5 users benefit from a faster fgetcsv() for common applications.
> What do you think?

I disagree, for the following reasons:

1) Quite a few people *actually* use fgetcsv() with multibyte characters.
   Applications written by people who don't use such characters don't (and
   won't) call multibyte-specific functions, and that is the problem: it
   makes those applications much less portable.

2) IMO speed is not the key factor here. People would rather have
   trustworthy behaviour.

3) The fgetcsv() implementation in the stable branch is now too complicated
   to add new features to, and it is also hard to maintain.

We should be able to eliminate the mblen() calls to get acceptable
performance. See the attached result.

Moriyoshi

p.s. fgetcsv() in the stable branch still seems to segfault with the
attached test case (segfault.php.txt).

[The benchmark result]

My code with mblen() (on php5-csv):
real 0m1.389s
user 0m1.330s
sys  0m0.060s

Ditto without mblen():
real 0m0.396s
user 0m0.350s
sys  0m0.040s

Your code (on php4-csv):
real 0m0.332s
user 0m0.270s
sys  0m0.060s

Content-Disposition: attachment; filename=bench.php.txt

Content-Disposition: attachment; filename=eliminate-mblen-patch.diff.txt

Index: ext/standard/php_string.h
===================================================================
RCS file: /repository/php-src/ext/standard/php_string.h,v
retrieving revision 1.83
diff -u -r1.83 php_string.h
--- ext/standard/php_string.h 10 Dec 2003 21:23:35 -0000 1.83
+++ ext/standard/php_string.h 12 Dec 2003 21:16:09 -0000
@@ -144,15 +144,7 @@
 #define strerror php_strerror
 #endif

-#ifndef HAVE_MBLEN
-# define php_mblen(ptr, len) 1
-#else
-# if defined(_REENTRANT) && defined(HAVE_MBRLEN) && defined(HAVE_MBSTATE_T)
-# define php_mblen(ptr, len) ((ptr) == NULL ? mbsinit(&BG(mblen_state)): (int)mbrlen(ptr, len, &BG(mblen_state)))
-# else
-# define php_mblen(ptr, len) mblen(ptr, len)
-# endif
-#endif
+#define php_mblen(ptr, len) 1

 void register_string_constants(INIT_FUNC_ARGS);

Content-Disposition: attachment; filename=segfault.php.txt
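
For readers following the thread, here is a rough illustration of what the mblen() calls buy and what they cost. The patch above stubs php_mblen() to 1, so a scanner advances one byte at a time; with mblen() it advances one character at a time. That matters in encodings such as Shift_JIS, where the trailing byte of a two-byte character can have the same value as an ASCII character (0x5C, for instance). The sketch below is not the fgetcsv() code from either branch. The function find_delim() and its use_mblen flag are hypothetical names invented for this example, and it assumes the caller has already selected a locale with setlocale() if multibyte input is expected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, NOT PHP's implementation.
 * Returns the byte offset of the first `delim` that starts on a
 * character boundary, or `len` if there is none.
 * With use_mblen == 0 every byte is treated as one character, which
 * is what "#define php_mblen(ptr, len) 1" amounts to.
 */
static size_t find_delim(const char *buf, size_t len, char delim, int use_mblen)
{
    size_t i = 0;

    assert(buf != NULL);

    if (use_mblen) {
        (void)mblen(NULL, 0);   /* reset any shift state */
    }

    while (i < len) {
        int n = 1;

        if (use_mblen) {
            n = mblen(buf + i, len - i);
            if (n <= 0) {
                /* invalid sequence or NUL byte: step over one byte */
                n = 1;
            }
        }

        /* only a single-byte character can be the delimiter */
        if (n == 1 && buf[i] == delim) {
            return i;
        }

        i += (size_t)n;
    }

    return len;
}
```

On plain ASCII input both modes return the same offset, for example find_delim("a,b", 3, ',', 1) and find_delim("a,b", 3, ',', 0) both give 1. The difference shows up only when the locale's encoding lets a delimiter byte appear inside a multibyte character, and the per-character mblen() call is the overhead the benchmark above measures.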