Newsgroups: php.internals
Message-ID: <4809785B.80303@akbkhome.com>
Date: Sat, 19 Apr 2008 12:43:07 +0800
To: Stanislav Malyshev
CC: 'PHP Internals'
In-Reply-To: <479A60BA.7030905@zend.com>
Subject: Re: [PHP-DEV] json_encode() bug
From: alan@akbkhome.com (Alan Knowles)

This should really be fixed in a way similar to the iconv //IGNORE flag, so that bad characters are just replaced with '?'.

We use json_encode() to render spam email summaries, and don't really care if the encoding is incorrect, as long as it shows something.

Throwing a warning without providing a fix or workaround just reduces the usefulness of the function.

Regards
Alan

Stanislav Malyshev wrote:
> Hi!
>
> Right now, if json_encode sees invalid UTF-8 data, it just cuts the
> string in the middle: no error is returned and no message is produced.
> Example:
>
> var_dump(json_encode("ab\xE0"));
> var_dump(json_encode("ab\xE0\""));
>
> Both strings get cut at "ab". I don't think it's a good idea to
> silently cut the data. In fact, I think it is a bug caused by this
> code in ext/json/utf8_to_utf16.c:
>
>     if (c < 0) {
>         return UTF8_END ? the_index : UTF8_ERROR;
>     }
>
> which inherited this bug from code published on json.org. It should be:
>
>     if (c < 0) {
>         return (c == UTF8_END) ? the_index : UTF8_ERROR;
>     }
>
> Now, this is an easy fix, but it would lead to bad strings being
> silently converted to empty strings. The question is: should we have
> an error there? If so, which one, E_WARNING or E_NOTICE? I'm for
> E_WARNING.
>
> Also filed as bug #43941.
>
> Any comments?
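The diagnosis in the quoted message can be checked with a minimal C sketch. It assumes the sentinel values UTF8_END = -1 and UTF8_ERROR = -2 (as in the json.org-derived decoder), and the functions `count_buggy` and `count_fixed` are hypothetical stand-ins that model the loop in utf8_to_utf16() rather than reproduce it. The point is that `UTF8_END ? ... : ...` tests a non-zero constant, so the error branch can never be taken.

```c
#include <assert.h>

/* Assumed sentinel values; both are negative, so `c < 0` catches either. */
#define UTF8_END   (-1)
#define UTF8_ERROR (-2)

/* Hypothetical model of the decoder loop: `decoded` holds successive
   decoder results (code points, then a negative sentinel). Returns how
   many code points were consumed, or UTF8_ERROR. */
static int count_buggy(const int *decoded)
{
    int the_index = 0;

    assert(decoded);
    for (;;) {
        int c = decoded[the_index];
        if (c < 0) {
            /* Bug: UTF8_END is a non-zero constant, so this condition is
               always true and a decoding error looks like normal end of
               input; the caller just sees a shorter string. */
            return UTF8_END ? the_index : UTF8_ERROR;
        }
        the_index++; /* stands in for storing the converted UTF-16 output */
    }
}

static int count_fixed(const int *decoded)
{
    int the_index = 0;

    assert(decoded);
    for (;;) {
        int c = decoded[the_index];
        if (c < 0) {
            /* Fix: only a genuine end of input returns the count. */
            return (c == UTF8_END) ? the_index : UTF8_ERROR;
        }
        the_index++;
    }
}
```

For "ab\xE0" a decoder would yield 'a', 'b', then an error: `count_buggy` returns 2, matching the truncation at "ab", while `count_fixed` returns UTF8_ERROR. For valid input ending in UTF8_END both return the same count.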
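The reply's suggestion, replacing bad characters with '?' instead of truncating or only warning, could look roughly like the sanitizer below, applied to a byte string before it is encoded. This is a sketch under stated assumptions: `utf8_seq_len` and `utf8_replace_invalid` are hypothetical helpers, not part of ext/json or iconv, and emitting one '?' per offending byte is just one possible policy (replacing each maximal invalid subsequence with a single '?' would also be reasonable).

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length (1-4) of the well-formed UTF-8 sequence at s[0],
   or 0 if the bytes there are not valid UTF-8. `n` is bytes available. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned char b = s[0];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range for the 2nd byte */
    size_t len, i;

    if (b < 0x80)
        return 1;                        /* plain ASCII */
    if (b >= 0xC2 && b <= 0xDF) {
        len = 2;
    } else if (b >= 0xE0 && b <= 0xEF) {
        len = 3;
        if (b == 0xE0) lo = 0xA0;        /* reject overlong forms */
        else if (b == 0xED) hi = 0x9F;   /* reject UTF-16 surrogates */
    } else if (b >= 0xF0 && b <= 0xF4) {
        len = 4;
        if (b == 0xF0) lo = 0x90;        /* reject overlong forms */
        else if (b == 0xF4) hi = 0x8F;   /* reject code points > U+10FFFF */
    } else {
        return 0;                        /* stray continuation or bad lead */
    }
    if (n < len)
        return 0;                        /* sequence truncated by end of input */
    if (s[1] < lo || s[1] > hi)
        return 0;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return len;
}

/* Copies n bytes from src to dst, writing '?' for every byte that does
   not begin a valid sequence. Output is never longer than input, so dst
   needs n + 1 bytes. Returns the output length (excluding the NUL). */
static size_t utf8_replace_invalid(char *dst, const char *src, size_t n)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0, o = 0;

    assert(dst && src);
    while (i < n) {
        size_t len = utf8_seq_len(s + i, n - i);
        if (len == 0) {
            dst[o++] = '?';              /* one '?' per offending byte */
            i++;
        } else {
            size_t k;
            for (k = 0; k < len; k++)
                dst[o++] = src[i + k];   /* copy the valid sequence as-is */
            i += len;
        }
    }
    dst[o] = '\0';
    return o;
}
```

With this policy, "ab\xE0" becomes "ab?" and "ab\xE0\"" becomes "ab?\"", so the output still "shows something" instead of being cut at "ab".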