Newsgroups: php.internals
Message-ID: <479A60BA.7030905@zend.com>
Date: Fri, 25 Jan 2008 14:20:42 -0800
From: stas@zend.com (Stanislav Malyshev)
To: 'PHP Internals'
Subject: json_encode() bug

Hi!

Right now, if json_encode() encounters invalid UTF-8 data, it silently truncates the string at that point: no error is returned and no message is produced.

Example:

    var_dump(json_encode("ab\xE0"));
    var_dump(json_encode("ab\xE0\""));

Both strings get cut at "ab". I don't think silently cutting the data is a good idea. In fact, I think it is a bug caused by this code in ext/json/utf8_to_utf16.c:

    if (c < 0) {
        return UTF8_END ? the_index : UTF8_ERROR;
    }

which inherited the bug from code published on json.org. The condition tests the constant UTF8_END itself instead of comparing c against it, so the UTF8_ERROR branch is never taken. It should be:

    if (c < 0) {
        return (c == UTF8_END) ? the_index : UTF8_ERROR;
    }

Now this is an easy fix, but it would lead to bad strings being silently converted to empty strings. The question is: should we raise an error there? If so, which one, E_WARNING or E_NOTICE? I'm for E_WARNING.

Also filed as bug #43941. Any comments?

-- 
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
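
To make the difference between the two conditions quoted above concrete, here is a minimal standalone sketch in C. It is not the actual ext/json code: toy_decode_next(), count_units_buggy() and count_units_fixed() are hypothetical names, the toy decoder simply treats every byte >= 0x80 as malformed, and the values -1 and -2 for UTF8_END and UTF8_ERROR are assumed (all that matters here is that both are negative, hence nonzero).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; both are negative so that "c < 0" catches them. */
#define UTF8_END   (-1)  /* input exhausted normally */
#define UTF8_ERROR (-2)  /* malformed sequence */

/* Hypothetical stand-in for the real decoder: ASCII bytes decode to
 * themselves, and any byte >= 0x80 is reported as malformed. */
static int toy_decode_next(const char *s, size_t len, size_t *pos)
{
    assert(s != NULL && pos != NULL);
    if (*pos >= len) {
        return UTF8_END;
    }
    unsigned char b = (unsigned char)s[(*pos)++];
    return (b < 0x80) ? (int)b : UTF8_ERROR;
}

/* Mirrors the quoted condition: UTF8_END is a nonzero constant, so the
 * ternary always yields the_index, even when c is UTF8_ERROR. */
static int count_units_buggy(const char *s, size_t len)
{
    size_t pos = 0;
    int the_index = 0;
    for (;;) {
        int c = toy_decode_next(s, len, &pos);
        if (c < 0) {
            return UTF8_END ? the_index : UTF8_ERROR;
        }
        the_index++;
    }
}

/* Mirrors the proposed fix: only a clean end of input returns the count;
 * a malformed sequence is reported as UTF8_ERROR. */
static int count_units_fixed(const char *s, size_t len)
{
    size_t pos = 0;
    int the_index = 0;
    for (;;) {
        int c = toy_decode_next(s, len, &pos);
        if (c < 0) {
            return (c == UTF8_END) ? the_index : UTF8_ERROR;
        }
        the_index++;
    }
}
```

With the input "ab\xE0" (length 3), count_units_buggy() reports 2 units, which corresponds to the silent cut at "ab", while count_units_fixed() returns UTF8_ERROR so the caller can decide what to do with it (for example, raise E_WARNING).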