From: nrixham@gmail.com (Nathan Rixham)
To: internals@lists.php.net
Date: Sat, 26 Jan 2008 04:35:43 +0000
Subject: Re: json_encode() bug

quick work around for now..

base64_decode(json_decode(json_encode(base64_encode("ab\xE0\" something"))))

Stanislav Malyshev wrote:
> Hi!
>
> Right now, if json_encode sees wrong UTF-8 data, it just cuts the string
> in the middle, no error returned, no message produced. Example:
>
> var_dump(json_encode("ab\xE0"));
> var_dump(json_encode("ab\xE0\""));
>
> Both strings get cut at "ab". I think it's not a good idea to just
> silently cut the data. In fact, I think it is a bug caused by this code
> in ext/json/utf8_to_utf16.c:
> if (c < 0) {
>     return UTF8_END ? the_index : UTF8_ERROR;
> }
> which inherited this bug from code published on json.org. It should be:
> if (c < 0) {
>     return (c == UTF8_END) ? the_index : UTF8_ERROR;
> }
> Now this is an easy fix but would lead to bad strings silently converted
> to empty strings. The question is - should we have an error there? If
> so, which one - E_WARNING, E_NOTICE? I'm for E_WARNING.
> Also filed as bug #43941.
> Any comments?
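
For anyone following along, here is a minimal sketch of why the quoted condition never reports an error. It is not the actual ext/json source: the values of UTF8_END and UTF8_ERROR (-1 and -2) are assumed, modelled on the json.org decoder header, and the helper names finish_original, finish_fixed and check_examples are made up for illustration. Each helper stands in for the `if (c < 0)` branch only.

```c
#include <assert.h>

/* Assumed values, modelled on the json.org decoder header. */
#define UTF8_END   (-1)
#define UTF8_ERROR (-2)

/* Hypothetical stand-in for the quoted branch, reached when the decoder
 * has returned a negative value c. UTF8_END is a non-zero constant, so
 * the condition is always true and UTF8_ERROR can never be returned. */
static int finish_original(int c, int the_index)
{
    (void)c; /* the original expression never looks at c */
    return UTF8_END ? the_index : UTF8_ERROR;
}

/* Hypothetical stand-in for the proposed fix: compare c against
 * UTF8_END so that a decoding error is distinguished from end of input. */
static int finish_fixed(int c, int the_index)
{
    return (c == UTF8_END) ? the_index : UTF8_ERROR;
}

/* Self-check: two units ("ab") were converted before the decoder stopped. */
void check_examples(void)
{
    /* Original: an error looks like a clean end, so the caller keeps "ab". */
    assert(finish_original(UTF8_ERROR, 2) == 2);
    assert(finish_original(UTF8_END, 2) == 2);

    /* Fixed: the error is reported, a clean end still returns the length. */
    assert(finish_fixed(UTF8_ERROR, 2) == UTF8_ERROR);
    assert(finish_fixed(UTF8_END, 2) == 2);
}
```

Calling check_examples() from a main() should pass silently. Under these assumptions the original branch returns the count of units converted so far even on bad input, which would match the "cut at ab" behaviour described above.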