Newsgroups: php.internals
Date: Sat, 2 Feb 2008 12:40:41 +0100 (CET)
From: derick@php.net (Derick Rethans)
To: Stanislav Malyshev
Cc: 'PHP Internals'
In-Reply-To: <479A60BA.7030905@zend.com>
Subject: Re: [PHP-DEV] json_encode() bug

On Fri, 25 Jan 2008, Stanislav Malyshev wrote:

> Hi!
>
> Right now, if json_encode sees wrong UTF-8 data, it just cuts the string in
> the middle, no error returned, no message produced. Example:
>
> var_dump(json_encode("ab\xE0"));
> var_dump(json_encode("ab\xE0\""));
>
> Both strings get cut at "ab". I think it's not a good idea to just silently
> cut the data.
> In fact, I think it is a bug caused by this code in
> ext/json/utf8_to_utf16.c:
>
>     if (c < 0) {
>         return UTF8_END ? the_index : UTF8_ERROR;
>     }
>
> which inherited this bug from code published on json.org. It should be:
>
>     if (c < 0) {
>         return (c == UTF8_END) ? the_index : UTF8_ERROR;
>     }
>
> Now this is an easy fix but would lead to bad strings silently converted to
> empty strings. The question is - should we have an error there? If so, which
> one - E_WARNING, E_NOTICE? I'm for E_WARNING.
>
> Also filed as bug #43941.
>
> Any comments?

Like I mentioned before (I think), it should of course not return an empty
string, because there is no clean way to check for that programmatically.
Since most of our functions return false in such cases, this function should
too.

Derick