Newsgroups: php.internals
Xref: news.php.net php.internals:59336
Date: Tue, 3 Apr 2012 09:56:18 +0800
From: aharvey@php.net (Adam Harvey)
To: Rasmus Lerdorf
Cc: Charlie Somerville, internals@lists.php.net
Subject: Re: [PHP-DEV] json_encode() and non-UTF8 strings

On 3 April 2012 09:46, Rasmus Lerdorf wrote:
> On 04/02/2012 06:35 PM, Charlie Somerville wrote:
>> I've created a pull request (https://github.com/php/php-src/pull/33) that changes json_encode to fall back to ASCII for strings that are not valid UTF-8.
>>
>> I ran into an issue in a production application involving PayPal IPN callbacks (which are sent encoded as windows-1252) and json_encode(). If there was an accented character present in the data, json_encode() would fail to encode the string and serialize it as 'null'.
>>
>> I've modified the behaviour of the underlying json_escape_string() implementation to attempt to encode strings anyway while still producing a warning.
>
> JSON with non-Unicode strings is no longer JSON. The spec is explicit
> that all strings must be Unicode. The default encoding is UTF-8, but it
> could be UTF-16/32 as well.
>
> See http://www.ietf.org/rfc/rfc4627.txt

Agreed.
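The failure Charlie describes happens at the byte level. The following is a rough illustration in Python, not PHP, and it assumes the PayPal data arrives as windows-1252 bytes, as stated in the thread. In that encoding an accented character such as é is the single byte 0xE9, which is not valid UTF-8, so it cannot be placed in a JSON string as-is. Converting the input to UTF-8 first (in PHP, presumably with iconv() or mb_convert_encoding()) avoids the problem without changing the encoder. `is_valid_utf8` is a hypothetical helper written for this sketch.

```python
import json

# An accented character encoded as windows-1252 (cp1252) bytes; the thread
# says PayPal IPN callbacks use this encoding.
raw = "café".encode("cp1252")   # b'caf\xe9'

def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xE9 is a UTF-8 lead byte that must be followed by continuation bytes,
# so this input is not valid UTF-8.
print(is_valid_utf8(raw))                 # False

# Transcoding at the input boundary yields Unicode text that a JSON
# encoder can represent, with no change to the encoder itself.
text = raw.decode("cp1252")
print(json.dumps({"name": text}))         # {"name": "caf\u00e9"}
```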
I have a patch for bug #61537 lying around (I still need to finish it up) that actually goes the other way: it changes the default behaviour of json_encode() to match the documentation and return false if a string is invalid UTF-8, rather than just nulling that string.

-1 from me on this pull request. It's a regression from the current behaviour, IMO.

Adam
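The difference between nulling one string and failing the whole call is easier to see in a small example. The Python sketch below models neither PHP's implementation nor the patch for bug #61537. It is a hypothetical encoder with two modes. Lenient mode replaces only the offending string with JSON null, which is the current json_encode() behaviour described in the thread. Strict mode rejects the entire value, which is the behaviour the patch proposes, with Python's None standing in for PHP's false. The names `encode`, `_check` and `strict` are illustrative only.

```python
import json
from typing import Any, Optional

def _check(value: Any, strict: bool) -> Any:
    """Recursively decode bytes as UTF-8, handling invalid input per mode."""
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            if strict:
                raise            # strict: abort the whole encode
            return None          # lenient: this one string becomes null
    if isinstance(value, dict):
        return {k: _check(v, strict) for k, v in value.items()}
    if isinstance(value, list):
        return [_check(v, strict) for v in value]
    return value

def encode(value: Any, strict: bool) -> Optional[str]:
    """Hypothetical encoder; returns None (PHP: false) on failure."""
    try:
        return json.dumps(_check(value, strict))
    except UnicodeDecodeError:
        return None

payload = {"item": b"caf\xe9", "amount": "9.99"}
print(encode(payload, strict=False))   # {"item": null, "amount": "9.99"}
print(encode(payload, strict=True))    # None
```

Failing the whole call forces the caller to notice the bad input. Lenient mode, by contrast, silently drops a field and still returns output that looks successful.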