From: rasmus@lerdorf.com (Rasmus Lerdorf)
Date: Mon, 02 Apr 2012 18:46:44 -0700
To: Charlie Somerville
CC: internals@lists.php.net
Subject: Re: [PHP-DEV] json_encode() and non-UTF8 strings
Message-ID: <4F7A5684.4050102@lerdorf.com>

On 04/02/2012 06:35 PM, Charlie Somerville wrote:
> Hi internals,
>
> I've created a pull request (https://github.com/php/php-src/pull/33) that changes json_encode to fall back to ASCII for strings that are not valid UTF-8.
>
> I ran into an issue in a production application involving PayPal IPN callbacks (which are sent encoded as windows-1252) and json_encode(). If there was an accented character present in the data, json_encode() would fail to encode the string and serialize it as 'null'.
>
> I've modified the behaviour of the underlying json_escape_string() implementation to attempt to encode strings anyway while still producing a warning.

JSON with non-Unicode strings is no longer JSON. The spec is explicit that all strings must be Unicode. The default encoding is UTF-8, but it could be UTF-16/32 as well.

See http://www.ietf.org/rfc/rfc4627.txt

-Rasmus
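
To make the encoding point concrete, here is a minimal sketch of the caller-side alternative: transcode the windows-1252 bytes to Unicode before serializing, so the output stays valid JSON. It is written in Python purely for illustration and is not the change proposed in the pull request. The sample payload and the helper names `is_valid_utf8` and `to_json_string` are hypothetical, and the sketch assumes the source encoding is already known to be windows-1252.

```python
import json

# Hypothetical payload: "Café" as windows-1252 bytes, where é is the single byte 0xE9.
raw = b"Caf\xe9"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_json_string(data: bytes, source_encoding: str = "cp1252") -> str:
    """Serialize bytes as a JSON string, transcoding when they are not UTF-8.

    Assumes the caller knows the real source encoding; the validity check is
    only a guard, not a way to detect an unknown encoding.
    """
    if is_valid_utf8(data):
        text = data.decode("utf-8")
    else:
        text = data.decode(source_encoding)
    # json.dumps escapes non-ASCII characters by default, so the result is
    # plain ASCII and still represents the original Unicode string.
    return json.dumps(text)


encoded = to_json_string(raw)
print(encoded)
```

In PHP the same idea would be a conversion step such as mb_convert_encoding() or iconv() applied to the incoming data before it reaches json_encode().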