Newsgroups: php.internals
Subject: Re: [PHP-DEV] bug #43941
From: rasmus@lerdorf.com (Rasmus Lerdorf)
Date: Thu, 21 Aug 2008 09:41:10 -0700
To: David Zülke
CC: "William A. Rowe, Jr.", Stanislav Malyshev, "'PHP Internals'"
Message-ID: <48AD9AA6.9040805@lerdorf.com>
In-Reply-To: <1D87B84E-1502-4BBA-8CDB-0A9E73A8196F@bitextender.com>
References: <48ACC389.2030801@zend.com> <48ACC638.1030904@rowe-clan.net> <7C51580F-C656-47D9-9269-CA140AA9EBC2@bitextender.com> <48AD9312.9050903@lerdorf.com> <1D87B84E-1502-4BBA-8CDB-0A9E73A8196F@bitextender.com>

David Zülke wrote:
> On 21.08.2008 at 18:08, Rasmus Lerdorf wrote:
>
>> David Zülke wrote:
>>> On 21.08.2008 at 03:34, William A. Rowe, Jr. wrote:
>>>
>>>> Stanislav Malyshev wrote:
>>>>> Hi!
>>>>> Are there any objections to incorporating bugfix for #43941 (fix for
>>>>> how json handles invalid UTF-8 sequences) into 5.2? I had some
>>>>> requests about it, right now it's only in 5.3+.
>>>>
>>>> Is there the alternative of substituting an unmappable character FFFD
>>>> in place of the invalid sequence? This is a reasonable alternative
>>>> behavior for some less stringent cases.
>>>>
>>>> (Yes, the fix is better than the status quo, but just taking this a
>>>> step further).
>>>
>>> I agree, that would be quite reasonable and also more consistent with
>>> how UTF-8 works in other apps (browsers etc).
>>
>> Well, using browsers as the benchmark here is a bad idea. IE is
>> absolutely braindead about dealing with illegal UTF-8 chars. It will
>> accept just about any sequence of bytes as a valid UTF-8 char which
>> causes all sorts of problems.
>
> I was talking about the common representation of an invalid sequence.
> That's the question mark sign you usually see in a browser when the
> encoding is incorrect.

Yes, but it all comes down to how you do it.
Say you have a 3-byte sequence that starts with 0xE0 (E0 indicates the start of a 3-byte UTF-8 char) but the 3 bytes together don't actually make up a valid UTF-8 char. If you substitute those 3 bytes with a ? or some other character, you have just created a nasty XSS vector for web apps. And yes, that is exactly what IE does, and it has caused us no end of headaches over the years.

-Rasmus
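
To make the 0xE0 case above concrete, here is a minimal Python sketch. It is an illustration only, not PHP's json code and not IE's actual decoder: `naive_replace` is a hypothetical decoder that trusts the lead byte's length claim and swallows that many bytes, and the HTML fragment is an invented example. Python's built-in `errors="replace"` handling is used as the contrasting behaviour, since it replaces only the offending byte with U+FFFD and resumes at the next byte.

```python
def naive_replace(data: bytes) -> str:
    """Hypothetical lenient decoder: believes the lead byte's length claim
    and replaces the whole claimed sequence with '?' when it is invalid."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0xC0 <= b <= 0xDF:
            n = 2          # lead byte claims a 2-byte char
        elif 0xE0 <= b <= 0xEF:
            n = 3          # lead byte claims a 3-byte char
        elif 0xF0 <= b <= 0xF7:
            n = 4          # lead byte claims a 4-byte char
        else:
            n = 1          # ASCII, or a stray byte handled on its own
        chunk = data[i:i + n]
        try:
            out.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            out.append("?")    # swallows all n bytes, valid ASCII included
        i += n
    return "".join(out)


# Invented example: two attacker-controlled attribute values placed inside
# double quotes by the application. The first value is the single byte 0xE0.
page = b'<a title="\xe0" rel=" onmouseover=alert(1) x=">'

naive = naive_replace(page)
strict = page.decode("utf-8", errors="replace")

print(naive)   # <a title="?rel=" onmouseover=alert(1) x=">
print(strict)  # the closing quote after title survives; only 0xE0 becomes U+FFFD
```

In the naive output the 0xE0 byte has eaten the closing quote and the space that followed it, so the `title` value now runs up to the next quote and `onmouseover=alert(1)` ends up outside any quoted value. Replacing only the invalid byte leaves the markup structure as the application wrote it.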