Newsgroups: php.internals
From: david.zuelke@bitextender.com (David Zülke)
To: Rasmus Lerdorf
Cc: "William A. Rowe, Jr.", Stanislav Malyshev, "'PHP Internals'"
Subject: Re: [PHP-DEV] bug #43941
Date: Thu, 21 Aug 2008 18:44:33 +0200
In-Reply-To: <48AD9AA6.9040805@lerdorf.com>
References: <48ACC389.2030801@zend.com> <48ACC638.1030904@rowe-clan.net> <7C51580F-C656-47D9-9269-CA140AA9EBC2@bitextender.com> <48AD9312.9050903@lerdorf.com> <1D87B84E-1502-4BBA-8CDB-0A9E73A8196F@bitextender.com> <48AD9AA6.9040805@lerdorf.com>

On 21.08.2008 at 18:41, Rasmus Lerdorf wrote:

> David Zülke wrote:
>> On 21.08.2008 at 18:08, Rasmus Lerdorf wrote:
>>
>>> David Zülke wrote:
>>>> On 21.08.2008 at 03:34, William A. Rowe, Jr. wrote:
>>>>
>>>>> Stanislav Malyshev wrote:
>>>>>> Hi!
>>>>>> Are there any objections to incorporating bugfix for #43941
>>>>>> (fix for how json handles invalid UTF-8 sequences) into 5.2?
>>>>>> I had some requests about it, right now it's only in 5.3+.
>>>>>
>>>>> Is there the alternative of substituting an unmappable character
>>>>> FFFD in place of the invalid sequence? This is a reasonable
>>>>> alternative behavior for some less stringent cases.
>>>>>
>>>>> (Yes, the fix is better than the status quo, but just taking
>>>>> this a step further).
>>>>
>>>> I agree, that would be quite reasonable and also more consistent
>>>> with how UTF-8 works in other apps (browsers etc).
>>>
>>> Well, using browsers as the benchmark here is a bad idea. IE is
>>> absolutely braindead about dealing with illegal UTF-8 chars. It will
>>> accept just about any sequence of bytes as a valid UTF-8 char which
>>> causes all sorts of problems.
>>
>> I was talking about the common representation of an invalid sequence.
>> That's the question mark sign you usually see in a browser when the
>> encoding is incorrect.
>
> Yes, but it all comes down to how you do it. Say you have a 3 byte
> sequence that starts with 0xE0 (E0 indicates the start of a 3-byte
> utf-8 char) but the 3 bytes together don't actually make up a valid
> utf-8 char. If you substitute those 3 bytes with a ? or some other
> character you have just created a nasty XSS vector for web apps.

You don't substitute it with "a ? or some other character", you
replace it with U+FFFD (0xEF 0xBF 0xBD in UTF-8). I'd love to hear how
that causes an attack vector.

David
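
For illustration only, here is a minimal sketch of the U+FFFD substitution discussed above. It uses Python's built-in UTF-8 decoder rather than PHP's json extension, so it says nothing about how the #43941 fix behaves. The input bytes are a made-up example: a 0xE0 lead byte followed by two ASCII bytes instead of continuation bytes.

```python
# Hypothetical input: 0xE0 announces a 3-byte sequence, but the next two
# bytes are plain ASCII ('<' and 's'), so the sequence is malformed.
raw = b"\xe0<s"

# errors="replace" substitutes U+FFFD for the malformed part of the input.
decoded = raw.decode("utf-8", errors="replace")

# U+FFFD itself serialises to the three bytes named in the message.
replacement_bytes = "\ufffd".encode("utf-8")

# Strict decoding rejects the same input outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(repr(decoded), replacement_bytes.hex(), strict_ok)
```

In this decoder only the offending lead byte is replaced and the ASCII bytes after it survive. A decoder that instead discarded all three bytes would also swallow the `<`, which may be the "how you do it" concern raised in the quoted text.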