Newsgroups: php.internals
Xref: news.php.net php.internals:40060
From: david.zuelke@bitextender.com (David Zülke)
To: Rasmus Lerdorf
Cc: "William A. Rowe, Jr.", Stanislav Malyshev, "'PHP Internals'"
Subject: Re: [PHP-DEV] bug #43941
Date: Thu, 21 Aug 2008 18:56:10 +0200
Message-ID: <158D158E-8A72-4DE2-81D1-49A01BC948B2@bitextender.com>
In-Reply-To: <48AD9CCC.8070208@lerdorf.com>
References: <48ACC389.2030801@zend.com> <48ACC638.1030904@rowe-clan.net> <7C51580F-C656-47D9-9269-CA140AA9EBC2@bitextender.com> <48AD9312.9050903@lerdorf.com> <1D87B84E-1502-4BBA-8CDB-0A9E73A8196F@bitextender.com> <48AD9AA6.9040805@lerdorf.com> <48AD9CCC.8070208@lerdorf.com>

On 21.08.2008 at 18:50, Rasmus Lerdorf wrote:

> David Zülke wrote:
>> On 21.08.2008 at 18:41, Rasmus Lerdorf wrote:
>>
>>> David Zülke wrote:
>>>> On 21.08.2008 at 18:08, Rasmus Lerdorf wrote:
>>>>
>>>>> David Zülke wrote:
>>>>>> On 21.08.2008 at 03:34, William A. Rowe, Jr. wrote:
>>>>>>
>>>>>>> Stanislav Malyshev wrote:
>>>>>>>> Hi!
>>>>>>>> Are there any objections to incorporating bugfix for #43941
>>>>>>>> (fix for how json handles invalid UTF-8 sequences) into 5.2?
>>>>>>>> I had some requests about it, right now it's only in 5.3+.
>>>>>>>
>>>>>>> Is there the alternative of substituting an unmappable character
>>>>>>> FFFD in place of the invalid sequence? This is a reasonable
>>>>>>> alternative behavior for some less stringent cases.
>>>>>>>
>>>>>>> (Yes, the fix is better than the status quo, but just taking
>>>>>>> this a step further).
>>>>>>
>>>>>> I agree, that would be quite reasonable and also more consistent
>>>>>> with how UTF-8 works in other apps (browsers etc).
>>>>>
>>>>> Well, using browsers as the benchmark here is a bad idea. IE is
>>>>> absolutely braindead about dealing with illegal UTF-8 chars. It
>>>>> will accept just about any sequence of bytes as a valid UTF-8 char
>>>>> which causes all sorts of problems.
>>>>
>>>> I was talking about the common representation of an invalid
>>>> sequence. That's the question mark sign you usually see in a
>>>> browser when the encoding is incorrect.
>>>
>>> Yes, but it all comes down to how you do it. Say you have a 3 byte
>>> sequence that starts with 0xE0 (E0 indicates the start of a 3-byte
>>> utf-8 char) but the 3 bytes together don't actually make up a valid
>>> utf-8 char. If you substitute those 3 bytes with a ? or some other
>>> character you have just created a nasty XSS vector for web apps.
>>
>> You don't substitute it with "a ? or some other character", you
>> replace it with U+FFFD (0xEF 0xBF 0xBD in UTF-8). I'd love to hear
>> how that causes an attack vector.
>
> It doesn't matter what you replace it with. If the byte sequence is:
>
> 0xE0 " >
>
> And you replace those bytes with some other byte in this sort of
> context:
>
>
>
>
> Now do your silly replacement:
>
>
>
> That now means that IE interprets the value attribute of the foo
> element as: value="0xEF 0xBF 0xBD
> And now $data is suddenly outside the quoted value attribute! Oops!
> Major XSS. Google Groups and Yahoo were both hit by this last year.

Interesting. I assume that was a weakness in the respective
implementation, right? Since

0xE0 " >

should never be regarded as a valid sequence, since neither " nor > is
in the range above 0x7F...

David
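
To make the distinction concrete, here is a minimal sketch in Python. It is not the patch for bug #43941 and not PHP's json code; the function names and the sample markup are hypothetical and chosen only for illustration. It contrasts a replacement that trusts the length announced by the lead byte (and so swallows the following " and >) with one that replaces only the byte that is actually invalid, which is what Python's built-in decoder does with errors="replace".

```python
# UTF-8 encoding of U+FFFD, i.e. the bytes 0xEF 0xBF 0xBD.
REPLACEMENT = "\ufffd".encode("utf-8")


def naive_replace(data: bytes) -> bytes:
    """Hypothetical flawed sanitizer: trusts the length declared by the
    lead byte and replaces that many bytes when they fail to decode."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            length = 1          # ASCII
        elif 0xC0 <= b < 0xE0:
            length = 2          # lead byte of a 2-byte sequence
        elif 0xE0 <= b < 0xF0:
            length = 3          # lead byte of a 3-byte sequence
        elif 0xF0 <= b < 0xF8:
            length = 4          # lead byte of a 4-byte sequence
        else:
            length = 1          # stray continuation or invalid byte
        chunk = data[i:i + length]
        try:
            chunk.decode("utf-8")
            out += chunk
        except UnicodeDecodeError:
            # Bug: the trailing bytes are dropped even if they are ASCII.
            out += REPLACEMENT
        i += length
    return bytes(out)


def strict_replace(data: bytes) -> bytes:
    """Replace only the ill-formed bytes; valid ASCII that follows a
    broken lead byte is left in place."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


# Hypothetical markup fragment with a lone 0xE0 before the closing quote.
sample = b'value="\xe0">X'

print(naive_replace(sample))   # b'value="\xef\xbf\xbdX'    (quote and > lost)
print(strict_replace(sample))  # b'value="\xef\xbf\xbd">X'  (quote and > kept)
```

With the first variant the attribute is never closed, so whatever follows ends up inside or outside the quotes depending on the parser; with the second, the " and > survive because they are below 0x80 and can never be continuation bytes.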