Newsgroups: php.internals
Xref: news.php.net php.internals:40055
From: david.zuelke@bitextender.com (David Zülke)
To: Rasmus Lerdorf
Cc: "William A. Rowe, Jr.", Stanislav Malyshev, "'PHP Internals'"
Subject: Re: [PHP-DEV] bug #43941
Date: Thu, 21 Aug 2008 18:16:44 +0200
Message-ID: <1D87B84E-1502-4BBA-8CDB-0A9E73A8196F@bitextender.com>
In-Reply-To: <48AD9312.9050903@lerdorf.com>
References: <48ACC389.2030801@zend.com> <48ACC638.1030904@rowe-clan.net> <7C51580F-C656-47D9-9269-CA140AA9EBC2@bitextender.com> <48AD9312.9050903@lerdorf.com>

On 21.08.2008 at 18:08, Rasmus Lerdorf wrote:

> David Zülke wrote:
>> On 21.08.2008 at 03:34, William A. Rowe, Jr. wrote:
>>
>>> Stanislav Malyshev wrote:
>>>> Hi!
>>>> Are there any objections to incorporating bugfix for #43941 (fix for
>>>> how json handles invalid UTF-8 sequences) into 5.2? I had some
>>>> requests about it, right now it's only in 5.3+.
>>>
>>> Is there the alternative of substituting an unmappable character FFFD in
>>> place of the invalid sequence? This is a reasonable alternative behavior
>>> for some less stringent cases.
>>>
>>> (Yes, the fix is better than the status quo, but just taking this a step
>>> further).
>>
>> I agree, that would be quite reasonable and also more consistent with
>> how UTF-8 works in other apps (browsers etc).
>
> Well, using browsers as the benchmark here is a bad idea. IE is
> absolutely braindead about dealing with illegal UTF-8 chars. It
> will accept just about any sequence of bytes as a valid UTF-8 char
> which causes all sorts of problems.

I was talking about the common representation of an invalid sequence.
That's the question mark sign you usually see in a browser when the
encoding is incorrect.

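To make the difference concrete, here is a rough sketch of three
possible ways to handle an invalid sequence: reject the string, strip
the bad bytes, or substitute U+FFFD. It is written in Python purely for
illustration; it is not PHP and makes no claim about what the json
extension does. The sample bytes and the function names are made up.

```python
import json

# Made-up input: "caf" followed by a lone 0xE9 byte (Latin-1 "e acute"),
# which is not a valid UTF-8 sequence on its own.
raw = b"caf\xe9 ok"


def decode_strict(data: bytes):
    """Reject the whole string: return None if any byte is invalid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_strip(data: bytes) -> str:
    """Silently drop invalid bytes; the reader cannot tell anything was lost."""
    return data.decode("utf-8", errors="ignore")


def decode_replace(data: bytes) -> str:
    """Substitute U+FFFD for invalid bytes, marking where the damage was."""
    return data.decode("utf-8", errors="replace")


print(decode_strict(raw))                # None
print(repr(decode_strip(raw)))           # 'caf ok'
print(repr(decode_replace(raw)))         # 'caf\ufffd ok'
print(json.dumps(decode_replace(raw)))   # "caf\ufffd ok"
```

With substitution, the output stays valid and the position of the bad
byte remains visible, whereas stripping hides the fact that the input
was broken at all.
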
According to the Unicode standard, U+FFFD is supposed to be used as =20 the replacement character instead of simply stripping invalid ones: > Replacement Character. A character used as a substitute for an =20 > uninterpretable character from another encoding. The Unicode =20 > Standard uses U+FFFD replacement character for this function. says http://unicode.org/glossary/#replacement_character > Rendering software which cannot process a Unicode character =20 > appropriately most often display it as only an open rectangle, or =20 > the Unicode =E2=80=9Creplacement character=E2=80=9D (U+FFFD, =EF=BF=BD),= to indicate =20 > the position of the unrecognized character. says http://en.wikipedia.org/wiki/Unicode#Standardized_subsets Also see http://www.fileformat.info/info/unicode/char/fffd/index.htm As always, I consider sticking to specs good practice, so doing it in =20= the above case would be wise :) Hope that helps, David=