Newsgroups: php.internals
From: penguin@php.net (Peter Brodersen)
To: stas@zend.com (Stanislav Malyshev)
Cc: internals Mailing List
Date: Tue, 29 Jan 2008 00:36:32 +0100
Message-ID: <3hpsp3hmn2de4fard8lkpentg24k70jrhg@4ax.com>
References: <200801241426.39756.arnaud.lb@gmail.com> <479A613C.8030604@zend.com>
In-Reply-To: <479A613C.8030604@zend.com>
Subject: Re: [PHP-DEV] [PATCH] Bug #43896 htmlspecialchars returns empty string on invalid unicode sequence

On Fri, 25 Jan 2008 14:22:52 -0800, in php.internals stas@zend.com (Stanislav
Malyshev) wrote:

>> Should these functions really discard the whole string for a single
>> incomplete sequence?
>
>I think since it is not possible to recover true content of the string,
>it is ok to return failure value. Cutting it in random places or
>ignoring problems doesn't seem a good idea - it might lead to all kinds
>of nasty things, such as security filtering checking one data and
>database getting entirely different data.

On the other hand, utf8_decode() also expects the input to be UTF-8
encoded, but it replaces incomplete sequences with the character "?".
I don't know whether that is a recommended standard for handling
invalid input, but I have seen the same replacement in a couple of
other applications, e.g. Firefox.

-- 
- Peter Brodersen