Newsgroups: php.internals
Date: Sat, 26 Jan 2008 09:15:56 +0200 (EET)
To: internals@lists.php.net
Message-ID: <37582.78.61.224.253.1201331756.nsm@avilys.eik.lt>
In-Reply-To: <479A613C.8030604@zend.com>
Subject: Re: [PHP-DEV] [PATCH] Bug #43896 htmlspecialchars returns empty string on invalid unicode sequence
From: tokul@users.sourceforge.net ("Tomas Kuliavas")

>> Should really these functions discard the whole string for a single
>> incomplete sequence?
>
> I think since it is not possible to recover true content of the string,
> it is ok to return failure value. Cutting it in random places or
> ignoring problems doesn't seem a good idea - it might lead to all kinds
> of nasty things, such as security filtering checking one data and
> database getting entirely different data.

Instead of being able to rely on a simple sanitizing function, users are
forced to check for errors. How good is that? It makes code either complex
or unreliable.

htmlspecialchars() and htmlentities() are not used to sanitize database
data. What kind of errors do you expect in htmlspecialchars()? I think the
supported charsets don't have alternative symbols at 0x22, 0x26, 0x27, 0x3C
and 0x3E. Only CJK charsets and htmlentities() might have issues. With any
other charset you know the start and end byte of each symbol.

If you think that broken UTF-8 can cause issues, strip or sanitize the
broken symbols. If users detect an error in htmlspecialchars(), they will
fall back to str_replace() to provide some failsafe instead of losing the
whole text, and that won't solve the security issues.

-- 
Tomas
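
A minimal sketch of the byte-level argument in the message above, written in Python as a language-neutral illustration rather than as PHP's actual implementation. The function names `escape_bytes` and `sanitize_utf8` are hypothetical, and the entity strings are an assumption modelled on what htmlspecialchars() emits when quotes are escaped. It shows the two options discussed: escaping the five special bytes without validating the encoding, and replacing broken UTF-8 sequences before escaping instead of discarding the whole string.

```python
# The five ASCII bytes named in the message: " & ' < >
# Entity strings are an assumption, not taken from the PHP source.
SPECIAL = {
    0x22: b"&quot;",
    0x26: b"&amp;",
    0x27: b"&#039;",
    0x3C: b"&lt;",
    0x3E: b"&gt;",
}


def escape_bytes(data: bytes) -> bytes:
    """Escape the five special ASCII bytes; pass every other byte through."""
    return b"".join(SPECIAL.get(b, bytes([b])) for b in data)


def sanitize_utf8(data: bytes) -> bytes:
    """Replace each invalid UTF-8 sequence with U+FFFD instead of failing."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


# In UTF-8 every byte of a multi-byte sequence is >= 0x80, so none of them
# can be confused with the five special bytes, even in a broken sequence.
broken = b'<b>caf\xc3</b> & "x"'  # 0xC3 is a lead byte with no continuation

# Option 1: escape without validating; the stray 0xC3 is passed through.
print(escape_bytes(broken))

# Option 2: sanitize first, then escape; the stray byte becomes U+FFFD.
print(escape_bytes(sanitize_utf8(broken)))
```

Neither option loses the rest of the text. Whether passing a stray byte through is acceptable depends on how the consumer of the output decodes it, which is the concern raised in the quoted reply.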