Newsgroups: php.internals
From: arnaud.lb@gmail.com ("Arnaud.lb")
To: "internals Mailing List", Stanislav Malyshev
Date: Thu, 24 Jan 2008 14:26:38 +0100
Subject: [PATCH] Bug #43896 htmlspecialchars returns empty string on invalid unicode sequence
Message-ID: <200801241426.39756.arnaud.lb@gmail.com>

Hi,

Since version 5.2.5, the htmlspecialchars() and htmlentities() functions return an empty string when the input contains even a single invalid or incomplete Unicode sequence.

As I understand it, this change was made to avoid reading more chars from the buffer than it actually contains.

Should these functions really discard the whole string because of a single incomplete sequence?

I made a patch that changes the behavior of these functions to skip invalid sequences without discarding the whole string. It involves very few changes and makes the behavior of these functions more consistent with previous PHP versions, while keeping the fixes that were made in the get_next_char() internal function.

The patch: http://s3.amazonaws.com/arnaud.lb/php_htmlentities_utf.patch

The bug entry: http://bugs.php.net/bug.php?id=43896
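
To make the proposed behavior concrete, here is a minimal standalone sketch of "skip invalid sequences, keep the rest" with bounds checking. It is not the code from the patch and does not use PHP's get_next_char(); the names utf8_seq_len() and escape_skip_invalid() are hypothetical, and dropping one byte at a time on an invalid sequence is an assumption about what "skip" should mean.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not PHP's get_next_char()): returns the length
 * (1-4) of the UTF-8 sequence starting at s[pos], or 0 if the bytes are
 * invalid or the sequence is cut off by the end of the buffer.
 * Never reads at or beyond s[len]. Requires pos < len.
 */
static size_t utf8_seq_len(const unsigned char *s, size_t len, size_t pos)
{
    unsigned char c = s[pos];
    size_t need, i;

    if (c < 0x80) return 1;                       /* plain ASCII */
    else if (c >= 0xC2 && c <= 0xDF) need = 2;
    else if (c >= 0xE0 && c <= 0xEF) need = 3;
    else if (c >= 0xF0 && c <= 0xF4) need = 4;
    else return 0;          /* stray continuation or invalid lead byte */

    if (len - pos < need) return 0;               /* incomplete sequence */

    for (i = 1; i < need; i++)
        if ((s[pos + i] & 0xC0) != 0x80) return 0;

    /* Reject overlong forms, surrogates, and code points > U+10FFFF. */
    if (c == 0xE0 && s[pos + 1] < 0xA0) return 0;
    if (c == 0xED && s[pos + 1] > 0x9F) return 0;
    if (c == 0xF0 && s[pos + 1] < 0x90) return 0;
    if (c == 0xF4 && s[pos + 1] > 0x8F) return 0;

    return need;
}

/*
 * Hypothetical escaper: copies in[0..in_len) to out, replacing & < > "
 * with entities and silently dropping invalid bytes instead of
 * discarding the whole string. Returns the number of bytes written
 * (excluding the terminating NUL), or (size_t)-1 if out is too small.
 */
size_t escape_skip_invalid(const char *in, size_t in_len,
                           char *out, size_t out_cap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t pos = 0, w = 0;

    assert(in != NULL && out != NULL);
    if (out_cap == 0) return (size_t)-1;

    while (pos < in_len) {
        size_t n = utf8_seq_len(s, in_len, pos);
        const char *rep;
        size_t rep_len;

        if (n == 0) {       /* assumption: skip one bad byte, go on */
            pos++;
            continue;
        }

        rep = in + pos;     /* default: copy the sequence unchanged */
        rep_len = n;
        if (n == 1) {
            switch (s[pos]) {
            case '&': rep = "&amp;";  rep_len = 5; break;
            case '<': rep = "&lt;";   rep_len = 4; break;
            case '>': rep = "&gt;";   rep_len = 4; break;
            case '"': rep = "&quot;"; rep_len = 6; break;
            }
        }

        if (rep_len >= out_cap - w) return (size_t)-1; /* keep NUL room */
        memcpy(out + w, rep, rep_len);
        w += rep_len;
        pos += n;
    }

    out[w] = '\0';
    return w;
}
```

With this sketch, an input ending in a truncated sequence loses only the dangling byte and the rest of the string is still escaped. Whether the real functions should drop the bad bytes, substitute a replacement character, or do something else is exactly the question raised above.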