Newsgroups: php.internals
From: penguin@php.net (Peter Brodersen)
To: rasmus@lerdorf.com (Rasmus Lerdorf)
Cc: internals Mailing List
Date: Tue, 29 Jan 2008 03:24:30 +0100
In-Reply-To: <479E80D8.5020206@lerdorf.com>
Subject: Re: [PHP-DEV] [PATCH] Bug #43896 htmlspecialchars returns empty string on invalid unicode sequence

On Mon, 28 Jan 2008 17:26:48 -0800, in
php.internals rasmus@lerdorf.com (Rasmus Lerdorf) wrote:

>php -r '$a="abcd".chr(0xE0);echo
>iconv("utf-8","utf-8",$a)."\n".utf8_decode($a);' | od -t x1
>
>0000000 61 62 63 64 0a 61 62 63 64 03

By the way, the 03 in your result is a bit spurious. For me it seems to
differ every time I run that code. It happens with 0xE0 but not with
e.g. 0xE6 (æ).

Within a single run it seems to be consistent, though:

$ php -r 'for($a=0;$a<20;$a++)printf("%02x ",utf8_decode(chr(0xE0)));'
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$ php -r 'for($a=0;$a<20;$a++)printf("%02x ",utf8_decode(chr(0xE0)));'
09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09
$ php -r 'for($a=0;$a<20;$a++)printf("%02x ",utf8_decode(chr(0xE0)));'
05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
$ php -r 'for($a=0;$a<20;$a++)printf("%02x ",utf8_decode(chr(0xE0)));'
02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02

$ for a in `seq 1 20`; do php -r 'printf("%02x ",utf8_decode(chr(0xE0)));'; done
07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 09 00
$ for a in `seq 1 20`; do php -r 'printf("%02x ",utf8_decode(chr(0xE0)));'; done
08 00 00 02 00 00 00 00 00 05 00 00 00 05 00 00 07 00 09 00
$ for a in `seq 1 20`; do php -r 'printf("%02x ",utf8_decode(chr(0xE0)));'; done
00 00 00 00 00 00 00 00 04 00 08 00 00 00 00 05 00 00 01 00

I don't think there is any reason for this behaviour. I'll file a bug.

-- 
- Peter Brodersen