Newsgroups: php.internals
Message-ID: <468E7A62.4030703@zend.com>
Date: Fri, 06 Jul 2007 10:22:42 -0700
From: stas@zend.com (Stanislav Malyshev)
Organization: Zend Technologies
To: Tomas Kuliavas
CC: internals@lists.php.net
In-Reply-To: <47498.78.61.224.253.1183713764.squirrel@avilys.eik.lt>
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

> --- test.php ---
> $string1 = "ą";
> $string2 = "\xC4\x85";
> var_dump($string1 == $string2)

How do you expect a one-character string to be equal to a two-character string?

> ą is in utf-8 (latin small letter a with ogonek, latin extended-a range).
> It contains two bytes with 0xC4 0x85 values.

It contains two bytes in the filesystem. In PHP, however, it contains one character. In unicode mode, bytes and characters are different things. You could make $string2 binary and then convert it from utf-8 to unicode, but unless you explicitly say otherwise, that string contains two characters: U+00C4 (LATIN CAPITAL LETTER A WITH DIAERESIS) and U+0085 (a control character, no name). This doesn't mean escape sequences stop working; it means characters and bytes are no longer the same. That's the price one has to pay for doing unicode.

-- 
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
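The bytes-versus-characters point above can be sketched in Python 3, whose separate text and binary string types behave in a similar way. This is only an analogy under that assumption, not PHP 6 code; the names string3 and describe are made up for illustration.

```python
import unicodedata

# The single character "ą" (U+0105, LATIN SMALL LETTER A WITH OGONEK).
string1 = "\u0105"

# In a text string, \x escapes name code points, not bytes:
# this is U+00C4 followed by U+0085, i.e. two characters.
string2 = "\xC4\x85"

# The same two values as a binary string, explicitly decoded from UTF-8.
# (Illustrative counterpart of "make it binary, then convert to unicode".)
string3 = b"\xC4\x85".decode("utf-8")


def describe(s):
    """Return (code point, Unicode name) pairs; unnamed characters get '<no name>'."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>")) for ch in s]


print(len(string1), len(string2), len(string3))  # 1 2 1
print(string1 == string2)                        # False
print(string1 == string3)                        # True
print(describe(string2))
```

The comparison only succeeds once the two bytes are explicitly decoded as UTF-8; written as escapes in a text string, they stay two separate characters.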