Newsgroups: php.internals
From: stas@zend.com (Stanislav Malyshev)
Organization: Zend Technologies
Date: Fri, 06 Jul 2007 11:32:16 -0700
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?
To: Tomas Kuliavas
CC: internals@lists.php.net
Message-ID: <468E8AB0.9030308@zend.com>
In-Reply-To: <60304.78.61.224.253.1183745684.squirrel@avilys.eik.lt>

> In PHP4/5 \xC4 and \x85 are not characters. They are bytes.

They are both. In PHP 5, a character and a byte are the same thing. In
Unicode, they are not.

> I can't pay such price. You are reducing available coding options and want

Then you can't use Unicode, at least not directly - you would have to
convert all your Unicode data back to bytes and work with them on that
level. Unicode works on the character level and you want to work on the
byte level, so somewhere along the way a translation has to happen. We
will try to make it easier, but I don't think it's reasonable to expect
that code based on this assumption would work without any changes
whatsoever in PHP 6.

> If I take a look at ext/unicode/unicode.c, I see more PHP_FUNCTION
> functions. I don't know PHP6 release schedule. If PHP6 is approaching RC

ext/unicode as it is now is very incomplete. It will be improved quite
soon. I don't want to announce things prematurely, but please just have a
little patience and you'll see the improvement.

> stage, maybe docs can be updated to inform about these functions. PHP
> provides API for PHP scripts developers. Strongest API part is good
> documentation. I shouldn't have to dig through C sources in order to learn
> about available interpreter features. If you write code now and document
> it later, you won't document it or it will take some time and lots of bug
> reports to sync sources with manual.

Nobody expects you to dig through C sources, and of course documentation
is important.
However, the basic assumption of Unicode, that characters and bytes are
not the same, is something that I wouldn't expect to change. Of course,
having docs that describe common Unicode pitfalls and how to work around
them is very important too. I think that once we are closer to releasing,
it will become a higher priority.

--
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
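
As a small illustration of the byte/character distinction discussed above, here is a sketch in Python 3 rather than PHP, since the PHP 6 Unicode API was still in flux when this was written. It assumes the two bytes quoted in the thread, \xC4 and \x85, are meant as UTF-8; the variable names are made up for the example and nothing here is taken from ext/unicode.

```python
# Illustration only: Python 3 keeps bytes and text as separate types,
# which mirrors the byte/character split discussed in the message.

raw = b"\xC4\x85"            # the two bytes quoted in the thread

# Byte level: two separate values, 0xC4 and 0x85.
byte_count = len(raw)

# Character level: decoded as UTF-8 (an assumption about the encoding),
# the same two bytes form a single character, U+0105.
text = raw.decode("utf-8")
char_count = len(text)

# Going back to bytes is the explicit translation step.
round_trip = text.encode("utf-8")

print(byte_count, char_count, hex(ord(text)), round_trip == raw)
# prints: 2 1 0x105 True
```

Code that needs byte-level access would keep working on `raw` and only decode at the point where it actually needs characters, which is roughly the "translation somewhere along the way" described in the reply.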