Newsgroups: php.internals
Message-ID: <4691EC69.2010102@zend.com>
Date: Mon, 09 Jul 2007 01:06:01 -0700
Organization: Zend Technologies
To: ceo@l-i-e.com
CC: Tomas Kuliavas, internals@lists.php.net
In-Reply-To: <2031.24.1.37.132.1183965946.squirrel@www.l-i-e.com>
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?
From: stas@zend.com (Stanislav Malyshev)

> Maybe strings should be UTF-8 until declared otherwise or something,
> because this just won't fly...

UTF-8 would not help you with bits (since nobody guarantees that the incoming data is valid UTF-8), and it's impossible to do any Unicode stuff on UTF-8 - you'd have to convert it to UTF-16 and back at every step.

> I dunno. Aren't there headers to indicate what kind of data is coming
> in?

I know of no headers that can tell you "parameter 'foo' in a form is a bitmask so please do not try to see it as text".

> If there aren't, or can't be, then you have to let ME tell you what it
> is.

You can. Use binary strings and explicit conversions.

> You can't just go assuming I've got UTF-16 data coming in --
> especially not when the entire Internet has been built and subsisted
> on ASCII (more or less) for over a decade.

Actually, there's an INI parameter that says which encoding the incoming data is in. That is not the problem - the problem is that PHP can't know that you pass bit fields inside textual information (and in HTTP all parameters are textual), so you have to handle such data manually.

> Anybody who actually NEEDS Unicode ought to be the ones who have to
> type a new keyword or something, not the bazillion users who have no
> need for Unicode and likely never will...

If they have no need for Unicode, why run Unicode-enabled PHP? Turn it off and get all your strings untouched.

> It's just an ASCII string, same as it's always been. IS_STRING
> If you need some new-fangled UTF-16 datatype stringie, then go ahead
> and give yourself one. IS_UNICODE
> But don't change all MY data to UTF-16 when it isn't UTF-16!!!

Then you can't use Unicode mode, because in Unicode mode a text string is UTF-16. If it's not a text string, you should say so; PHP doesn't have any way to know.

> In what sane world do you suddenly declare all that data isn't ASCII
> any more and claim that it's UTF-16 when UTF-16 isn't backwards
> compatible with ASCII?

Python tried that. In Python 3 they are moving to the model PHP 6 uses. Must not be that silly an idea, I guess.

> But now \xF0 isn't going to be ASCII 128 anymore, is it?

ASCII doesn't have any characters beyond 0x7f AFAIK, but it doesn't matter, I get what you mean. \xF0 in Unicode mode would be U+00F0, of course. How preg_match should handle it depends on preg_match.

--
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
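
A minimal sketch of the points above, written in Python 3 rather than PHP 6. Python's bytes/str split only stands in for the binary-string versus text-string distinction discussed here; it is not PHP 6's implementation, and the byte values and the helper name is_valid_utf8 are made up for illustration.

```python
# Hypothetical form value that carries a bit field, not text.
raw = b"\xf0\x01"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Arbitrary incoming bytes are not guaranteed to be valid UTF-8:
# 0xF0 opens a four-byte sequence, and 0x01 is not a continuation byte.
utf8_ok = is_valid_utf8(raw)

# ASCII text keeps the same bytes in UTF-8, but not in UTF-16.
ascii_bytes = "abc".encode("ascii")
utf8_bytes = "abc".encode("utf-8")
utf16_bytes = "abc".encode("utf-16-le")

# Treating the byte 0xF0 as text via a Latin-1 style mapping gives U+00F0.
code_point = ord(b"\xf0".decode("latin-1"))

# Keeping the value as a binary string lets bit operations work on the
# original byte, with no text conversion involved.
high_bit_set = bool(raw[0] & 0x80)

print(utf8_ok, ascii_bytes == utf8_bytes, utf16_bytes, hex(code_point), high_bit_set)
```

The explicit decode call is the "explicit conversion" step: the program, not the runtime, decides when a binary value becomes text and in which encoding.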