Newsgroups: php.internals
To: internals@lists.php.net
Date: Tue, 10 Jul 2007 19:06:30 -0500
References: <1181829227.3478.3.camel@localhost.localdomain>
 <4692B1A3.1000808@zend.com> <4692B7D4.6040001@zend.com>
In-Reply-To: <4692B7D4.6040001@zend.com>
Message-ID: <200707101906.30925.larry@garfieldtech.com>
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?
From: larry@garfieldtech.com (Larry Garfield)

On Monday 09 July 2007, Stanislav Malyshev wrote:
> > Do _I_ like that horrible IS_STRING/IS_UNICODE mess we have atm? No.
>
> I don't think there's any way of having both unstructured character data
> and Unicode text represented without having two distinct types. Either
> that or you'd have to tell on each step which one it is, and that would
> suck much more.
>
> > I would love to have clean and easy PHP6 without all the
> > "compatibility", which creates gazillion problems to both users and
> > developers.
>
> Fixing unicode=on does not remove the IS_STRING/IS_UNICODE duality. We
> still have two kinds of data - unstructured bit stream and structured
> text. If we want strlen("превед") to return 6 - since that Russian word
> has 6 characters - then we have no way but recognize that it's not just
> a collection of bits but Unicode text, and that would require separate
> type, as I see it. And as I see it, this is the source of the problems
> when people try to operate on text as on bit stream and vice versa.
>
> Unless I totally missed what mess you are referring to...

I am coming into this discussion decidedly late here, so please thwap me
gently if this is a FAQ.  Do we have any idea of what percentage of strings
in the "wild" would break if treated as Unicode vs. not?

If 90% of the strings in use would work fine if treated as Unicode, then it
would make sense to just always assume Unicode unless explicitly specified
otherwise.
If 90% of the strings in use would die if treated as Unicode, then Unicode
should probably be the exception, used only when explicitly requested.

I'm not liking the ghosts of magic_quotes I'm seeing implied here with
different modes for the server to be in.  That sounds like it would make
writing code that works the same everywhere and is not ugly to read (crapload
of markers or lots of conditionals) quite difficult.

As I said, feel free to assuage my fear if appropriate. :-)

-- 
Larry Garfield			AIM: LOLG42
larry@garfieldtech.com		ICQ: 6817012

"If nature has made any one thing less susceptible than all others of
exclusive property, it is the action of the thinking power called an idea,
which an individual may exclusively possess as long as he keeps it to
himself; but the moment it is divulged, it forces itself into the possession
of every one, and the receiver cannot dispossess himself of it."  -- Thomas
Jefferson
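
To make the two points in this message concrete, here is a rough sketch. It
is written in Python 3 rather than PHP, purely because Python 3 already keeps
raw bytes and Unicode text as two distinct types, much like the
IS_STRING/IS_UNICODE split quoted above. The helper names (byte_length,
char_length, decodable_fraction) and the sample data are made up for
illustration, and "decodes cleanly as UTF-8" is only an assumed, crude proxy
for "would work fine if treated as Unicode". None of this is PHP 6 API.

```python
# Illustration only: Python 3 has separate bytes and str types, which is
# similar in spirit to the IS_STRING / IS_UNICODE split in the thread.
# All names below are hypothetical helpers, not part of PHP or its API.

WORD = "превед"  # the six-letter Russian word from the quoted message


def byte_length(text: str) -> int:
    """Length of the UTF-8 encoded form: what a byte-oriented strlen counts."""
    return len(text.encode("utf-8"))


def char_length(text: str) -> int:
    """Number of Unicode code points: what a Unicode-aware strlen counts."""
    return len(text)


def decodable_fraction(samples: list[bytes], encoding: str = "utf-8") -> float:
    """Share of byte strings that decode without error in `encoding`.

    A crude stand-in for the question "what percentage of strings in the
    wild would break if treated as Unicode?". Returns 0.0 for no samples.
    """
    if not samples:
        return 0.0
    ok = 0
    for raw in samples:
        try:
            raw.decode(encoding)
            ok += 1
        except UnicodeDecodeError:
            pass  # this byte string is not valid text in `encoding`
    return ok / len(samples)


if __name__ == "__main__":
    # Same word, two different answers depending on the type's semantics.
    print(byte_length(WORD), char_length(WORD))

    # Made-up sample set: ASCII, UTF-8 text, binary junk, legacy KOI8-R text.
    samples = [
        b"hello",
        WORD.encode("utf-8"),
        b"\xff\xfe\x00\x01",
        WORD.encode("koi8-r"),
    ]
    print(decodable_fraction(samples))
```

Running a check like decodable_fraction over strings pulled from real
applications would be one way to get at the "90% either way" question, though
a string that decodes cleanly can still be binary data that merely happens to
look like valid UTF-8.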