Newsgroups: php.internals
Message-ID: <2394.24.1.37.132.1184401215.squirrel@www.l-i-e.com>
In-Reply-To: <46958E6B.1000707@lerdorf.com>
Date: Sat, 14 Jul 2007 03:20:15 -0500 (CDT)
To: "Rasmus Lerdorf"
Cc: "Larry Garfield", internals@lists.php.net
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?
From: ceo@l-i-e.com ("Richard Lynch")

On Wed, July 11, 2007 9:14 pm, Rasmus Lerdorf wrote:
> Richard, you are rather confused on this Unicode stuff.

I'm 100% certain we can all agree on that point. :-)

> The fact that PHP and ICU uses UTF-16 internally has absolutely
> nothing to do with what is exposed at the scripting level.

But somebody has just said that it will, didn't they?

That GPC data will be Unicode, and trying to use it as ASCII will
break?

> The only things that will break in a standard application is stuff
> that relies on strings being binary. Normal text passing back and
> forth between the browser and the server will work just fine.
>
> The breakages, apart from various bugs at this early stage, are
> limited to places where the code is expecting to see a binary string
> and PHP hasn't been able to determine this automatically.
> And hopefully we can come up with ways to automatically determine
> when something should default to a binary string.
>
> But if you write:
>
> $a = "マニュアル";
> echo $a[1];

Whoa. That was weird...

It was just a bunch of question marks when I read it, and now it's a
bunch of symbols (variants on afz mostly) in my reply...

> and you expect to have that spew out 0xe3, then yes, it will break
> because it will result in ニ which is what it really should do.

You have me beat at the "...if you write" part, because I have no idea
how to make my keyboard make those symbols... :-v

My only concern is that:

http://example.com/?foo=bar
echo $_GET['foo'][1];

should still print out 'a' just like it always has.

And:

http://example.com/?mask=100110
echo $_GET['mask'] & '110010';

should print out 100010 just like it always has.

Folks keep saying that bit-string manipulation makes no sense in
Unicode, and that's fine, I guess... If a scripter is trying to do
that, then see if the string is ASCII [01]* and typecast it to binary
string or whatever and just move on with life in the old way.

> And yes, I know a lot of people reading this list don't care much
> for other charsets, but people reading an english mailing list are
> rather self-selecting.

I love the idea of users being able to write things in their own
language, and somehow it magically all just "looks right" when I slam
it into the database with mysql_real_escape_string and spew it back
out to the browser with htmlentities!

But it never quite seems to work out, in my limited experience,
because some software somewhere always manages to mangle it...

And I realize the whole point of Unicode in PHP 6 is to make PHP 6 not
be that piece of software that mangles it, and I'm sure you guys are
getting that bit right. Well, I hope so anyway.
:-)

I especially hope so, because if you don't get it right, I'll never be
able to tell, as I wouldn't notice the difference if it's broken or
not just by looking at the text in anything other than English.

I just get real concerned when it seems to me like a lot of scripts
are going to break, based on what folks who should know post here...

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?
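
For anyone who wants to poke at the byte-oriented behaviour the message
is worried about, here is a minimal sketch. It assumes a UTF-8 source
file and uses plain variables ($foo, $mask, $a, $chars) as stand-ins for
the $_GET values, so it runs from the command line. It shows how ordinary
PHP byte strings behave, not how the proposed PHP 6 Unicode strings would
have behaved. preg_split() with the /u modifier is just one way to index
by character instead of by byte.

```php
<?php
// Stand-in for $_GET['foo'] from a request like ?foo=bar
$foo = 'bar';
echo $foo[1], "\n";               // a  (string offsets are zero-based)

// Stand-in for $_GET['mask'] from a request like ?mask=100110
$mask = '100110';

// Both operands are strings: & is applied byte by byte,
// so '1' & '1' gives '1' and anything with '0' gives '0'.
echo $mask & '110010', "\n";      // 100010

// One operand is an integer: the string is converted to the
// number 100110 and the AND is done on the two integers.
echo $mask & 110010, "\n";        // 99594

// Byte indexing versus character indexing on UTF-8 text.
$a = "マニュアル";
printf("0x%02x\n", ord($a[0]));   // 0xe3, the first byte of マ

// Split into characters (not bytes) with a UTF-8 aware pattern.
$chars = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
echo $chars[1], "\n";             // ニ
```

The quoted second operand matters: only the string-and-string form
gives the "bit-string" result, while the integer form silently does
ordinary numeric AND.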