Newsgroups: php.internals
Date: Thu, 5 Jul 2007 22:43:57 -0500 (CDT)
From: ceo@l-i-e.com ("Richard Lynch")
To: "Tomas Kuliavas"
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

On Fri, June 29, 2007 1:21 am, Tomas Kuliavas wrote:
>> If unicode semantics are "on", what exactly is borked in PHP 5?
>
> In Unicode mode, \[0-7]{1,3} and \x[0-9A-Fa-f]{1,2} refer to Unicode
> code points, not to octal or hexadecimal byte values. The fix is not
> backwards compatible.

Gak.

You mean this will break: because of Unicode?

That's nuts.

That can't be right...

> Scripts can't match bytes. How are they supposed to check whether a
> string is plain ASCII or 8-bit? Convert it to ASCII and check for
> errors instead of looking for 8-bit byte values? How can scripts
> replace 8-bit bytes with other strings? The ISO-8859-2 decoding table
> contains 95 entries written and evaluated as binary strings.
> The same applies to other ISO-8859 and Windows-125x character sets.
> ISO-8859-1 and UTF-8 decoding do not use mapping tables; they perform
> complex calculations on byte values. Multibyte character set decoding
> might actually benefit from unicode_encode(), if Table 325
> (http://www.php.net/unicode) provided more information about
> U_INVALID_SUBSTITUTE and other unicode. settings.

I don't even understand this.

But if I haven't done something new-fangled to make a string be some
new-fangled Unicode thingie, then it's just plain old ASCII, no?

Or PHP can just assume that anyway...

> PHP6 does not provide backwards-compatible functions for working with
> bytes, and the constructs it does provide are not backwards
> compatible. If scripts want to do MIME Q encoding, they must work with
> bytes. Doing Q encoding with the PHP extensions that are provided adds
> extra dependencies.

Another one I don't understand...

But since I believe MIME emails are a blight on the universe, I suspect
I just don't care either. :-)

> ICU does not support an HTML target. Converting text to ISO-8859-x or
> Windows-125x targets will be lossy.

Well, yeah, if you down-sample UTF-* to a character set that doesn't
have the characters you typed in UTF-*, then those characters won't make
it through the translation.

Output your HTML in UTF-* or accept the loss.

>> Can that be fixed to be BC without resorting to this toggle?
>
> Unicode and binary typecasts cause an E_PARSE error in PHP 5.2.0 and
> older.

That's fine. PHP 6 code that uses new PHP 6 features needs PHP 6.

If that surprises somebody, they have a fundamental misunderstanding of
what a major release version means.

> PHP6 could introduce new Unicode-aware functions, but the Unicode
> implementation chose to modify existing ones. All low-level string
> operations ($string[1]) are Unicode-aware by default, not only when a
> script actually asks for it. Such an implementation is designed for
> developers who
Such implementation is designed for developers, > who > don't care about Unicode support and want it out of the box without > any > changes in their Unicode unaware scripts. It is not designed for > developers that actually need it and want to have code working in PHP6 > and > PHP4/5. But an old script ought to just work... > Unicode code points can be defined with \u, but PHP6 breaks existing > octal > and hex escape sequences. If you're saying what I think you're saying, that's just daft... Nobody [*] will switch to PHP 6 if I am interpreting these statements correctly... * Nobody == even a slower adoption rate than the glacial PHP 5. > PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer > downcoded for binary stream runtime_encoding", "Warning: > base64_encode() > expects parameter 1 to be strictly a binary string, Unicode string > given") > about data stream and string operations. even when fwrite() or > base64_encode() works only with plain ascii data. PHP script > developers > are not used to strict variable type checks in string functions. Which > functions are modified to require binary typecasting? Do I have to > make a > list myself every time some function freaks out? Hopefully these are going away as the Unicode stuff is finished?... -- Some people have a "gift" link here. Know what I want? I want you to buy a CD from some indie artist. http://cdbaby.com/browse/from/lynch Yeah, I get a buck. So?