Newsgroups: php.internals
Message-ID: <10845a340706290049h46d70f15u83e38ae63a4a07@mail.gmail.com>
Date: Fri, 29 Jun 2007 08:49:21 +0100
From: rquadling@googlemail.com ("Richard Quadling")
Reply-To: RQuadling@GoogleMail.com
To: "Tomas Kuliavas"
Cc: internals@lists.php.net
In-Reply-To: <54557.78.61.224.253.1183098089.squirrel@avilys.eik.lt>
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

On 29/06/07, Tomas Kuliavas wrote:
> >> It comes down to predicting the future. Whichever way we go, the
> >> decision is going to be second-guessed. If we have critical mass for a
> >> clean BC break, then I am ok with it. For me personally it would make
> >> things a bit easier, but I think it would be a long long time before we
> >> saw any large hosts out there switch to a PHP 6 that can't run common
> >> PHP 5 apps.
> >
> > If they switch to 6 with unicode off, and never ever get around to
> > turning unicode on, will it really be any better?
> >
> > They'll just be running some weird-o setup that causes all kinds of
> > bugs and issues and you'll have users with php 6 apps that won't work
> > in php 6 and who submit bogus bug reports about it, because of the
> > setting.
> >
> > A clean break is probably better, especially if it makes php 6 much
> > more maintainable.
> >
> > Large-scale hosts won't switch to 6 any faster than they switched to
> > 5, unless there are ZERO BC breaks.
> >
> > And nobody can guarantee zero breaks, because there are always buglets.
>
> buglet = small break, not something that requires a massive code rewrite.
> Rewritten code is no longer backwards compatible, so developers have to
> maintain two code branches or two different sets of libraries. If code is
> maintained in one branch, scripts will need wrapper functions for most
> PHP string and stream function calls. Instead of having a performance
> loss in the interpreter, you will force a performance loss in portable
> scripts.
>
> > The effort to have unicode off in 6 is probably larger than the effort
> > to document what needs to be done to a PHP 5 app to make it be
> > 6-friendly, or even write tools to auto-convert the bulk of a script.
> >
> > If unicode semantics are "on" what exactly is borked in PHP 5?
>
> In Unicode mode \[0-7]{1,3} and \x[0-9A-Fa-f]{1,2} refer to Unicode code
> points and not to octal or hexadecimal byte values. The fix is not
> backwards compatible.
>
> Scripts can't match bytes. How are they supposed to check whether a
> string is plain ASCII or 8bit? Convert it to ASCII and check for errors
> instead of looking for 8bit byte values? How can scripts replace 8bit
> bytes with other strings? The ISO-8859-2 decoding table contains 95
> entries written and evaluated as binary strings. The same applies to the
> other iso-8859 and windows-125x character sets. iso-8859-1 and utf-8
> decoding do not use mapping tables and perform complex calculations
> with byte values.
> Multibyte character set decoding might actually benefit from
> unicode_encode(), if Table 325 (http://www.php.net/unicode) provided
> more information about U_INVALID_SUBSTITUTE and the other unicode.*
> settings.
>
> PHP6 does not provide backwards compatible functions to work with bytes,
> and the constructs it does provide are not backwards compatible. If
> scripts want to do MIME Q encoding, they must work with bytes. Doing Q
> encoding with the provided PHP extensions adds extra dependencies.
>
> ICU does not support an HTML target. Text conversion to iso-8859-x or
> windows-125x targets will be lossy.
>
> > Can that be fixed to be BC without resorting to this toggle?
>
> Unicode and binary typecasting causes an E_PARSE error in PHP 5.2.0 and
> older.
>
> PHP6 could have introduced new Unicode-aware functions, but the Unicode
> implementation chose to modify the existing ones. All low-level string
> operations ($string[1]) are Unicode aware by default, not only when a
> script actually asks for it. Such an implementation is designed for
> developers who don't care about Unicode support and want it out of the
> box without any changes to their Unicode-unaware scripts. It is not
> designed for developers who actually need it and want their code to work
> in PHP6 and PHP4/5.
>
> Unicode code points can be defined with \u, but PHP6 breaks the existing
> octal and hex escape sequences.
>
> PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer
> downcoded for binary stream runtime_encoding", "Warning: base64_encode()
> expects parameter 1 to be strictly a binary string, Unicode string given")
> about data stream and string operations, even when fwrite() or
> base64_encode() only handles plain ASCII data. PHP script developers are
> not used to strict variable type checks in string functions. Which
> functions have been modified to require binary typecasting? Do I have to
> make a list myself every time some function freaks out?
>
> --
> Tomas

The more I read about what is in place for PHP6 with regard to Unicode,
the more I feel that Unicode should have been an extension included in
the core, rather than a rewrite of the core. It could provide a series of
useful classes and functions. It would be there if you wanted it, and as
more and more people got used to it, more use would be made of it.

It almost looks like all the time and energy (thank you to you all) that
has been put into making PHP6 Unicode aware will be wasted if it is
disabled by default. I also feel that if it is enabled by default and
causes so many BC breaks, no one will upgrade.

-- 
-----
Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"
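The byte-level operations Tomas describes (testing whether a string is plain ASCII or 8bit, and MIME Q encoding) can be sketched in PHP 5, assuming its semantics: a string is a sequence of bytes, $s[$i] is one byte, and "\xE0" in a double-quoted string is the single byte 0xE0. This is an illustration, not code from the thread; the names has_8bit_bytes() and q_encode_word() are hypothetical, and the encoder uses the conservative character set RFC 2047 allows in encoded-words, without splitting long words or folding header lines.

```php
<?php
// Minimal sketch assuming PHP 5 semantics: strings are byte sequences and
// "\xNN" in a double-quoted string is the single byte 0xNN.
// has_8bit_bytes() and q_encode_word() are hypothetical helper names.

// True if $s contains any byte in the range 0x80-0xFF ("8bit" text),
// false if it is plain 7-bit ASCII.
function has_8bit_bytes($s)
{
    return preg_match('/[\x80-\xFF]/', $s) === 1;
}

// Build an RFC 2047 "Q" encoded-word, one byte at a time. Letters, digits
// and ! * + - / pass through, a space becomes "_", and every other byte
// becomes "=" plus two uppercase hex digits. Long words are not split.
function q_encode_word($s, $charset)
{
    $out = '';
    $len = strlen($s);              // length in bytes
    for ($i = 0; $i < $len; $i++) {
        $ch = $s[$i];               // a single byte
        if ($ch === ' ') {
            $out .= '_';
        } elseif (preg_match('/^[A-Za-z0-9!*+\/-]$/', $ch)) {
            $out .= $ch;
        } else {
            $out .= sprintf('=%02X', ord($ch));
        }
    }
    return '=?' . $charset . '?Q?' . $out . '?=';
}

// "\xE0" is the byte 0xE0 (a with grave accent in ISO-8859-1).
$subject = "Voil\xE0 test";
var_dump(has_8bit_bytes($subject));        // bool(true)
var_dump(has_8bit_bytes('plain ascii'));   // bool(false)
echo q_encode_word($subject, 'ISO-8859-1'), "\n";
// =?ISO-8859-1?Q?Voil=E0_test?=
```

Under the PHP6 Unicode mode discussed in the thread, "\xE0" would denote the code point U+00E0 rather than the byte 0xE0, and $s[$i] would address characters rather than bytes, which is why this style of code does not carry over unchanged.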