Newsgroups: php.internals
Message-ID: <9BA38A23-9C3E-43E1-A790-E6137FDE55B0@googlemail.com>
From: foolistbar@googlemail.com (Geoffrey Sneddon)
To: PHP internals
Date: Mon, 5 May 2008 20:58:59 +0100
Subject: PHP 6, backwards compatibility, and unicode.semantics

Hi,

Over the past several months there have been various discussions about PHP 6 and backwards compatibility, and what that entails, and seeing as it's come up again, I've finally written up my thoughts:

Unicode is probably the biggest change, and the one that has the most repercussions for backwards compatibility. If we maintain the unicode.semantics switch (which IMO absolutely must not be kept, regardless of which way becomes the default), then I, and others who have codebases sensitive to such things, will need to deal with four different cases in functions/methods:

- Unicode argument, unicode.semantics=Off
- Binary argument, unicode.semantics=Off (and PHP 5)
- Unicode argument, unicode.semantics=On
- Binary argument, unicode.semantics=On

Ending up with four code branches to deal with such things is ludicrous. I can accept that what I'm doing will be broken by Unicode, necessitating two code branches if it defaults to Off, or three if On (as I still need the binary/off one for PHP 5), but four is just insane.
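
To make the four cases listed above concrete, here is a minimal sketch of how a function might tell them apart. The helper name string_case() is hypothetical, and is_unicode() is assumed to exist only on PHP 6 builds; both calls are guarded so the code still runs on PHP 5, where everything ends up in the binary/Off case.

```php
<?php
// Hypothetical helper, not from the original post: report which of the four
// cases a string argument falls into.
function string_case($value)
{
    // Assumption: is_unicode() exists only on PHP 6 builds. On PHP 5 the
    // function_exists() guard short-circuits, so every string counts as binary.
    $isUnicode = function_exists('is_unicode') && is_unicode($value);

    // ini_get() returns false for a setting the running PHP does not know,
    // which puts PHP 5 into the "Off" cases.
    $semanticsOn = (bool) ini_get('unicode.semantics');

    if ($isUnicode) {
        return $semanticsOn ? 'unicode/on' : 'unicode/off';
    }
    return $semanticsOn ? 'binary/on' : 'binary/off';
}

echo string_case('abc'), "\n";
```

Every function that is sensitive to the string type would need a dispatch like this before doing any real work, which is where the four branches come from.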
That said, even against the greater number of code branches, I do very much think we need to default to On, as there is currently no way to explicitly create a Unicode string (u'', for the sake of argument) without causing a compile-time error on PHP 5, except for doing something like unicode_decode("\x00\x00\xFF\xFD", 'UTF-32'), which gets horrible quickly (I already do that for cases when unicode.semantics is Off in some code I have locally, which really isn't fun). Allowing …'' at a compiler level would be good, IMO, with a fatal error thrown only when we actually try to parse it (which, if it is in an if statement dependent on version, would be never). We already have b''. PHP 5.2.1 is pushing it enough for most projects, and adding a u'' to even 5.3 would be a bit too late. Realistically, the only way I can see this happening is to default to On.

Now, this means that in many ways we don't have to care about code working on anything older than PHP 5.2.1 (and making it On by default means a fair amount of code will break anyway), so the aim really should be this: code that currently works on PHP 5.2.1 doesn't need to work verbatim on PHP 6, but it must be possible to have code work on both PHP 5.2.1 and PHP 6, using (almost) all the new features of PHP 6, albeit branching with if statements to keep PHP 5 compatibility (using things like namespaces would inevitably push that up to PHP 5.3 and PHP 6).

Now, given that we don't need stuff to work verbatim, we can do all kinds of crazy cleanup (beyond the likes of removing magic_*, safe_mode, register_globals, etc.) like stopping dynamic methods being called statically, and vice versa (IIRC, this is already in HEAD), as well as getting rid of deprecated stuff that's been around forever.
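
As an illustration of the earlier point about creating Unicode strings without a u'' literal, here is a rough sketch of the kind of branching that parses on PHP 5 and could still produce a Unicode string on PHP 6. The helper name utf8_literal() is hypothetical; unicode_decode() is used with the (string, encoding) arguments shown earlier in this mail and is assumed to be absent on PHP 5.

```php
<?php
// Hypothetical helper, not from the original post: build a string from UTF-8
// bytes without needing a u'' literal that PHP 5 cannot compile.
function utf8_literal($bytes)
{
    // Assumption: unicode_decode() exists only on PHP 6 builds. On PHP 5 the
    // guard fails and the bytes are returned untouched as a binary string.
    if (function_exists('unicode_decode')) {
        return unicode_decode($bytes, 'UTF-8');
    }
    return $bytes;
}

// U+FFFD REPLACEMENT CHARACTER, written as its three UTF-8 bytes.
$replacement = utf8_literal("\xEF\xBF\xBD");
```

This is exactly the sort of wrapper that gets horrible quickly when it has to appear around every literal, which is the argument for having On as the default.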
Going back to Unicode briefly, there are some special cases regardless of the default, such as chr(): we need a way to have both binary and Unicode chr() functions (else we end up going through hell for Unicode, with something like unicode_encode(chr(42), 'UTF-8')); this matches the behaviour of chr() in the GNU userland.

--
Geoffrey Sneddon