Newsgroups: php.internals
Date: Mon, 9 Jul 2007 19:38:46 -0700
Message-ID: <698DE66518E7CA45812BD18E807866CE647897@us-ex1.zend.net>
To: "Antony Dovgal", "Andrei Zmievski"
Cc: "Stas Malyshev"
Subject: RE: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?
From: andi@zend.com ("Andi Gutmans")

A large amount of the dual IS_UNICODE/IS_STRING code will need to stay in the code base anyway, as we will be supporting binary strings in PHP 6. So it's not accurate that all these maintenance issues will be resolved by not supporting unicode_semantics=off.

Unlike what Andrei said, I believe that for a large part of our community (probably the majority) a default of unicode_semantics=on will not be of interest (we don't live in a purist's world). Many won't want to run it because it's going to be significantly slower and harder for them to work with. This community will be best served by being able to run in native 8-bit mode, with some Unicode functionality available if and when needed.

Having dual mode in PHP 6 is not the same as forking two code bases. Features like namespaces still automatically reach both audiences.

If we're talking purely about "what is most useful to the majority of our users", I'd actually argue that explicit Unicode strings would be the most convenient, i.e. instead of doing b"8bitstring" you'd do U"unicodestring". Other languages do the same and there are reasons for that.
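To illustrate the explicit-literal style mentioned above, here is a minimal sketch in Python, one language that marks text and byte strings with literal prefixes. It is an illustration only and assumes Python 3.3 or later; it is not PHP 6 syntax, and the variable names are made up for the example.

```python
# Illustration only: Python literal prefixes, not PHP 6 syntax.
# The variable names below are hypothetical.

text = u"na\u00efve"        # explicit Unicode string literal (5 code points)
raw = text.encode("utf-8")  # the same text as an 8-bit byte string (6 bytes)
legacy = b"8bitstring"      # explicit 8-bit string literal (10 bytes)

# Length is counted in code points for text and in bytes for byte strings.
print(len(text), len(raw), len(legacy))
```

The two kinds of string are distinct types here, so a reader can tell from the literal alone whether a value holds text or raw bytes.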
As we've decided on a more aggressive (and risky) approach, I think having this dual mode is extremely important. It will also make the upgrade path easier. Btw, I don't know how many of you have actually tried to port PHP 5 apps to PHP 6, but it's quite a disaster. We made some fixes in the past 2-3 weeks and it's getting better, but it still requires a lot of work. If we don't make this easy then this is all not worth too much.

This project has never been a purist's project, which is why it's been so successful; let's not start now...

Andi