From: andrei@gravitonic.com (Andrei Zmievski)
To: PHP Developers Mailing List
Date: Thu, 11 Aug 2005 16:36:56 -0700
Subject: How to get started with Unicode
Newsgroups: php.internals

By now, the Unicode merge into the public tree has taken place. How do you get started?

1. Take a deep breath.

2. Download and build ICU 3.4. Location:

     http://www-306.ibm.com/software/globalization/icu/downloads.jsp

   Extract and cd into icu/source. Execute configure (replacing /usr/local with your prefix):

     ./configure --prefix=/usr/local --disable-threads --enable-extras --enable-icuio --enable-layout

   Then run make and make install.

3. Update to PHP CVS HEAD (cvs upd -dPA) or, better, do a clean check-out.

4. Run ./buildconf.

5. Run ./configure and use --with-icu-dir= if you put ICU in a non-standard location.

6. Hopefully it configures without a problem.

7. Cross your fingers (better, cross your toes too) and run 'make'.

8. Once the smoke dissipates (and if you're on a Powerbook, once it cools down from nuclear to just melting hot), you can continue.

Since you have not turned the unicode_semantics switch on, you should be able to run all the old scripts. If you'd like to dabble in the Unicode land, the suggested way to configure for it is something like this:

     unicode_semantics = on
     unicode.runtime_encoding = iso-8859-1    (or your favorite one)
     unicode.script_encoding = utf-8
     unicode.output_encoding = utf-8
     unicode.from_error_mode = U_INVALID_SUBSTITUTE
     unicode.from_error_subst_char = 3f

Now you can get your hands dirty. If you'd like to see how certain functions have been upgraded to accommodate Unicode and binary types, check out substr(), explode(), trim(), str_repeat(), and strlen().

The ICU API reference is at http://icu.sourceforge.net/

There is a new 'make' target called 'utest'. It is supposed to turn the unicode_semantics switch on and run all the tests. I don't think it will quite work, due to changes in the streams and some other things, so that might be the first thing to get fixed.

A writeup about the new APIs and function upgrade guidelines will be coming up, but not today, because you need time to digest this and I need time to recover. :-)

Have fun,

-Andrei
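
For anyone who would rather script the build steps listed above, here is a minimal sketch, not a tested recipe. It assumes the ICU 3.4 tarball is already extracted into ./icu and that the PHP checkout lives in ./php-src (a hypothetical directory name). The PREFIX and RUN variables and the step helper are made up for illustration. By default it only prints the commands; set RUN=1 to actually execute them.

```shell
#!/bin/sh
# Sketch of the build steps from this message.
# Assumptions (not from the original post): the php-src directory name,
# the PREFIX and RUN variables, and the step() helper.
PREFIX="${PREFIX:-/usr/local}"   # where ICU gets installed
RUN="${RUN:-0}"                  # 0 = only print commands, 1 = run them

# Print a command; also execute it when RUN=1, stopping on failure.
step() {
    echo "+ $*"
    if [ "$RUN" = "1" ]; then
        "$@" || exit 1
    fi
}

# Each function body is a subshell, so 'cd' does not leak to the caller.
build_icu() (
    step cd icu/source
    step ./configure --prefix="$PREFIX" --disable-threads \
        --enable-extras --enable-icuio --enable-layout
    step make
    step make install
)

build_php() (
    step cd php-src
    step cvs upd -dPA
    step ./buildconf
    # The message only calls for --with-icu-dir when ICU is in a
    # non-standard location; it is passed unconditionally here for simplicity.
    step ./configure --with-icu-dir="$PREFIX"
    step make
)

build_icu && build_php
```

Run it once as is to check that the printed commands match what you intend, then again with RUN=1 (and PREFIX set to your own prefix if it is not /usr/local).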