Newsgroups: php.internals
Date: Thu, 27 Feb 2014 07:13:38 +0100
To: PHP internals
Subject: Re: [php6] Unicode support, options?
From: pierre.php@gmail.com (Pierre Joye)

On Thu, Feb 20, 2014 at 6:54 AM, Pierre Joye wrote:

> * ICU:
> U_CHARSET_IS_UTF8 forces ICU to use UTF-8 by default. It is an
> ICU compile-time setting. It is not possible to set it at PHP
> configure time, which means that users would have to create their own
> build. Alternatively we can bundle ICU, but this would be awkward and a
> maintenance nightmare for both PHP and the distros.
>
> Alternatively, UText can be used to create UTF-8 strings. APIs accepting
> UText allow almost everything we need. However, the drawback is that
> a UTF-8 UText is read-only. Any operation altering its content will
> require duplication, clones or conversions. That may kill all the gains we
> got from using UTF-8 only.
>
> U_CHARSET_IS_UTF8 is very appealing, but bundling ICU is actually
> a show stopper. Asking users to custom build ICU is not an option
> either. I do not know if the distros will be ready to provide two
> different builds of ICU either; it may cause a lot of issues for all
> projects using ICU.

Here is a first reply from ICU:

http://sourceforge.net/p/icu/mailman/message/32031609/

It sounds like this flag could be a good option for PHP's Unicode support.

Btw, I created a sub page for Unicode support:

https://wiki.php.net/ideas/php6/unicode

> Thoughts, comments or ideas?

I found another C++ library for basic UTF-8 operations, easl:

https://code.google.com/p/easl/

It could be a nice one to use in combination with ICU: small and fast (in first tests).

Cheers,
-- 
Pierre

@pierrejoye | http://www.libgd.org
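
Addendum, to illustrate what "basic UTF-8 operations" can look like when done directly on UTF-8 bytes, without converting to UTF-16 first. This is only a minimal sketch: the function names utf8_length and utf8_offset are hypothetical and are not taken from ICU or easl, and the code assumes the input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not an ICU or easl API): counts code points in a
// UTF-8 string by counting every byte that is not a continuation byte
// (bit pattern 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: returns the byte offset at which the code point
// with index cp_index starts, or s.size() if the index is out of range.
std::size_t utf8_offset(const std::string& s, std::size_t cp_index) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            if (seen == cp_index) {
                return i;
            }
            ++seen;
        }
    }
    return s.size();
}
```

Both functions only read the buffer, so they never need the duplication or conversion step that altering a read-only UTF-8 view would require. The price is that indexing by code point is a linear scan rather than a constant-time lookup.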