Newsgroups: php.internals
Message-ID: <530628F1.6010202@hoa-project.net>
Date: Thu, 20 Feb 2014 17:10:25 +0100
From: ivan.enderlin@hoa-project.net ("Ivan Enderlin @ Hoa")
To: internals@lists.php.net
Subject: Re: [PHP-DEV] [php6] Unicode support, options?

On 20/02/2014 06:54, Pierre Joye wrote:
> hi,

Hello :-),

> Unicode still remains one of the top requested features in PHP.
>
> However, as Rasmus and others stated earlier, it is not a trivial job.
> Some of the key points we need to take care of are:
>
> - UTF-8 storage
> - UTF-8 support for almost all (if not all) existing string APIs
> - Performance
>
> As of today, I did not find any library covering at least two of these
> key points.
>
> [snip]
>
> I would like to begin to discuss our options now. I am not
> asking to get into all the implementation details from a userland point
> of view (like u"some text" or adding new APIs or not) but only to see
> what we can do internally to work with UTF-8 strings.

Just a little note: a `u"foobar"` syntax would let us switch internally
between a light and a heavy implementation, and would thus help to cover
at least two of the key points described above.

I would mention the Rust implementation of UTF-8 strings [1, 2]. It's
fast, it's safe, and it has a nice, large API. I am not saying I want to
see PHP use Rust. I think that would be hard to do (even if it would
certainly benefit PHP), but the algorithms they use can be a source of
inspiration for us. Maybe we should consider it if we decide to write our
own implementation instead of using a third-party library.

Cheers.

[1] https://github.com/mozilla/rust/blob/master/src/libstd/str.rs
[2] http://static.rust-lang.org/doc/master/std/str/index.html

-- 
Ivan Enderlin
Developer of Hoa
http://hoa-project.net/

PhD. student at DISC/Femto-ST (Vesontio) and INRIA (Cassis)
http://disc.univ-fcomte.fr/ and http://www.inria.fr/

Member of HTML and WebApps Working Group of W3C
http://w3.org/
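
P.S. To make the light/heavy idea a bit more concrete, here is a minimal
sketch in Rust. It is only an illustration under my own assumptions, not
code taken from PHP or from the Rust standard library: the `UString` type
and its methods are hypothetical names. The only library calls it relies
on are `<[u8]>::is_ascii`, `std::str::from_utf8` and `str::chars`.

```rust
/// Hypothetical two-tier string; all names here are illustrative only.
#[derive(Debug, PartialEq)]
enum UString {
    /// "Light": every byte is ASCII, so byte offsets equal character offsets.
    Ascii(Vec<u8>),
    /// "Heavy": validated UTF-8, where one character may span 1 to 4 bytes.
    Utf8(String),
}

impl UString {
    /// Picks the light representation when possible, the heavy one otherwise.
    /// Returns `None` if the input is not valid UTF-8.
    fn from_bytes(bytes: &[u8]) -> Option<UString> {
        if bytes.is_ascii() {
            Some(UString::Ascii(bytes.to_vec()))
        } else {
            std::str::from_utf8(bytes)
                .ok()
                .map(|s| UString::Utf8(s.to_owned()))
        }
    }

    /// Length in characters: O(1) on the light path, O(n) on the heavy one.
    fn char_len(&self) -> usize {
        match self {
            UString::Ascii(b) => b.len(),
            UString::Utf8(s) => s.chars().count(),
        }
    }

    /// Character at index `i`: direct indexing when light, a scan when heavy.
    fn char_at(&self, i: usize) -> Option<char> {
        match self {
            UString::Ascii(b) => b.get(i).map(|&c| c as char),
            UString::Utf8(s) => s.chars().nth(i),
        }
    }
}
```

With this sketch, `UString::from_bytes(b"foobar")` takes the light path,
while the bytes of "héllo" take the heavy one, where `char_len()` is 5
although the buffer holds 6 bytes. A real implementation would of course
need much more than this (mutation, concatenation, caching of offsets).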