Newsgroups: php.internals
To: Martin Keckeis, Tom Worster
Cc: php-internals
Message-ID: <5612AE0C.2010204@thefsb.org>
Date: Mon, 5 Oct 2015 13:06:20 -0400
Subject: Re: [PHP-DEV] PHP 7.0's Unicode version incoherence (mbstring, intl, pcre)
From: fsb@thefsb.org (Tom Worster)

On 10/5/15 5:20 AM, Martin Keckeis wrote:
> Hello,
>
> 2015-10-01 21:19 GMT+02:00 Tom Worster:
>>
>> Do people here agree that PHP should have a *policy* of using a consistent
>> Unicode version?
>>
>> This appears to be easy to accomplish for the moment. Moving to Unicode 8
>> will be harder.
>>
>
> I agree with the policy -> good idea.
>
> But I think there will be a lot of problems, when staying on Unicode 7.
> Since Unicode 8 has the new emoji (colors) and that is used more and more.

The new emoji are neat and people can and should use them. They just
can't expect intl or preg to recognize them until some release after 7.0.0.

Using Unicode 8 uniformly involves moving to PCRE2, which doesn't seem
easy. It's a big API change for little functional gain. I'd like to see
PCRE2 implement a bit more of TR18 but I doubt it's a priority.

ICU regex as the next big thing in intl is an interesting idea. I've no
idea how hard it is.

http://unicode.org/reports/tr18/

So Unicode 8 depends on what PHP does about regex going forwards. I
asked about this on Sep 24 but got no response.

http://www.serverphorums.com/read.php?7,1303221

Tom
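
P.S. A rough illustration of what "can't expect intl or preg to recognize them" means in practice. This is only an analogy in Python, not PHP and not a statement about what mbstring, intl or pcre actually do: Python's standard unicodedata module happens to ship two Unicode databases (a frozen 3.2.0 one and the interpreter's current one), so the same code point can be looked up under both. The helper name category_by_version is made up for this sketch. U+1F3FB is one of the skin tone modifiers added in Unicode 8.0; a database that predates it reports the code point as unassigned (Cn), which is the kind of disagreement you get when extensions are built against different Unicode versions.

```python
import unicodedata

# U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2 was added in Unicode 8.0.
SKIN_TONE_MODIFIER = "\U0001F3FB"


def category_by_version(ch):
    """Return the general category of ch under two Unicode database versions.

    Hypothetical helper for illustration: keys are Unicode version strings,
    values are the general category reported by that version's database.
    """
    old = unicodedata.ucd_3_2_0  # frozen Unicode 3.2.0 database
    return {
        old.unidata_version: old.category(ch),
        # Database bundled with the running interpreter.
        unicodedata.unidata_version: unicodedata.category(ch),
    }


if __name__ == "__main__":
    for version, category in category_by_version(SKIN_TONE_MODIFIER).items():
        print(version, category)
```

A property-based regex class such as \p{Emoji_Modifier} or \p{So} would go wrong in the same way: whether it matches depends on which Unicode tables the engine was built with, not on the text.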