Newsgroups: php.internals
Message-ID: <530740B9.5000509@lsces.co.uk>
Date: Fri, 21 Feb 2014 12:04:09 +0000
From: lester@lsces.co.uk (Lester Caine)
To: internals@lists.php.net
Subject: Re: [PHP-DEV] [php6] Unicode support, options?
References: <53061982.2050901@googlemail.com> <53066DE9.4090809@googlemail.com>

Pierre Joye wrote:
>> What do you understand by "storage"?
> To have string stored as UTF-8 only, no conversion required for 99% of our use.

I think the first thing that needs to be agreed on is whether there will be support for UTF-8 in the core. As has already been said, in many places this currently just works, so blocking that may now be more of a problem.

The question surely is "What is the 1% that needs some extra work?" A light library would be the most appropriate way to fill the gaps currently left by the use of UTF-8 strings in the core. It is not until one starts adding the mbstring level of string processing that a more powerful library is required. Something that simply ensures UTF-8 strings are valid and can carry out comparisons as required?

The black hole is still 'case sensitivity', and perhaps laying down a 'light' set of rules for it would allow a path forward. As I have indicated, I'd prefer simply dropping case insensitivity, but a compromise might be to retain it where the string length does not change and a clean reverse transform exists. So a library that provides that comparison as part of the core package?

I think that, moving forward, ICU support is essential, but it is difficult while the 'wrong' defaults are applied, and I am seeing private builds being used in other projects to get around that hurdle. Hence my question as to whether people are taking that approach.

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
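
To make the 'light' rule above a little more concrete, here is a minimal sketch of what such a validate-and-compare helper might look like. It is written in Python purely for illustration. It is not PHP core code and not an existing library, and the names is_valid_utf8, safe_lower and light_equal are hypothetical. It assumes that "string length" means the UTF-8 byte length, and that "a clean reverse transform exists" means upper-casing the lower-cased character gives back the original character.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (strict decoding succeeds)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def safe_lower(ch: str) -> str:
    """Lower-case a single character only when the mapping is 'clean'.

    Assumed meaning of 'clean': the result is still one character, it
    occupies the same number of UTF-8 bytes, and upper() maps it back to
    the original. Anything else is returned unchanged.
    """
    low = ch.lower()
    if (
        len(low) == 1
        and len(low.encode("utf-8")) == len(ch.encode("utf-8"))
        and low.upper() == ch
    ):
        return low
    return ch


def light_equal(a: bytes, b: bytes) -> bool:
    """Case-insensitive comparison restricted to the 'clean' mappings."""
    if not (is_valid_utf8(a) and is_valid_utf8(b)):
        raise ValueError("input is not valid UTF-8")
    # safe_lower never changes the byte length, so strings of different
    # byte length can never compare equal under this rule.
    if len(a) != len(b):
        return False

    def fold(raw: bytes) -> str:
        return "".join(safe_lower(c) for c in raw.decode("utf-8"))

    return fold(a) == fold(b)
```

Under these assumptions "ÉCOLE" and "école" compare equal, while characters such as "ß" (whose upper case is "SS") or "İ" (whose lower case is two code points) fall outside the rule and are simply compared case-sensitively. Whether that is the right boundary is exactly the open question.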