From: lester@lsces.co.uk (Lester Caine)
To: PHP internals
Date: Mon, 17 Feb 2014 08:27:35 +0000
Subject: PHP 6, part 2 ... Unicode
Message-ID: <5301C7F7.2060203@lsces.co.uk>

1/ Use of a fast and light UTF-8 processing library for all core string operations

When you start looking at the ICU UTF-8 handling, it does provide a stable base. It is conversion to UTF-8 which seems to get in the way.
But now that I have had time to dig deeper, the ICU build switch UCONFIG_NO_CONVERSION is the key. How many people today ACTUALLY handle only UTF-8 in their content? Conversion is only required when one finds material that is not already UTF-8 ...

2/ Use of intl for any advanced operations, localization or conversion?

This simply follows on from 1/. Certainly localizations like currency, timezone and calendar management should use a common base, but I get the impression that 'locale' is still somewhat messy when it comes to 'collations'? This is one of those areas where the language returned in a browser header may or may not give the right information, just as we currently can't guarantee to guess the timezone correctly. But in a way I view this as secondary to the UTF-8 debate. Even just full default support for UTF-8 in the core is better than the current piecemeal support.

3/ Support of UTF-8 for the language itself: as PHP currently allows non-ASCII encoding in scripts, I would recommend to stop supporting it, except in comments.

Million dollar question? If ALL one is processing in UTF-8 is the content, then we are already there? Just use mbstring to manage the content and, as Pierre says, stop using non-ASCII characters in identifiers and the like. While that would not affect me one bit, as I don't need anything other than ASCII for my own identifiers, I think it is just this area that other users are looking to upgrade so that they can make scripts more readable in their own languages?

I really am getting fed up with the website structure ... having to manually change the domain in the address line so you can search documentation when working through the wiki or bugs is hopeless. There should be one search engine covering the whole site - and NOT Google, since that does not work at all when you want results in one language! Another example of 'locale' simply not working?

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk