Newsgroups: php.internals
Message-ID: <531EE602.3090207@lsces.co.uk>
Date: Tue, 11 Mar 2014 10:31:30 +0000
From: lester@lsces.co.uk (Lester Caine)
To: PHP Internals
Subject: Unicode strings?

I'm slowly working through a long list of things relating to Unicode strings, trying to work out just where the main problems are.

The very first problem I hit is ICU's limitation to 32-bit string lengths. How does the switch to 64-bit string lengths on 64-bit platforms impinge on this?
While I can see the advantage of this particular change, would it also require our own version of ICU capable of handling longer strings? This probably falls out in the wash of my next point...

Currently strings are simply strings? I'm sure we have already had this discussion, and that it will be necessary to switch from simple strings to a string object which can handle the intricacies of Unicode?

Pierre, I presume it is this distinction where I'm crossing over: variables and similar names just remain simple strings, while 'data' that is Unicode is provided by string objects. These then need to work nicely with areas that expect a simple string? Where a string object is returned, an ASCII version will be created when a simple string is necessary?

The current 'leak' of Unicode into name strings is simply that nothing stops them from storing UTF-8? That this works is more by luck than design, and it results in subtle problems with case conversion and other operations that do not expect Unicode strings? BUT people can currently store data in any format in a string, even data containing a 64-bit pointer, as long as it does not go through a path that expects ASCII?

If the simple string is isolated from UTF-8, and Unicode is kept to its own data type, such as an improved, integrated mbstring package, does this make a suitable 'halfway house' for PHP 6?

I don't NEED Unicode variable names, but I can see that they would be nice to have in non-English-speaking countries. In much the same way that we provide translated versions of web pages, I can even see the advantage of function-name aliases in different languages, which would have more relevance than simply changing the current English names for picky reasons, but that is not likely to happen in my lifetime!
Perhaps PHP 10 :)

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk