Newsgroups: php.internals
Message-ID: <532049BB.2080605@lsces.co.uk>
Date: Wed, 12 Mar 2014 11:49:15 +0000
To: PHP Developers Mailing List
Subject: Re: [PHP-DEV] Unicode strings?
From: lester@lsces.co.uk (Lester Caine)

Pierre Joye wrote:
>>> ICU Text Access allows other formats, such as UTF-8 or non-contiguous
>>> UTF-16 strings, to be placed in a UText wrapper and then passed to ICU
>>> services.
> This is running in circle and does not really help to move forwards...
>
> Lester has a point with the UTF-8 testing. I am almost done with the
> tests code and will publish it soonish.
>
> Also I do not get your argument earlier in this discussion saying that
> we should not implement objects or pseudo-objects for unicode support.
> where is the problem? It can work with existing functions as well,
> does not break BC, does not introduce weird syntax that prevents code
> from running in 5.x and 6.x (u"foo" will f.e.). The more I look at it,
> the more I think it is the way.

I think we are both heading to the same point from different ends, Pierre, at least as far as handling Unicode data is concerned. It's not so much running in a circle as a chicken-and-egg problem. Select any three out of four options to get to the final answer?

I'm back on the Windows platform looking at problems there, and I had forgotten just how badly Borland C++ handles widestring, but running ICU there and stripping that code will work for me! I'm not sure that a library in the middle is needed, JUST some pseudo-objects to smooth the transition? ICU running in UTF-8 mode does seem to be the answer, but while I can test C++ builds, I'm just not into the PHP codebase enough to do the sort of testing that is needed :( Conversion to C++ is something I could deal with ...

Unicode variable names ARE secondary, but if the handling of Unicode works as well as it seems to for me, then it may be an option that can be considered.
-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk