Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:73081
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: error (pb1.pair.com: domain lsces.co.uk from 217.147.176.204 cause and error)
Message-ID: <532049BB.2080605@lsces.co.uk>
Date: Wed, 12 Mar 2014 11:49:15 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0 SeaMonkey/2.24
MIME-Version: 1.0
To: PHP Developers Mailing List <internals@lists.php.net>
References: <531EE602.3090207@lsces.co.uk>	<531EEE2A.2000602@googlemail.com>	<531F0146.5010701@lsces.co.uk>	<53202DC5.4010306@googlemail.com>	<532033E1.60602@lsces.co.uk>	<53203687.7090405@googlemail.com>	<532037F4.6020204@googlemail.com> <CAEZPtU5G41OKb2FQF4ajKgszATqtwxL9E_k4Zc78RZGL8B4qnQ@mail.gmail.com>
In-Reply-To: <CAEZPtU5G41OKb2FQF4ajKgszATqtwxL9E_k4Zc78RZGL8B4qnQ@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [PHP-DEV] Unicode strings?
From: lester@lsces.co.uk (Lester Caine)

Pierre Joye wrote:
>>> ICU Text Access allows other formats, such as UTF-8 or non-contiguous
>>> UTF-16 strings, to be placed in a UText wrapper and then passed to ICU
>>> services.
> This is running in circles and does not really help us move forward...
>
> Lester has a point with the UTF-8 testing. I am almost done with the
> tests code and will publish it soonish.
>
> Also, I do not understand your argument earlier in this discussion that
> we should not implement objects or pseudo-objects for Unicode support.
> Where is the problem? It can work with existing functions as well,
> does not break BC, and does not introduce weird syntax that prevents code
> from running in both 5.x and 6.x (u"foo" would, e.g.). The more I look at it,
> the more I think it is the way.

I think we are both heading to the same point from different ends, Pierre, at 
least as far as handling Unicode data is concerned. It's not so much running in 
circles as a chicken-and-egg problem. Pick any three of the four options to get 
to the final answer?

I'm back on the Windows platform looking at problems there, and I had forgotten 
just how badly Borland C++ handles WideString, but running ICU there and 
stripping out that code will work for me! I'm not sure that a library in the 
middle is needed, just some pseudo-objects to smooth the transition? ICU running 
in UTF-8 mode does seem to be the answer, but while I can test C++ builds, I'm 
just not familiar enough with the PHP codebase to do the sort of testing that is 
needed :( Conversion to C++ is something I could deal with ...

Unicode variable names ARE secondary, but if Unicode handling works as well as 
it seems to for me, then they may be an option worth considering.

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk