Newsgroups: php.internals
Message-ID: <54464400.2080602@lsces.co.uk>
Date: Tue, 21 Oct 2014 12:31:12 +0100
To: internals@lists.php.net
CC: rowan.collins@gmail.com
References: <1413875212.2624.3.camel@localhost.localdomain> <1413883549.2624.22.camel@localhost.localdomain> <54463347.6020002@lsces.co.uk> <54463F65.8070408@gmail.com>
In-Reply-To: <54463F65.8070408@gmail.com>
Subject: Re: [PHP-DEV] [RFC] UString
From: lester@lsces.co.uk (Lester Caine)

On 21/10/14 12:11, Rowan Collins wrote:
> Lester Caine wrote (on 21/10/2014):
>> If we are going down the route of keeping PHP7 as ascii only in the core,
>> then ustring probably makes sense, but it does not address many of the
>> areas where unicode is really needed.
>
> Just a quick point: most of the core is not ASCII. PHP strings are byte
> strings, completely divorced from any encoding. A few native functions
> assume ISO8859-1 (or possibly Windows CP1252), but mostly they just
> juggle whichever bytes you give them.
>
> The main exception I can think of is that numbers are often handled
> specially, with digits and separators as defined by ASCII. But since
> we're talking UTF-8, that doesn't need to change.

Pierre had proposed restricting that to ASCII as a way of addressing the
inconsistencies that arise because some areas do not currently make a
distinction.

>> Handling unicode content outside
>> the core is working reasonably at the moment, it is the problems such as
>> using unicode keys for arrays which is the main area where unicode is
>> needed in PHP7 and so a more embedded handling is needed which may cut
>> across yet another content wrapper?
>
> I do think this is an important thing to consider, though.
> If this extension is genuinely just meant as a more modern and more
> performant way of doing things which mbstring and intl can already do,
> that needs to be clear in the way it's documented and publicised. If this
> gets publicised as "better Unicode support", users are naturally going to
> expect UString objects to start appearing in core, and in other
> extensions, and be disappointed that it's still just a toolbox for their
> own string handling.

This is where a proper discussion of just what we are trying to achieve is
important, before discussing tangents.

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk