Message-ID: <54463F65.8070408@gmail.com>
Date: Tue, 21 Oct 2014 12:11:33 +0100
To: internals@lists.php.net
References: <1413875212.2624.3.camel@localhost.localdomain> <1413883549.2624.22.camel@localhost.localdomain> <54463347.6020002@lsces.co.uk>
In-Reply-To: <54463347.6020002@lsces.co.uk>
Subject: Re: [PHP-DEV] [RFC] UString
From: rowan.collins@gmail.com (Rowan Collins)

Lester Caine wrote (on 21/10/2014):
> If we are going down the route of keeping PHP7 as ascii only in the core,
> then ustring probably makes sense, but it does not address many of the
> areas where unicode is really needed.

Just a quick point: most of the core is not ASCII. PHP strings are byte strings, completely divorced from any encoding. A few native functions assume ISO-8859-1 (or possibly Windows CP1252), but mostly they just juggle whichever bytes you give them.

The main exception I can think of is that numbers are often handled specially, with digits and separators as defined by ASCII. But since we're talking UTF-8, which encodes those characters exactly as ASCII does, that doesn't need to change.

> Handling unicode content outside
> the core is working reasonably at the moment, it is the problems such as
> using unicode keys for arrays which is the main area where unicode is
> needed in PHP7 and so a more embedded handling is needed which may cut
> across yet another content wrapper?

I do think this is an important thing to consider, though. If this extension is genuinely just meant as a more modern and more performant way of doing things which mbstring and intl can already do, that needs to be clear in the way it's documented and publicised.
If this gets publicised as "better Unicode support", users are naturally going to expect UString objects to start appearing in core, and in other extensions, and be disappointed that it's still just a toolbox for their own string handling.

-- 
Rowan Collins
[IMSoP]