Newsgroups: php.internals
From: cryptocompress@googlemail.com (Crypto Compress)
To: Lester Caine
CC: PHP Developers Mailing List
Date: Tue, 11 Mar 2014 12:06:18 +0100
Subject: Re: [PHP-DEV] Unicode strings?
Message-ID: <531EEE2A.2000602@googlemail.com>
In-Reply-To: <531EE602.3090207@lsces.co.uk>

Hi,

On 11.03.2014 11:31, Lester Caine wrote:
> I'm slowly working through a long list of things relating to unicode
> strings trying to work out just where the main problems are.
>
> The very first problem I hit is ICU's limitation to 32bit string
> lengths. How does the switch to 64bit string length on 64 bit
> platforms impinge on this? While I can see the advantage of this
> particular change, would that also now require our own version of ICU
> capable of also handling longer strings? This probably falls out in
> the wash of my next point ...

Where did you find this information? Can you please provide a source
for it?

>
> Currently strings are simply strings? I'm sure we have already had
> this discussion, and it will be necessary to switch from simple
> strings to a string object which can handle the intricacies of unicode?

Yes, currently we have so-called binary strings (plain 8-bit bytes).
No, we should not create a string object to handle all the intricacies
of Unicode.

>
> Pierre - I presume that it's this distinction that is where I'm
> crossing over between variable and similar names which just remain as
> simple strings while 'data' that is unicode is provided by string
> objects. These then need to work nicely with areas that expect a
> simple string?
> Where a string object is returned an ASCII version will
> be created when a simple string is necessary?
>
> The 'leak' of unicode currently into name strings is simply that there
> is nothing currently stopping them from storing UTF-8? That this works
> is more by luck than design, but results in subtle problems with case
> conversion and the like which does not expect unicode strings? BUT
> people can currently use any format data in a string even one using a
> 64 bit pointer as long as it does not go through a path that does
> expect ASCII?
>
> If the simple string is isolated from UTF-8 and unicode is kept to
> its own data type such as an improved integrated mbstring package
> then this makes a suitable 'half way' house for PHP6?
>
> I don't NEED unicode variable names, but I can see that this would be
> a nice to have in non-English speaking countries. In much the same way
> we provide translated versions of web pages, I can even see the
> advantage of function name aliases in different languages as having
> more relevance than simply changing the current English names for
> picky reasons, but that is not likely to happen in my lifetime!
> Perhaps PHP10 :)
>

We should not discuss this until Pierre or the rest of us have
clarified the core problems.

cryptocompress
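
To illustrate the "binary strings" point made in the reply, here is a minimal sketch. It assumes the mbstring extension is loaded; the sample word and the variable names are made up for illustration and are not taken from the thread. The word is spelled out as raw UTF-8 bytes so that the encoding of the script file does not matter.

```php
<?php
// "Grüße" written as raw UTF-8 bytes (ü = C3 BC, ß = C3 9F).
// Hypothetical sample data, chosen only for illustration.
$s = "Gr\xC3\xBC\xC3\x9Fe";

// strlen() counts bytes, because a PHP string is just a byte sequence.
$bytes = strlen($s);              // 7

// mb_strlen() decodes the bytes as UTF-8 and counts code points
// (assumes the mbstring extension is available).
$chars = mb_strlen($s, 'UTF-8');  // 5

printf("bytes=%d, code points=%d\n", $bytes, $chars);

// strtoupper() works byte by byte (locale-dependent in older PHP
// versions, ASCII-only in recent ones), so it does not treat the
// multi-byte letters as text. mb_strtoupper() interprets the string
// as UTF-8 before converting.
var_dump(strtoupper($s));
var_dump(mb_strtoupper($s, 'UTF-8'));
```

The two length calls disagree because one counts bytes and the other counts decoded characters, which is probably the kind of mismatch behind the "subtle problems with case conversion" mentioned in the quoted text.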