Newsgroups: php.internals
Xref: news.php.net php.internals:73068
From: yohgaki@ohgaki.net (Yasuo Ohgaki)
Date: Wed, 12 Mar 2014 06:55:04 +0900
Subject: Re: [PHP-DEV] Unicode strings?
To: Lester Caine
Cc: PHP Internals
In-Reply-To: <531EE602.3090207@lsces.co.uk>
References: <531EE602.3090207@lsces.co.uk>

Hi all,

Just FYI.
http://www.w3.org/TR/encoding/

This would be the list of legacy encodings for HTML5.

Regards

--
Yasuo Ohgaki
yohgaki@ohgaki.net

On Tue, Mar 11, 2014 at 7:31 PM, Lester Caine wrote:

> I'm slowly working through a long list of things relating to unicode
> strings trying to work out just where the main problems are.
>
> The very first problem I hit is ICU's limitation to 32-bit string lengths.
> How does the switch to 64-bit string length on 64-bit platforms impinge on
> this? While I can see the advantage of this particular change, would that
> also now require our own version of ICU capable of also handling longer
> strings? This probably falls out in the wash of my next point ...
>
> Currently strings are simply strings? I'm sure we have already had this
> discussion, and it will be necessary to switch from simple strings to a
> string object which can handle the intricacies of unicode?
>
> Pierre - I presume that it's this distinction that is where I'm crossing
> over between variable and similar names which just remain as simple strings
> while 'data' that is unicode is provided by string objects. These then need
> to work nicely with areas that expect a simple string? Where a string
> object is returned an ASCII version will be created when a simple string is
> necessary?
>
> The 'leak' of unicode currently into name strings is simply that there is
> nothing currently stopping them from storing UTF-8? That this works is more
> by luck than design, but results in subtle problems with case conversion
> and the like which do not expect unicode strings?
> BUT people can currently use any format data in a string, even one using
> a 64-bit pointer, as long as it does not go through a path that does
> expect ASCII?
>
> If the simple string is isolated from UTF-8 and unicode is kept to its
> own data type such as an improved integrated mbstring package then this
> makes a suitable 'halfway house' for PHP6?
>
> I don't NEED unicode variable names, but I can see that this would be a
> nice to have in non-English speaking countries. In much the same way we
> provide translated versions of web pages, I can even see the advantage of
> function name aliases in different languages as having more relevance than
> simply changing the current English names for picky reasons, but that is
> not likely to happen in my lifetime! Perhaps PHP10 :)
>
> --
> Lester Caine - G8HFL
> -----------------------------
> Contact - http://lsces.co.uk/wiki/?page=contact
> L.S.Caine Electronic Services - http://lsces.co.uk
> EnquirySolve - http://enquirysolve.com/
> Model Engineers Digital Workshop - http://medw.co.uk
> Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
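
As an illustration of the case-conversion problem raised in the quoted
message, here is a minimal sketch. It assumes Python's bytes and str as
stand-ins for a byte-oriented "simple string" and a Unicode-aware string
type; the function names are hypothetical, and nothing here reflects how
PHP or ICU actually implement case mapping.

```python
def upper_bytewise(raw: bytes) -> bytes:
    """Uppercase ASCII letters only; bytes >= 0x80 pass through untouched."""
    return raw.upper()


def upper_unicode(raw: bytes) -> bytes:
    """Decode as UTF-8, apply Unicode case mapping, then re-encode."""
    return raw.decode("utf-8").upper().encode("utf-8")


# A UTF-8 name stored in a plain byte string (assumed example value).
name = "straße".encode("utf-8")

print(upper_bytewise(name).decode("utf-8"))  # STRAßE
print(upper_unicode(name).decode("utf-8"))   # STRASSE
```

The byte-wise path stores and returns the UTF-8 data without error, which
is the "more by luck than design" behaviour, but the two paths disagree on
the result as soon as a non-ASCII letter is involved.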