Newsgroups: php.internals
Date: Thu, 23 Oct 2014 14:44:46 +0100
To: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] UString
From: rowan.collins@gmail.com (Rowan Collins)

Dmitry Stogov wrote on 21/10/2014 10:01:
> The "right" approach, would be extending zend_string with "encoding" and
> then adopting near all functions working with zend_string to take
> "encoding" into account. But, of course, this is going to lead to much more
> complicated solution (with some slowdown).

Isn't that kind of what ext/mbstring does?

I think that treating Unicode as nothing more than an encoding, and trying to hide all its complexity from the user, is not particularly wise. Unicode isn't just "ASCII, but bigger", so keeping the same API but making the implementation "work" with more characters isn't really "Unicode support".

For instance, what does "allowing Unicode strings as array keys" actually mean? We already allow pretty much any sequence of bytes as an array key, so what we're actually talking about is that array-handling functions should be somehow "Unicode aware". In the case of sorting functions, that means a mechanism for selecting a collation, even if you know how the strings are encoded.

There are a handful of operations which have an obvious meaning under Unicode - strtoupper(), for instance. It might be nice if those worked transparently with UStrings, but I don't think that really constitutes "complete Unicode support" either.

I think we're going to keep going round in circles unless we can really pin down what it means for a language to "support Unicode".

-- 
Rowan Collins
[IMSoP]
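The array-key, sorting and strtoupper() points in the message can be made concrete with a short sketch. It uses Python's standard library purely for illustration: a dict stands in for a PHP array keyed on raw strings, and nothing here reflects the behaviour of PHP itself or of any proposed UString API. It shows that visually identical keys can be distinct byte sequences, that code-point (and UTF-8 byte) order is not a human collation order, and that full Unicode upper-casing can change a string's length.

```python
import unicodedata

# Two spellings of "café": precomposed U+00E9 vs "e" + combining acute U+0301.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# As raw keys they are distinct, so a dict (standing in for a PHP array
# keyed on bytes) holds two entries for what a reader sees as one word.
keys = {precomposed: 1, decomposed: 2}
print(len(keys))  # 2

# Normalising every key to NFC first collapses them into a single key.
nfc_keys = {unicodedata.normalize("NFC", k): v for k, v in keys.items()}
print(len(nfc_keys))  # 1

# Sorting by code point (which matches byte-wise comparison of UTF-8)
# places the accented word after "zebra"; a language-aware collation
# would typically place it between "apple" and "zebra".
words = ["zebra", "\u00e9clair", "apple"]
by_code_point = sorted(words)
print(by_code_point)

# Full Unicode upper-casing is not a one-to-one character swap:
# German sharp s (U+00DF) becomes two letters, so the length changes.
lower = "stra\u00dfe"
upper = lower.upper()
print(upper)                   # STRASSE
print(len(lower), len(upper))  # 6 7
```

Picking a collation (for example via ICU) is deliberately left out, since it depends on a locale choice, which is exactly the kind of decision the message argues an API has to expose.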