Newsgroups: php.internals
Message-ID: <52FE6FEA.5050204@lerdorf.com>
In-Reply-To: <52FE46D2.4060903@gmail.com>
Date: Fri, 14 Feb 2014 14:35:06 -0500
From: rasmus@lerdorf.com (Rasmus Lerdorf)
To: Rowan Collins, internals@lists.php.net
Subject: Re: [PHP-DEV] PHP6 wiki page

On 02/14/2014 11:39 AM, Rowan Collins wrote:
> Lester Caine wrote (on 14/02/2014):
>> But more fundamentally I don't think there was agreement on whether we
>> simply standardise on unicode in the core, or allow a single byte
>> mode? 8 years on, I feel that the amount of utf8 material that is
>> floating around, the easiest route IS unicode only?
>
> The question is not whether to be "Unicode only", it's *how* to
> implement Unicode. It's not just a case of making all your strings
> wider, every function that manipulates a string in any way has to be
> thought through, and every input and output has to be converted to/from
> whatever encoding is chosen as the internal implementation.
>
> While updating the Wikipedia article [1] I came across this slide set
> [2], which has a fairly decent explanation of the issues and why the
> previous implementation was abandoned.
>
> If somebody comes up with an implementation proposal of Unicode strings,
> whether to have a mode that doesn't use it can be discussed, but right
> now there doesn't seem to be such a live proposal.

What we really need is an awesome small and fast Unicode library that
does everything ICU does but faster and in less code while using UTF-8
as its internal storage so we don't have to convert on each and every
operation. There are a ton of non-obvious things beyond simple string
manipulation. String collation alone is massively complicated, for
example.

-Rasmus
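
To illustrate why string handling goes beyond "making strings wider", here is a minimal sketch in Python (chosen only for brevity; it is not PHP or ICU code). The helper names `utf8_len` and `crude_sort_key` are hypothetical, and the accent-stripping key is a crude stand-in that assumes a simple "ignore case and accents" ordering. It is not the Unicode Collation Algorithm that ICU implements, and it ignores locale rules entirely.

```python
import unicodedata


def utf8_len(s: str) -> int:
    """Number of bytes the string occupies when stored as UTF-8."""
    return len(s.encode("utf-8"))


def crude_sort_key(s: str) -> str:
    """Hypothetical, simplified key: decompose, drop combining marks, casefold.

    This only approximates "ignore accents and case". Real collation
    (for example in ICU) is locale-dependent and far more involved.
    """
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


eclair = "\u00e9clair"  # "eclair" with a precomposed e-acute (U+00E9)
words = ["zebra", "Zebra", "apple", eclair]

# Sorting by raw code points puts "Zebra" first and the accented word last.
codepoint_order = sorted(words)

# Sorting with the crude key gives an order closer to what a reader expects.
# Ties ("zebra" vs "Zebra") keep their input order because sorted() is stable.
folded_order = sorted(words, key=crude_sort_key)

# The same visible text can have different lengths depending on the form:
# precomposed (NFC) vs. decomposed (NFD), code points vs. UTF-8 bytes.
nfd_eclair = unicodedata.normalize("NFD", eclair)
lengths = {
    "nfc_codepoints": len(eclair),
    "nfc_utf8_bytes": utf8_len(eclair),
    "nfd_codepoints": len(nfd_eclair),
    "nfd_utf8_bytes": utf8_len(nfd_eclair),
}

if __name__ == "__main__":
    print(codepoint_order)
    print(folded_order)
    print(lengths)
```

Even this toy example has to make decisions about normalization, case, and tie-breaking, which is the kind of per-function thinking the thread describes.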