Newsgroups: php.internals
Message-ID: <543E4626.3000102@gmail.com>
Date: Wed, 15 Oct 2014 13:02:14 +0300
From: aleksey.tulinov@gmail.com (Aleksey Tulinov)
To: Rowan Collins, internals@lists.php.net
Subject: Re: [PHP-DEV] Unicode support

Rowan,

On 15/10/14 10:04, Rowan Collins wrote:
> As I said at the top of my first post, the important thing is to capture
> what those requirements actually are. Just as you'd choose what array
> functions were needed if you were adding "array support" to a language.

I'm sorry for not making myself clear.

What I'm essentially saying is that I think the "noël" test is synthetic and impractical. It is also solvable by requiring NFC strings at input, and this is not an implementation defect. I also believe that Hangul is most likely to be precomposed and will work all right. And I have another opinion on UTF-8 shortest form. This is my personal opinion, of course.

That aside, I think requirements are what I was asking about. I'm assuming your standpoint is that string modification routines are at least required to take into account entire characters, not only code points. Am I correct?

What confuses me is that I think you're seeing it as a major implementation defect. To avoid arguable implementations, I've made a short example in Java (the "ë" in the literal is decomposed: "e" followed by U+0308, a combining diaeresis):

    System.out.println(new StringBuffer("noël").reverse().toString());

It does produce the string "l̈eon", as I would expect.
The precomposed "noël" also works as I would expect, producing the string "lëon".

What do you think: is this an implementation issue or solely a requirements issue?
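For anyone who wants to repeat the experiment, here is a slightly fuller sketch of it. The class and method names (ReverseDemo, reverseCodePoints, reverseAfterNfc, reverseGraphemes) are made up for illustration. The NFC step uses java.text.Normalizer, and the last method uses java.text.BreakIterator as one possible way to reverse by "entire characters"; how well that iterator matches the requirement being discussed is an assumption, not something settled in this thread.

```java
import java.text.BreakIterator;
import java.text.Normalizer;
import java.util.Locale;

public class ReverseDemo {
    // Plain reversal, as in the one-liner above: works on code points,
    // so a combining mark ends up attached to a different base letter.
    static String reverseCodePoints(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Normalize to NFC first, then reverse. This only helps when a
    // precomposed form of the character exists.
    static String reverseAfterNfc(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return new StringBuilder(nfc).reverse().toString();
    }

    // Reverse by the boundaries that BreakIterator reports for
    // user-perceived characters (assumed here to be "entire characters").
    static String reverseGraphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(s);
        StringBuilder out = new StringBuilder(s.length());
        int end = it.last();
        for (int start = it.previous(); start != BreakIterator.DONE; start = it.previous()) {
            out.append(s, start, end);
            end = start;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decomposed = "noe\u0308l";  // e + U+0308 COMBINING DIAERESIS
        String precomposed = "no\u00EBl";  // U+00EB, single code point

        System.out.println(reverseCodePoints(decomposed));  // diaeresis lands on the "l"
        System.out.println(reverseCodePoints(precomposed)); // diaeresis stays on the "e"
        System.out.println(reverseAfterNfc(decomposed));    // same as the precomposed case
        System.out.println(reverseGraphemes(decomposed));   // keeps e + U+0308 together
    }
}
```

The first two calls reproduce the two results described above. The third shows the "require NFC at input" approach, and the fourth is what a routine that works on entire characters might do instead.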