Newsgroups: php.internals
Date: Tue, 21 Oct 2014 10:48:33 +0200
From: nicolas.grekas+php@gmail.com (Nicolas Grekas)
To: Joe Watkins
Cc: "internals@lists.php.net"
Subject: Re: [PHP-DEV] [RFC] UString
In-Reply-To: <1413875212.2624.3.camel@localhost.localdomain>

This is great, thanks for the work!

I think we should have an opinion on grapheme clusters and state it in the RFC. I do support the idea that PHP users need to handle "characters" in terms of "graphemes". We need a core way to deal with code points of course, but things like "reverse" have very low value without graphemes.

toLower/toUpper also miss the Turkish specifics - or is the UString class locale dependent? Should we add "toCaseFold"?

Where are the case-insensitive ("i") versions of strpos, etc.? Do we want them in core PHP7? That is another point we should add to the RFC.

For reference, here is my grapheme-cluster-aware string handling:
https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/class/Patchwork/Utf8.php

and the same, but the Turkish variant:
https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/class/Patchwork/TurkishUtf8.php

About Unicode equivalence: do the string matching functions (contains, startsWith, etc.) handle Unicode equivalence? How do we compare two UStrings? Does the == operator handle Unicode equivalence? If not, what is the way to go? Normalize beforehand on our own? The RFC should cover this as well IMHO (and say that collation/sorting is out of scope).

Complex topic :)

Cheers,
Nicolas
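
To make the equivalence and case questions above concrete, here is a minimal sketch of what "normalize before comparing" could mean. It is written in Python rather than PHP only because Python's standard unicodedata module shows the behaviour compactly. The helper names equivalent and caseless_equivalent are made up for illustration and are not part of the UString RFC or of any existing API.

```python
import unicodedata


def equivalent(a: str, b: str) -> bool:
    """Canonical equivalence: compare the NFC forms of both strings."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def caseless_equivalent(a: str, b: str) -> bool:
    """Canonical caseless match: NFD, case fold, then NFD again."""
    def key(s: str) -> str:
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)
    return key(a) == key(b)


composed = "\u00e9"     # "é" as one code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# A plain code point comparison treats the two spellings as different.
plain_equal = composed == decomposed

# After normalization they compare equal.
normalized_equal = equivalent(composed, decomposed)

# Case folding is not the same as lowercasing: "ß" folds to "ss".
folded_equal = caseless_equivalent("STRASSE", "stra\u00dfe")

# Locale-independent lowercasing maps "I" to "i"; Turkish would expect
# the dotless "ı" (U+0131), which is the toLower/toUpper concern above.
lower_i = "I".lower()
```

With this sketch, plain_equal is False while normalized_equal and folded_equal are True, which is the kind of behaviour the RFC would need to specify for ==, contains, startsWith and friends. Whether normalization happens implicitly or is left to the caller is a design choice, not something this example settles.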