Subject: Re: [PHP-DEV] [RFC] UString
From: florian@margaine.com (Florian Margaine)
To: Derick Rethans
Cc: PHP Internals, Joe Watkins
Date: Sun, 1 Mar 2015 21:57:09 +0100

Hi,

On 1 March 2015 at 21:26, "Derick Rethans" wrote:
>
> Hey Joe,
>
> I think there are a few issues with the proposal, although I like the
> general idea. I've had the tab with the RFC open since October... but
> never looked at it until now :-/. So, a few comments:
>
> - UString as a name.
>
> I think I am going to prefer "Text" as a class name. Unicode (and
> intl/icu) have lots of operators acting on items containing unicode
> strings. But they are really pieces of text. For example sentences, word
> break iterators, etc. UString *feels* clunky, and not "standard". If
> it's going to be part of PHP core, then we should pick a "core" name. (I
> might prefer String, but that's going to cause a whole lot of issues
> obviously).

Isn't this "solved" if we use \php\String?

>
> - "Needs More Methods"
>
> I had a look at the API that that links to, and I miss operators like
> iterators. Over words, sentences, characters, etc.
> Basically the
> functionality of
> http://docs.php.net/manual/en/class.intlbreakiterator.php,
> http://docs.php.net/manual/en/class.intlrulebasedbreakiterator.php and
> http://docs.php.net/manual/en/class.intlcodepointbreakiterator.php
>
> I realize intl already implements this, but it's really beneficial to
> have for a "Text" class - especially for replacing functionality where
> people now loop over a string - with a character index.
>
> - "Not a full String API Replacement"
>
> I would certainly expect more from it than just the UnicodeString API.
> Perhaps not for a first iteration, but certainly for subsequent
> versions. Things like transliterations, and specifically iterators would
> be high on my list.
>
> - "Patch"
>
> toUpper/toLower, there is a missing one for toTitle
>
> - In the code's README:
>
> "Note: UString is interchangable with zend strings for method parameters
> and can be cast for output/conversion to zend strings"
>
> How does that work? And what would it convert to?
>
> - How are "characters" counted?
>
> Is a character a Code Point, or is a character a base character +
> combining diacritics. In the first form, A + ° is considered as two
> characters, in the second option, just one. For wordwrap, splice,
> substring, it is really important that only the *full sequence* is
> considered as a character. And hence, a character really should be the
> full sequence. The text in "charAt" seems to contradict that, and that
> is a mistake.
>
> In the original PHP 6 we didn't do that due to performance reasons, but
> that point is moot now as only people who opt into using "Text" will
> suffer from this.
>
> - "trim"
>
> What is a leading or trailing space? Is it just U+0020, or other Unicode
> defined space characters as well? ( , U+00A0 comes to mind here)
>
> - What is "UG(defaultpad)," about?
>
> - For the code:
>
> - there is some interesting, non-standard whitespacing going on:
>
> - { goes on next line after a func decl
> - sometimes 4 spaces instead of a tab are used for indentation,
>
> - Why is there no __toString() ?
>
> - How can other extensions, not really making use of "Text", use their
> strings (as UTF8 strings f.e.)
>
>
> cheers,
> Derick
>
>
> On Sat, 28 Feb 2015, Joe Watkins wrote:
>
> > Morning internals,
> >
> > This is just a quick note to announce my intention to ready this RFC
> > for voting next week.
> >
> > I know I'm a little late maybe, I was real sick most of last week, so
> > couldn't do anything useful.
> >
> > A couple of us intend to fix outstanding issues on github and those
> > raised here, tidy the RFC and open the vote for 7.
> >
> > I would ask anyone interested to scan through this thread and announce
> > concerns that are not mentioned asap.
> >
> > Cheers
> > Joe
> >
> > On Fri, Oct 24, 2014 at 3:01 PM, Chris Wright wrote:
> >
> > > On 24 October 2014 07:03, Joe Watkins wrote:
> > >
> > >> On Thu, 2014-10-23 at 12:54 -0700, Stas Malyshev wrote:
> > >> > Hi!
> > >> >
> > >> > > P.S. u() is a bad name, will break lots of code, i.e.
> > >> >
> > >> > Maybe __u()? It's a bit ugly but you're not allowed to use __ so
> > >> > it's safe.
> > >> >
> > >>
> > >> /me cringes ...
> > >>
> > >> I wonder how much of a problem it really is, usually when we say some
> > >> function name is a problem is because of hundreds and hundreds of
> > >> results on github.
> > >>
> > >> If it's a huge problem then we should rename it, if we have to dig
> > >> around for a single project that's incompatible, or even a handful, then
> > >> it's not really a problem.
> > >>
> > >> Cheers
> > >> Joe
> > >
> > >
> > > I can see this being something relatively common.
> > > While I personally would
> > > never do it, there are a few reasons I can think of that people *might* do
> > > it:
> > >
> > > - Wrapper for creating HTML output
> > > - urlencode() shortcut
> > > - (obviously) various unicode-related things
> > >
> > > Searching on codesearch [1] revealed (amongst a few other hits on the
> > > first page) another interesting use of it in the hhvm test suite [2]. It's
> > > difficult to search for this because all the available public search
> > > engines that I know of do fuzzy matching.
> > >
> > > Sorry. This sucks, because every other option we have for this sucks.
> > >
> > > On the bright side, anything chosen could always be aliased at the top of
> > > the file:
> > >
> > > use function __u as u;
> > >
> > > This also sucks, but it sucks a little bit less because the collisions are
> > > avoided - or at least, avoided in such a way that the onus is on the user -
> > > and one can still have the sane name.
> > >
> > > First-class support at the syntax level (presumably $foo = u"unicode
> > > string" since we already have $foo = b"binary string") would IMO be better
> > > and (hopefully?) a long-term goal, but I am aware that it is - and probably
> > > should be - outside the scope of the current proposal.
> > >
> > > [1] https://searchcode.com/?q=function+u+lang%3Aphp
> > > [2]
> > > https://github.com/facebook/hhvm/blob/master/hphp/test/slow/ext_icu/uspoof.php#L13
> > >
> >
>
> --
> http://derickrethans.nl | http://xdebug.org
> Like Xdebug? Consider a donation: http://xdebug.org/donate.php
> twitter: @derickr and @xdebug
> Posted with an email client that doesn't mangle email: alpine

Cheers,
Florian Margaine
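
To make the two questions quoted above about counting "characters" and about "trim" concrete, here is a minimal sketch in plain PHP. It is not the UString/"Text" implementation and says nothing about how the RFC patch behaves; the helper names (countCodePoints, countGraphemes, trimUnicodeSpace) are hypothetical and exist only for this illustration. It assumes PHP 7 or later and uses only the bundled PCRE functions, with a real combining mark (U+030A COMBINING RING ABOVE) standing in for the diacritic in the "A + combining diacritic" example.

```php
<?php
// Hypothetical helpers for illustration only; none of these names come
// from the UString RFC or from any PHP extension.

// Count code points: with the /u modifier "." matches one code point,
// and /s lets it match newlines as well.
function countCodePoints(string $utf8): int
{
    return (int) preg_match_all('/./su', $utf8);
}

// Count full sequences: \X matches one extended grapheme cluster, i.e. a
// base character together with the combining marks that follow it.
function countGraphemes(string $utf8): int
{
    return (int) preg_match_all('/\X/u', $utf8);
}

// Trim Unicode separators (\p{Z}, which includes U+0020 and U+00A0) and
// ordinary whitespace from both ends. Falls back to the input unchanged
// if PCRE reports an error (for example on invalid UTF-8).
function trimUnicodeSpace(string $utf8): string
{
    return preg_replace('/^[\p{Z}\s]+|[\p{Z}\s]+$/u', '', $utf8) ?? $utf8;
}

$aRing  = "A\u{030A}";               // "A" + COMBINING RING ABOVE
$padded = "\u{00A0} text \u{00A0}";  // no-break spaces around " text "

echo countCodePoints($aRing), "\n";  // prints 2
echo countGraphemes($aRing), "\n";   // prints 1

// The built-in trim() works on bytes and leaves U+00A0 in place.
var_dump(trim($padded) === $padded);            // bool(true)
var_dump(trimUnicodeSpace($padded) === 'text'); // bool(true)
```

The gap between the two counts is exactly the choice charAt, substring and wordwrap have to make, and the last two lines show why "what counts as a space" needs an explicit answer for trim.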