Newsgroups: php.internals
From: tjerk.meesters@gmail.com (Tjerk Meesters)
To: Peter Cowburn
Cc: Kris Craig, Rowan Collins, PHP internals list
Date: Thu, 3 Jul 2014 21:15:41 +0800
Subject: Re: [PHP-DEV] Re: ucwords() vs title case
References: <679D0316-74C5-4AEC-9097-5E9793937469@ajf.me> <53B1590F.5070009@gmail.com>

Hi!

On Thu, Jul 3, 2014 at 8:56 PM, Peter Cowburn wrote:

> On 3 July 2014 13:39, Tjerk Meesters wrote:
>
>> On Wed, Jul 2, 2014 at 1:19 AM, Tjerk Meesters wrote:
>>
>> > Hi Kris,
>> >
>> > On Tue, Jul 1, 2014 at 7:25 AM, Kris Craig wrote:
>> >
>> >> On Mon, Jun 30, 2014 at 5:33 AM, Rowan Collins <rowan.collins@gmail.com>
>> >> wrote:
>> >>
>> >> > Andrea Faulds wrote (on 30/06/2014):
>> >> >
>> >> >> On 30 Jun 2014, at 12:54, Tjerk Meesters wrote:
>> >> >>
>> >> >>> Hi internals,
>> >> >>>
>> >> >>> I came across this old bug: https://bugs.php.net/bug.php?id=34407
>> >> >>>
>> >> >>>
>> >> >>> Personally I find that the latter is too much of a departure from what we
>> >> >>> currently have; a compromise could be to treat punctuation as a word
>> >> >>> delimiter.
>> >> >>
>> >> >> Hmm. Why not make it follow what \b in a regex would do, looking for
>> >> >> “word boundaries”?
>> >> >
>> >> > Unfortunately, the cleverer you try to be, the more edge cases you find.
>> >> > For instance, using \b will capitalise the 's' after an apostrophe, e.g. in
>> >> > "Andrea'S Suggestion".
>> >> >
>> >> > The function we have in our code base at the moment looks like this:
>> >> >
>> >> > function smart_uc_words($string)
>> >> > {
>> >> >     $string = strtolower(trim($string));
>> >> >     // Capitalise any word char preceded by a non-word char other than
>> >> >     // an apostrophe
>> >> >     // (the lookbehind is truncated in the archived copy; the pattern
>> >> >     // below is reconstructed from the comment above and is an assumption)
>> >> >     $string = preg_replace_callback('/(?<![\w\'])(\w)/',
>> >> >         function($m){ return strtoupper($m[1]); }, $string);
>> >> >     // Capitalise any word char which comes between an apostrophe and
>> >> >     // another word char
>> >> >     $string = preg_replace_callback('/(?<=\')(\w)(?=\w)/',
>> >> >         function($m){ return strtoupper($m[1]); }, $string);
>> >> >
>> >> >     return $string;
>> >> > }
>> >> >
>> >>
>> >> What about leaving the default behavior as-is but adding an optional
>> >> argument to specify how to determine these boundaries? So if you did
>> >> something like ucwords( "hello, world!", '\b' ) or ucwords( "hello,
>> >> world!", array( ' ', '.', ... ) ), the user could control the behavior
>> >> while existing ucwords( $arg ) code would behave as it does now without
>> >> any BC break.
>> >>
>> >
>> > Yeah, that seems like an option, so basically how `trim()` works too;
>> > treat these characters as word boundaries (default is " \t\r\n").
>> >
>> > ucwords("hello (new) world", " ()");
>> >
>> > I'll prepare a PR for this and see how far that takes us :) let me know if
>> > you guys have any other ideas.
>> >
>>
>> I've created a PR here: https://github.com/php/php-src/pull/706
>
> Your previous mail mentioned, "so basically how `trim()` works too", but
> the PR doesn't quite do that.

That's somewhat embarrassing; I didn't realise that character ranges are
supported in trim() =S

Despite this oversight, I personally don't see a practical need in
supporting a character range because the given characters are not likely to
be letters, but rather hyphens, braces, punctuation marks, spaces, etc.
I was also hoping to keep the function rather simple :)

> Should ucwords() also accept character ranges, just like trim()? i.e.,
> ucwords("Foo bar", "a..z"); [not a very practical example, I know]
>
>> If there are no objections I would like to commit this into 5.4 onwards
>> somewhere next week.
>>
>> Thanks.
>>
>> >> --Kris
>> >
>> > --
>> > --
>> > Tjerk
>>
>> --
>> --
>> Tjerk

-- 
--
Tjerk
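
For illustration, the delimiter-list idea discussed in this thread can be sketched in userland PHP. This is only a rough sketch and not the code from the PR: the function name ucwords_delims is made up for the example, the default delimiter set is the " \t\r\n" mentioned above, and every character in the second argument is taken literally, so a trim()-style range such as "a..z" is not expanded.

```php
<?php
// Hypothetical userland helper (not the PR's implementation): upper-cases
// the first character of the string and every character that directly
// follows one of the given delimiter characters.
// Assumption: delimiters are single bytes and are matched literally;
// ranges like "a..z" are NOT expanded.
function ucwords_delims($string, $delimiters = " \t\r\n")
{
    $length = strlen($string);
    $atWordStart = true; // the very first character starts a word

    for ($i = 0; $i < $length; $i++) {
        $isDelimiter = strpos($delimiters, $string[$i]) !== false;
        if ($atWordStart && !$isDelimiter) {
            $string[$i] = strtoupper($string[$i]);
        }
        // The next character starts a word only if this one is a delimiter.
        $atWordStart = $isDelimiter;
    }

    return $string;
}

echo ucwords_delims("hello (new) world"), "\n";        // Hello (new) World
echo ucwords_delims("hello (new) world", " ()"), "\n"; // Hello (New) World
```

With the default delimiters the "(" counts as the start of the word, so "new" stays lower-case; passing " ()" makes the parentheses act as boundaries as well, which is the behaviour the extra argument is meant to allow.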