Newsgroups: php.internals
Date: Thu, 3 Jul 2014 13:56:08 +0100
References: <679D0316-74C5-4AEC-9097-5E9793937469@ajf.me> <53B1590F.5070009@gmail.com>
To: Tjerk Meesters
Cc: Kris Craig, Rowan Collins, PHP internals list
Subject: Re: [PHP-DEV] Re: ucwords() vs title case
From: petercowburn@gmail.com (Peter Cowburn)

On 3 July 2014 13:39, Tjerk Meesters wrote:

> On Wed, Jul 2, 2014 at 1:19 AM, Tjerk Meesters wrote:
>
> > Hi Kris,
> >
> > On Tue, Jul 1, 2014 at 7:25 AM, Kris Craig wrote:
> >
> >> On Mon, Jun 30, 2014 at 5:33 AM, Rowan Collins wrote:
> >>
> >> > Andrea Faulds wrote (on 30/06/2014):
> >> >
> >> >> On 30 Jun 2014, at 12:54, Tjerk Meesters wrote:
> >> >>
> >> >> Hi internals,
> >> >>>
> >> >>> I came across this old bug: https://bugs.php.net/bug.php?id=34407
> >> >>>
> >> >>>
> >> >>>
> >> >>> Personally I find that the latter is too much of a departure from what we
> >> >>> currently have; a compromise could be to treat punctuation as a word
> >> >>> delimiter.
> >> >>>
> >> >> Hmm. Why not make it follow what \b in a regex would do, looking for
> >> >> “word boundaries”?
> >> >>
> >> >
> >> > Unfortunately, the cleverer you try to be, the more edge cases you find.
> >> > For instance, using \b will capitalise the 's' after an apostrophe, e.g. in
> >> > "Andrea'S Suggestion".
> >> >
> >> > The function we have in our code base at the moment looks like this:
> >> >
> >> > function smart_uc_words($string)
> >> > {
> >> >     $string = strtolower(trim($string));
> >> >     // Capitalise any word char preceded by a non-word char other than an apostrophe
> >> >     $string = preg_replace_callback('/(?
> >> >         function($m){ return strtoupper($m[1]); }, $string);
> >> >     // Capitalise any word char which comes between an apostrophe and another word char
> >> >     $string = preg_replace_callback('/(?<=\')(\w)(?=\w)/',
> >> >         function($m){ return strtoupper($m[1]); }, $string);
> >> >
> >> >     return $string;
> >> > }
> >> >
> >>
> >> What about leaving the default behavior as-is but adding an optional
> >> argument to specify how to determine these boundaries? So if you did
> >> something like ucwords( "hello, world!", '\b' ) or ucwords( "hello,
> >> world!", array( ' ', '.', ... ) ), the user could control the behavior
> >> while existing ucwords( $arg ) code would behave as it does now without any
> >> BC.
> >>
> >
> > Yeah, that seems like an option, so basically how `trim()` works too;
> > treat these characters as word boundaries (default is " \t\r\n").
> >
> > ucwords("hello (new) world", " ()");
> >
> > I'll prepare a PR for this and see how far that takes us :) let me know if
> > you guys have any other ideas.
> >
>
> I've created a PR here: https://github.com/php/php-src/pull/706

Your previous mail mentioned, "so basically how `trim()` works too", but
the PR doesn't quite do that. Should ucwords() also accept character
ranges, just like trim()?

i.e., ucwords("Foo bar", "a..z"); [not a very practical example, I know]

>
> If there are no objections I would like to commit this into 5.4 onwards
> somewhere next week.
>
> Thanks.
>
> >
> >> --Kris
> >>
> >
> > --
> > --
> > Tjerk
> >
>
> --
> --
> Tjerk
>
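
To make the question about ranges concrete, here is a rough sketch of the
difference. It assumes a PHP build in which ucwords() accepts the
delimiter list as a second argument, as proposed in the PR.
expand_char_ranges() and ucwords_ranges() are hypothetical userland
helpers, not part of PHP or of the PR; they only imitate trim()'s "a..z"
range syntax for single-byte characters.

```php
<?php
// Hypothetical helper: expand trim()-style "x..y" ranges into a plain
// character list, e.g. "a..e" becomes "abcde". Single-byte characters
// only; a reversed range such as "z..a" expands to nothing.
function expand_char_ranges($mask)
{
    return preg_replace_callback(
        '/(.)\.\.(.)/s',
        function ($m) {
            $out = '';
            for ($c = ord($m[1]); $c <= ord($m[2]); $c++) {
                $out .= chr($c);
            }
            return $out;
        },
        $mask
    );
}

// Hypothetical wrapper: ucwords() with range support in the delimiter list.
// Assumes ucwords() already accepts a second (delimiters) argument.
function ucwords_ranges($string, $delimiters = " \t\r\n")
{
    return ucwords($string, expand_char_ranges($delimiters));
}

// Explicit delimiter list, as in the example quoted above.
echo ucwords("hello (new) world", " ()"), "\n";     // Hello (New) World

// trim() already understands ranges: "a..c" strips a, b and c.
echo trim("abcHELLOcba", "a..c"), "\n";             // HELLO

// With the wrapper, "+../" covers '+', ',', '-', '.' and '/'.
echo ucwords_ranges("foo-bar.baz", "+../"), "\n";   // Foo-Bar.Baz
```

Without range support, a delimiter string such as "a..z" is read as the
three literal characters 'a', '.' and 'z', which is where the two
functions would differ.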