Newsgroups: php.internals
From: tjerk.meesters@gmail.com (Tjerk Meesters)
To: Kris Craig
Cc: Rowan Collins, PHP internals list
Date: Wed, 2 Jul 2014 01:19:20 +0800
Subject: Re: [PHP-DEV] Re: ucwords() vs title case

Hi Kris,

On Tue, Jul 1, 2014 at 7:25 AM, Kris Craig wrote:

> On Mon, Jun 30, 2014 at 5:33 AM, Rowan Collins wrote:
>
> > Andrea Faulds wrote (on 30/06/2014):
> >
> >> On 30 Jun 2014, at 12:54, Tjerk Meesters wrote:
> >>
> >>> Hi internals,
> >>>
> >>> I came across this old bug: https://bugs.php.net/bug.php?id=34407
> >>>
> >>> Personally I find that the latter is too much of a departure from
> >>> what we currently have; a compromise could be to treat punctuation
> >>> as a word delimiter.
> >>>
> >> Hmm. Why not make it follow what \b in a regex would do, looking for
> >> “word boundaries”?
> >>
> >
> > Unfortunately, the cleverer you try to be, the more edge cases you find.
> > For instance, using \b will capitalise the 's' after an apostrophe, e.g.
> > in "Andrea'S Suggestion".
> >
> > The function we have in our code base at the moment looks like this:
> >
> > function smart_uc_words($string)
> > {
> >     $string = strtolower(trim($string));
> >     // Capitalise any word char preceded by a non-word char other than
> >     // an apostrophe
> >     $string = preg_replace_callback('/(?
> >         return strtoupper($m[1]); }, $string);
> >     // Capitalise any word char which comes between an apostrophe and
> >     // another word char
> >     $string = preg_replace_callback('/(?<=\')(\w)(?=\w)/',
> >         function($m){ return strtoupper($m[1]); }, $string);
> >
> >     return $string;
> > }
> >
>
> What about leaving the default behavior as-is but adding an optional
> argument to specify how to determine these boundaries?
> So if you did something like ucwords( "hello, world!", '\b' ) or
> ucwords( "hello, world!", array( ' ', '.', ... ) ), the user could
> control the behavior while existing ucwords( $arg ) code would behave
> as it does now without any BC.
>

Yeah, that seems like an option, so basically how `trim()` works too; treat
these characters as word boundaries (default is " \t\r\n").

ucwords("hello (new) world", " ()");

I'll prepare a PR for this and see how far that takes us :) Let me know if
you guys have any other ideas.

> --Kris
>

-- 
--
Tjerk
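
For illustration only, the delimiter-list behaviour discussed above could be approximated in userland roughly as follows. This is a minimal sketch, not the eventual patch: the function name `ucwords_delims` is hypothetical, the default delimiter string is simply the one quoted in the message, and it works byte by byte, so it only handles single-byte (ASCII) letters.

```php
<?php
/**
 * Hypothetical userland sketch of a delimiter-aware ucwords().
 *
 * Uppercases the first character of $string and every character that
 * directly follows one of the characters listed in $delimiters.
 * Assumption: the default delimiters are the ones mentioned in the thread.
 */
function ucwords_delims($string, $delimiters = " \t\r\n")
{
    $result = '';
    $atWordStart = true; // the very first character starts a word
    $length = strlen($string);

    for ($i = 0; $i < $length; $i++) {
        $char = $string[$i];

        // Uppercase only when the previous character was a delimiter
        // (or when this is the first character of the string).
        $result .= $atWordStart ? strtoupper($char) : $char;

        // The next character starts a word if this one is a delimiter.
        $atWordStart = strpos($delimiters, $char) !== false;
    }

    return $result;
}

// Usage mirroring the example in the message:
echo ucwords_delims("hello (new) world", " ()"), "\n"; // Hello (New) World
echo ucwords_delims("hello (new) world"), "\n";        // Hello (new) World
```

Passing the delimiters as a plain string of characters, as `trim()` does, keeps existing one-argument calls unchanged, which is the point Kris raised about backwards compatibility.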