Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:75231
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.192.50 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CAHMUw2FzGucbgaXCF9D35qDZJB5ovu-4YOMGWwpVhCHxiddn8A@mail.gmail.com>
References: <CAHMUw2EohSiEszkxER4CsR9L0FWz=P9audQ5o6WxLpn4KQjXfA@mail.gmail.com>
 <679D0316-74C5-4AEC-9097-5E9793937469@ajf.me> <53B1590F.5070009@gmail.com>
 <CAKOpQSw16jVjbFE471A4DBZg5Sr2ONsge_y1zRHPNeO1pORtOg@mail.gmail.com>
 <CAHMUw2GpdLpQsQigEp+WTxp7Vq7HLS6wzZ7o6Pt1e0fYp72z5w@mail.gmail.com> <CAHMUw2FzGucbgaXCF9D35qDZJB5ovu-4YOMGWwpVhCHxiddn8A@mail.gmail.com>
Date: Thu, 3 Jul 2014 13:56:08 +0100
Message-ID: <CAPg3Xx+DyZzc2Orku6kQoFSfHwRjB0CoGL4z+76iTdt1LZoLww@mail.gmail.com>
To: Tjerk Meesters <tjerk.meesters@gmail.com>
Cc: Kris Craig <kris.craig@gmail.com>, Rowan Collins <rowan.collins@gmail.com>, 
	PHP internals list <internals@lists.php.net>
Content-Type: multipart/alternative; boundary=001a11c1bf52bbfb3f04fd498a75
Subject: Re: [PHP-DEV] Re: ucwords() vs title case
From: petercowburn@gmail.com (Peter Cowburn)

--001a11c1bf52bbfb3f04fd498a75
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 3 July 2014 13:39, Tjerk Meesters <tjerk.meesters@gmail.com> wrote:

> On Wed, Jul 2, 2014 at 1:19 AM, Tjerk Meesters <tjerk.meesters@gmail.com>
> wrote:
>
> > Hi Kris,
> >
> >
> > On Tue, Jul 1, 2014 at 7:25 AM, Kris Craig <kris.craig@gmail.com> wrote:
> >
> >> On Mon, Jun 30, 2014 at 5:33 AM, Rowan Collins <rowan.collins@gmail.com>
> >> wrote:
> >>
> >> > Andrea Faulds wrote (on 30/06/2014):
> >> >
> >> >> On 30 Jun 2014, at 12:54, Tjerk Meesters <tjerk.meesters@gmail.com>
> >> >> wrote:
> >> >>
> >> >>  Hi internals,
> >> >>>
> >> >>> I came across this old bug: https://bugs.php.net/bug.php?id=34407
> >> >>>
> >> >>>
> >> >>>
> >> >>> Personally I find that the latter is too much of a departure from what
> >> >>> we currently have; a compromise could be to treat punctuation as a word
> >> >>> delimiter.
> >> >>>
> >> >> Hmm. Why not make it follow what \b in a regex would do, looking for
> >> >> “word boundaries”?
> >> >>
> >> >
> >> > Unfortunately, the cleverer you try to be, the more edge cases you find.
> >> > For instance, using \b will capitalise the 's' after an apostrophe, e.g.
> >> > in "Andrea'S Suggestion".
> >> >
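The apostrophe case is easy to reproduce. A minimal sketch, assuming a naive "uppercase whatever follows \b" rule; uc_at_word_boundaries() is a made-up name for illustration, not anything proposed in this thread:

```php
<?php
// Hypothetical helper: uppercase the first word character after every
// \b word boundary. This is the naive rule, shown only to illustrate
// the apostrophe edge case.
function uc_at_word_boundaries($s)
{
    return preg_replace_callback(
        '/\b(\w)/',
        function ($m) { return strtoupper($m[1]); },
        $s
    );
}

// The boundary between "'" and "s" counts as a word boundary too.
echo uc_at_word_boundaries("andrea's suggestion"), "\n"; // Andrea'S Suggestion
```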
> >> > The function we have in our code base at the moment looks like this:
> >> >
> >> > function smart_uc_words($string)
> >> > {
> >> >         $string = strtolower(trim($string));
> >> >         // Capitalise any word char preceded by a non-word char other
> >> >         // than an apostrophe
> >> >         $string = preg_replace_callback(
> >> >                 '/(?<!\w|\')(\w)/',
> >> >                 function($m){ return strtoupper($m[1]); },
> >> >                 $string
> >> >         );
> >> >         // Capitalise any word char which comes between an apostrophe
> >> >         // and another word char
> >> >         $string = preg_replace_callback(
> >> >                 '/(?<=\')(\w)(?=\w)/',
> >> >                 function($m){ return strtoupper($m[1]); },
> >> >                 $string
> >> >         );
> >> >
> >> >         return $string;
> >> > }
> >> >
> >>
> >> What about leaving the default behavior as-is but adding an optional
> >> argument to specify how to determine these boundaries?  So if you did
> >> something like ucwords( "hello, world!", '\b' ) or ucwords( "hello,
> >> world!", array( ' ', '.', ... ) ), the user could control the behavior
> >> while existing ucwords( $arg ) code would behave as it does now, without
> >> any BC break.
> >>
> >
> > Yeah, that seems like an option, so basically how `trim()` works too;
> > treat these characters as word boundaries (default is " \t\r\n").
> >
> >     ucwords("hello (new) world", " ()");
> >
> > I'll prepare a PR for this and see how far that takes us :) Let me know if
> > you guys have any other ideas.
> >
>
> I've created a PR here: https://github.com/php/php-src/pull/706


Your previous mail mentioned, "so basically how `trim()` works too", but
the PR doesn't quite do that.

Should ucwords() also accept character ranges, just like trim()?  e.g.,
ucwords("Foo bar", "a..z");  [not a very practical example, I know]
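To make the question concrete, here is a rough userland sketch of what range support could look like, assuming the "from..to" syntax that trim() accepts in its character list. Both expand_char_mask() and ucwords_mask() are hypothetical names of mine, not functions from the PR, and malformed ranges are simply treated as literal characters:

```php
<?php
// Hypothetical helper: expand trim()-style "a..z" ranges into an explicit
// list of single-byte characters. Anything that is not a well-formed
// ascending range is copied through literally.
function expand_char_mask($mask)
{
    $out = '';
    $len = strlen($mask);
    for ($i = 0; $i < $len; $i++) {
        $is_range = $i + 3 < $len
            && $mask[$i + 1] === '.'
            && $mask[$i + 2] === '.'
            && ord($mask[$i + 3]) >= ord($mask[$i]);
        if ($is_range) {
            for ($c = ord($mask[$i]); $c <= ord($mask[$i + 3]); $c++) {
                $out .= chr($c);
            }
            $i += 3; // skip over "..<to>"
        } else {
            $out .= $mask[$i];
        }
    }
    return $out;
}

// Hypothetical userland stand-in for a delimiter-aware ucwords(): uppercase
// the first non-delimiter character at the start of the string and after
// each delimiter. Not guaranteed to match the PR in every edge case.
function ucwords_mask($str, $mask = " \t\r\n")
{
    $delims = expand_char_mask($mask);
    $upper_next = true;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if (strpos($delims, $str[$i]) !== false) {
            $upper_next = true;
        } elseif ($upper_next) {
            $str[$i] = strtoupper($str[$i]);
            $upper_next = false;
        }
    }
    return $str;
}

// " !../" means: a space, plus every character from "!" through "/".
echo ucwords_mask("hello (new) world-wide", " !../"), "\n"; // Hello (New) World-Wide
```

A range like "!../" covers most ASCII punctuation in a few characters, which is where ranges would earn their keep compared to listing each delimiter by hand.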


>
>
> If there are no objections I would like to commit this into 5.4 onwards
> sometime next week.
>
> Thanks.
>
>
> >
> >
> >
> >> --Kris
> >>
> >
> >
> >
> > --
> > --
> > Tjerk
> >
>
>
>
> --
> --
> Tjerk
>

--001a11c1bf52bbfb3f04fd498a75--