Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:75232
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.220.177 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CAPg3Xx+DyZzc2Orku6kQoFSfHwRjB0CoGL4z+76iTdt1LZoLww@mail.gmail.com>
References: <CAHMUw2EohSiEszkxER4CsR9L0FWz=P9audQ5o6WxLpn4KQjXfA@mail.gmail.com>
	<679D0316-74C5-4AEC-9097-5E9793937469@ajf.me>
	<53B1590F.5070009@gmail.com>
	<CAKOpQSw16jVjbFE471A4DBZg5Sr2ONsge_y1zRHPNeO1pORtOg@mail.gmail.com>
	<CAHMUw2GpdLpQsQigEp+WTxp7Vq7HLS6wzZ7o6Pt1e0fYp72z5w@mail.gmail.com>
	<CAHMUw2FzGucbgaXCF9D35qDZJB5ovu-4YOMGWwpVhCHxiddn8A@mail.gmail.com>
	<CAPg3Xx+DyZzc2Orku6kQoFSfHwRjB0CoGL4z+76iTdt1LZoLww@mail.gmail.com>
Date: Thu, 3 Jul 2014 21:15:41 +0800
Message-ID: <CAHMUw2GZx0nv2SCcchy_3T=f15+WpRczm0Wp3jqq5wWaAf4X7A@mail.gmail.com>
To: Peter Cowburn <petercowburn@gmail.com>
Cc: Kris Craig <kris.craig@gmail.com>, Rowan Collins <rowan.collins@gmail.com>, 
	PHP internals list <internals@lists.php.net>
Content-Type: multipart/alternative; boundary=047d7b3a8f84462e3c04fd49ce5a
Subject: Re: [PHP-DEV] Re: ucwords() vs title case
From: tjerk.meesters@gmail.com (Tjerk Meesters)

--047d7b3a8f84462e3c04fd49ce5a
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hi!


On Thu, Jul 3, 2014 at 8:56 PM, Peter Cowburn <petercowburn@gmail.com>
wrote:

>
>
>
> On 3 July 2014 13:39, Tjerk Meesters <tjerk.meesters@gmail.com> wrote:
>
>> On Wed, Jul 2, 2014 at 1:19 AM, Tjerk Meesters <tjerk.meesters@gmail.com>
>> wrote:
>>
>> > Hi Kris,
>> >
>> >
>> > On Tue, Jul 1, 2014 at 7:25 AM, Kris Craig <kris.craig@gmail.com> wrote:
>> >
>> >> On Mon, Jun 30, 2014 at 5:33 AM, Rowan Collins <rowan.collins@gmail.com>
>> >> wrote:
>> >>
>> >> > Andrea Faulds wrote (on 30/06/2014):
>> >> >
>> >> >> On 30 Jun 2014, at 12:54, Tjerk Meesters <tjerk.meesters@gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >>  Hi internals,
>> >> >>>
>> >> >>> I came across this old bug: https://bugs.php.net/bug.php?id=34407
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> Personally I find that the latter is too much of a departure from
>> >> >>> what we currently have; a compromise could be to treat punctuation
>> >> >>> as a word delimiter.
>> >> >>>
>> >> >> Hmm. Why not make it follow what \b in a regex would do, looking for
>> >> >> “word boundaries”?
>> >> >>
>> >> >
>> >> > Unfortunately, the cleverer you try to be, the more edge cases you
>> >> > find. For instance, using \b will capitalise the 's' after an
>> >> > apostrophe, e.g. in "Andrea'S Suggestion".
>> >> >
>> >> > The function we have in our code base at the moment looks like this:
>> >> >
>> >> > function smart_uc_words($string)
>> >> > {
>> >> >         $string = strtolower(trim($string));
>> >> >         // Capitalise any word char preceded by a non-word char other
>> >> >         // than an apostrophe
>> >> >         $string = preg_replace_callback(
>> >> >             '/(?<!\w|\')(\w)/',
>> >> >             function($m){ return strtoupper($m[1]); },
>> >> >             $string
>> >> >         );
>> >> >         // Capitalise any word char which comes between an apostrophe
>> >> >         // and another word char
>> >> >         $string = preg_replace_callback(
>> >> >             '/(?<=\')(\w)(?=\w)/',
>> >> >             function($m){ return strtoupper($m[1]); },
>> >> >             $string
>> >> >         );
>> >> >
>> >> >         return $string;
>> >> > }
>> >> >
>> >>
>> >> What about leaving the default behavior as-is but adding an optional
>> >> argument to specify how to determine these boundaries?  So if you did
>> >> something like ucwords( "hello, world!", '\b' ) or ucwords( "hello,
>> >> world!", array( ' ', '.', ... ) ), the user could control the behavior
>> >> while existing ucwords( $arg ) code would behave as it does now without
>> >> any BC break.
>> >>
>> >
>> > Yeah, that seems like an option, so basically how `trim()` works too;
>> > treat these characters as word boundaries (default is " \t\r\n").
>> >
>> >     ucwords("hello (new) world", " ()");
>> >
>> > I'll prepare a PR for this and see how far that takes us :) let me know
>> > if you guys have any other ideas.
>> >
>>
>> I've created a PR here: https://github.com/php/php-src/pull/706
>
>
> Your previous mail mentioned, "so basically how `trim()` works too", but
> the PR doesn't quite do that.
>

That's somewhat embarrassing; I didn't realise that character ranges are
supported in trim() =S
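
For reference, this is the trim() range syntax I had overlooked. It is only a
small illustration with made-up input strings: a ".." between two characters
in the character list is read as a range, while a list without ".." is taken
as literal characters.

```php
<?php
// "a..z" is interpreted by trim() as the range of characters from a to z,
// so all leading and trailing lowercase letters are stripped.
$ranged = trim("abcHELLOxyz", "a..z");

// Without "..", the list is just the two literal characters 'a' and 'z';
// stripping stops at the first character that is not in the list.
$literal = trim("abcHELLOxyz", "az");

var_dump($ranged);  // string(5) "HELLO"
var_dump($literal); // string(9) "bcHELLOxy"
```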

Despite this oversight, I personally don't see a practical need for
supporting character ranges here, because the delimiter characters are
unlikely to be letters; they are more likely to be hyphens, braces,
punctuation marks, spaces, etc. I was also hoping to keep the function
rather simple :)
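
To make the intended behaviour concrete, here is a rough userland sketch of
what I have in mind. It is not the code from the PR; the function name
ucwords_delims() and its default delimiter list are just assumptions for
illustration. The delimiter string is treated as a plain list of single-byte
characters, with no range support.

```php
<?php
// Hypothetical helper, for illustration only (not the PR's implementation):
// upper-cases the first character of the string and every character that
// directly follows one of the characters in $delimiters.
// $delimiters is a literal list; "a..z" would mean 'a', '.' and 'z'.
function ucwords_delims($string, $delimiters = " \t\r\n")
{
    $atWordStart = true;
    $length = strlen($string);

    for ($i = 0; $i < $length; $i++) {
        $char = $string[$i];
        if ($atWordStart) {
            $string[$i] = strtoupper($char);
        }
        // The next character starts a word if this one is a delimiter.
        $atWordStart = strpos($delimiters, $char) !== false;
    }

    return $string;
}

$default = ucwords_delims("hello (new) world");       // whitespace only
$custom  = ucwords_delims("hello (new) world", " ()"); // parentheses too

var_dump($default); // string(17) "Hello (new) World"
var_dump($custom);  // string(17) "Hello (New) World"
```

With only whitespace as delimiters the "(new)" stays lowercase, which matches
what ucwords() does today; adding the parentheses to the list is what
capitalises it.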


>
> Should ucwords() also accept character ranges, just like trim()?  i.e.,
> ucwords("Foo bar", "a..z");  [not a very practical example, I know]
>
>
>>
>>
>> If there are no objections I would like to commit this into 5.4 onwards
>> sometime next week.
>>
>> Thanks.
>>
>>
>> >
>> >
>> >
>> >> --Kris
>> >>
>> >
>> >
>> >
>> > --
>> > --
>> > Tjerk
>> >
>>
>>
>>
>> --
>> --
>> Tjerk
>>
>
>


--
Tjerk

--047d7b3a8f84462e3c04fd49ce5a--