Newsgroups: php.internals
Date: Thu, 3 Jul 2014 14:45:30 +0100
From: petercowburn@gmail.com (Peter Cowburn)
To: Tjerk Meesters
Cc: Kris Craig, Rowan Collins, PHP internals list
Subject: Re: [PHP-DEV] Re: ucwords() vs title case

On 3 July 2014 14:15, Tjerk Meesters wrote:

> Hi!
>
> On Thu, Jul 3, 2014 at 8:56 PM, Peter Cowburn wrote:
>
>> On 3 July 2014 13:39, Tjerk Meesters wrote:
>>
>>> On Wed, Jul 2, 2014 at 1:19 AM, Tjerk Meesters wrote:
>>>
>>> > Hi Kris,
>>> >
>>> > On Tue, Jul 1, 2014 at 7:25 AM, Kris Craig wrote:
>>> >
>>> >> On Mon, Jun 30, 2014 at 5:33 AM, Rowan Collins <rowan.collins@gmail.com>
>>> >> wrote:
>>> >>
>>> >> > Andrea Faulds wrote (on 30/06/2014):
>>> >> >
>>> >> >> On 30 Jun 2014, at 12:54, Tjerk Meesters wrote:
>>> >> >>
>>> >> >>> Hi internals,
>>> >> >>>
>>> >> >>> I came across this old bug: https://bugs.php.net/bug.php?id=34407
>>> >> >>>
>>> >> >>> Personally I find that the latter is too much of a departure from
>>> >> >>> what we currently have; a compromise could be to treat punctuation
>>> >> >>> as a word delimiter.
>>> >> >>
>>> >> >> Hmm. Why not make it follow what \b in a regex would do, looking
>>> >> >> for “word boundaries”?
>>> >> >
>>> >> > Unfortunately, the cleverer you try to be, the more edge cases you
>>> >> > find. For instance, using \b will capitalise the 's' after an
>>> >> > apostrophe, e.g. in "Andrea'S Suggestion".
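The apostrophe edge case is easy to reproduce. The sketch below compares a `\b`-based pattern with a negative lookbehind that also excludes the apostrophe. The lookbehind pattern is an illustrative assumption, not a regex taken from this thread, and it trades one problem for another: a name such as "o'neil" comes out as "O'neil".

```php
<?php
// Uppercase the whole match; shared by both patterns below.
$upper = function (array $m) { return strtoupper($m[0]); };

// 1. Capitalise every word character that starts at a \b word boundary.
$wordBoundary = '/\b\w/';

// 2. Capitalise a word character only when it is NOT preceded by a word
//    character or an apostrophe (assumed pattern, for illustration only).
$noApostrophe = '/(?<![\w\'])\w/';

echo preg_replace_callback($wordBoundary, $upper, "andrea's suggestion"), "\n"; // Andrea'S Suggestion
echo preg_replace_callback($noApostrophe, $upper, "andrea's suggestion"), "\n"; // Andrea's Suggestion
echo preg_replace_callback($noApostrophe, $upper, "o'neil"), "\n";              // O'neil
```

Getting "O'Neil" as well needs yet another rule, which is exactly the kind of edge case the reply above warns about.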
>>> >> >
>>> >> > The function we have in our code base at the moment looks like this:
>>> >> >
>>> >> > function smart_uc_words($string)
>>> >> > {
>>> >> >     $string = strtolower(trim($string));
>>> >> >     // Capitalise any word char preceded by a non-word char other than an apostrophe
>>> >> >     $string = preg_replace_callback('/(?
>>> >> >         function($m){ return strtoupper($m[1]); }, $string);
>>> >> >     // Capitalise any word char which comes between an apostrophe and another word char
>>> >> >     $string = preg_replace_callback('/(?<=\')(\w)(?=\w)/',
>>> >> >         function($m){ return strtoupper($m[1]); }, $string);
>>> >> >
>>> >> >     return $string;
>>> >> > }
>>> >>
>>> >> What about leaving the default behavior as-is but adding an optional
>>> >> argument to specify how to determine these boundaries? So if you did
>>> >> something like ucwords( "hello, world!", '\b' ) or ucwords( "hello,
>>> >> world!", array( ' ', '.', ... ) ), the user could control the behavior
>>> >> while existing ucwords( $arg ) code would behave as it does now without
>>> >> any BC.
>>> >
>>> > Yeah, that seems like an option, so basically how `trim()` works too;
>>> > treat these characters as word boundaries (default is " \t\r\n").
>>> >
>>> > ucwords("hello (new) world", " ()");
>>> >
>>> > I'll prepare a PR for this and see how far that takes us :) let me know if
>>> > you guys have any other ideas.
>>>
>>> I've created a PR here: https://github.com/php/php-src/pull/706
>>
>> Your previous mail mentioned, "so basically how `trim()` works too", but
>> the PR doesn't quite do that.
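To make the proposed semantics concrete, here is a userland sketch of "treat these characters as word boundaries". `ucwords_delims()` is a hypothetical name, not the code from the PR linked above. Like `ucwords()`, it only uppercases and never lowercases, and it works byte by byte, so it does not understand `trim()`-style `a..z` ranges or multibyte text.

```php
<?php
// Hypothetical userland sketch of the proposed delimiter argument:
// uppercase the first byte of $str and every byte that directly follows
// one of the bytes listed in $delimiters. Other bytes are left unchanged.
function ucwords_delims($str, $delimiters = " \t\r\n")
{
    $startOfWord = true; // the first character always starts a word
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if ($startOfWord) {
            $str[$i] = strtoupper($ch);
        }
        // Check the original byte, so a delimiter is recognised even if it was uppercased.
        $startOfWord = strpos($delimiters, $ch) !== false;
    }
    return $str;
}

echo ucwords_delims("hello (new) world"), "\n";        // Hello (new) World
echo ucwords_delims("hello (new) world", " ()"), "\n"; // Hello (New) World
```

Later PHP releases document an optional `$delimiters` parameter on `ucwords()` itself (default `" \t\r\n\f\v"`), so check the target version before reaching for a userland helper.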
>
> That's somewhat embarrassing; I didn't realise that character ranges are
> supported in trim() =S
>
> Despite this oversight, I personally don't see a practical need for
> supporting a character range because the given characters are not likely to
> be letters, but rather hyphens, braces, punctuation marks, spaces, etc. I
> was also hoping to keep the function rather simple :)
>

The charmasks aren't limited to letters only. You could go crazy and use
"\0../:..@[..`{..\x7F" for everything non-alphanumeric in ASCII, if you
really wanted to. That said, I have no particular preference one way or the
other (for ucwords()) and was mostly just clarifying the point about working
how trim() works, or not.

>
>>
>> Should ucwords() also accept character ranges, just like trim()? e.g.,
>> ucwords("Foo bar", "a..z"); [not a very practical example, I know]
>>
>>>
>>> If there are no objections I would like to commit this into 5.4 onwards
>>> somewhere next week.
>>>
>>> Thanks.
>>>
>>> >> --Kris
>>> >
>>> > --
>>> > --
>>> > Tjerk
>>>
>>> --
>>> --
>>> Tjerk
>>
>
> --
> --
> Tjerk
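A `trim()` character mask can describe arbitrary byte ranges with `..`, not just letters. The sketch below checks the non-alphanumeric ASCII mask by asking `trim()` itself which single bytes it strips. It relies only on `trim()` and `chr()`. The second range starts at `:` (0x3A), the first byte after the digits.

```php
<?php
// Every non-alphanumeric ASCII byte, written with trim()'s "x..y" ranges:
// 0x00-0x2F, 0x3A-0x40, 0x5B-0x60 and 0x7B-0x7F.
$mask = "\0../:..@[..`{..\x7F";

// trim() reduces a one-byte string to "" exactly when that byte is in the mask.
$inMask = array();
for ($i = 0; $i <= 0x7F; $i++) {
    if (trim(chr($i), $mask) === '') {
        $inMask[] = $i;
    }
}

echo count($inMask), "\n";                 // 66, i.e. 128 minus 62 letters and digits
echo trim("(Hello, World!)", $mask), "\n"; // Hello, World
```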