Newsgroups: php.internals
Message-ID: <54872241.6070509@gmail.com>
Date: Tue, 09 Dec 2014 16:24:33 +0000
From: rowan.collins@gmail.com (Rowan Collins)
To: internals@lists.php.net
In-Reply-To: <54871CAA.6040600@lsces.co.uk>
Subject: Re: [PHP-DEV] [VOTE][RFC] Unicode Codepoint Escape Syntax

Lester Caine wrote on 09/12/2014 16:00:
> On 09/12/14 15:30, Rowan Collins wrote:
>> Lester Caine wrote on 09/12/2014 15:07:
>>> On 09/12/14 14:07, Andrea Faulds wrote:
>>>>> On 9 Dec 2014, at 13:35, Lester Caine wrote:
>>>>>
>>>>>> On 09/12/14 13:07, Andrea Faulds wrote:
>>>>>>
>>>>>>> On 9 Dec 2014, at 08:15, Lester Caine wrote:
>>>>>>>
>>>>>>> If ICU is to be adopted as the base for unicode support, then surely
>>>>>>> everything else should follow those rules?
>>>>>>> \uhhhh and \Uhhhhhhhh are defined along with \x{hhhhhh} so does it
>>>>>>> make sense to add something which is not part of ICU?
>>>>>> Er, where does ICU define \uXXXX and \UXXXXXX? I don't understand.
>>>>> http://userguide.icu-project.org/strings/regexp
>>>> We aren't using ICU regular expressions, and ICU is merely an
>>>> implementation detail anyway.
>>> Has THAT been agreed on? Surely if using ICU fully in PHP7 in place of
>>> the patchwork of current fixes for unicode then we don't want to be
>>> breaking things again by odd differences from the core code for unicode?
>>> I thought the agreement was that there was no resource to create an
>>> alternative from scratch?
>> I think what Andrea's getting at is that the fact that ICU is in use
>> under the hood shouldn't be particularly visible to users. If PHP gets
>> "Unicode support" (whatever that turns out to mean), what the user
>> should see is *PHP's Unicode facilities*; only core devs and package
>> maintainers will need to know that those are implemented using ICU. As
>> such, there's no automatic need for PHP to do everything the same way as
>> ICU.
> That was the reason for asking ...
> What is the point of all these piecemeal patches when the underlying
> base has not yet been agreed on? That we are using ICU in things like
> the database interfaces for unicode support would point to it being
> somewhat useful if those processes produced the same code as the same
> actions in PHP. ICU is well established and its API is already in use on
> the same platform as PHP is running on ... so can we please treat all of
> these 'patches' in the light of a proper debate on the bigger picture.
> Forcing something like this through now simply does not make sense, and
> while there may be no 'automatic need' for the database interface to
> work the same as other parts, it would perhaps be worth a little
> consideration?

I see what you mean, but I think in this case it would make very little
difference what other Unicode pieces are added, since the Unicode escape
syntax will only ever be interpreted by the compiler; no other function
will ever see the escape itself, only the characters it produces.

The only exception would be things like PCRE (not ICU) regexes, where, in
a single-quoted string, a visually similar syntax might exist. But there
are already lots of differences between what a backslash sequence means
in a regex and what it means in a double-quoted string literal.

-- 
Rowan Collins
[IMSoP]
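A minimal PHP sketch of the distinction drawn above, assuming PHP 7.0 or later (where the `\u{...}` escape proposed in this RFC is available) and the bundled PCRE extension. In a double-quoted literal the escape is resolved at compile time, so at runtime the string holds only UTF-8 bytes; in a single-quoted literal no escape processing happens, and the backslash sequence reaches PCRE unchanged, which then applies its own `\x{...}` syntax. The variable names are illustrative only.

```php
<?php
// Assumes PHP >= 7.0 and the PCRE extension (enabled by default).

// Double-quoted literal: the compiler replaces \u{...} with the UTF-8
// bytes of the code point, so no function ever sees the escape itself.
$snowman = "\u{2603}";                  // U+2603 SNOWMAN
var_dump(bin2hex($snowman));            // string(6) "e29883"
var_dump($snowman === "\xE2\x98\x83");  // bool(true)

// Single-quoted literal: no escape processing, so the backslash sequence
// is passed to PCRE verbatim; PCRE interprets \x{2603} itself, and the
// /u modifier makes it match the code point in UTF-8 subject strings.
$pattern = '/\x{2603}/u';
var_dump(strlen($pattern));             // int(11)
var_dump(preg_match($pattern, "Let it snow: $snowman")); // int(1)
```

Because the two syntaxes are resolved by different components (the PHP compiler versus the regex engine), they can differ without either one affecting the other.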