Newsgroups: php.internals
Message-ID: <53561D4E.6000609@lsces.co.uk>
Date: Tue, 22 Apr 2014 08:42:06 +0100
To: internals@lists.php.net
Subject: Re: [PHP-DEV] utf-8 filenames in phar files.
From: lester@lsces.co.uk (Lester Caine)

Yasuo Ohgaki wrote:
> BTW, without NFC normalization, I'm sure there will be unhappy users if users use
> it with OSX and Linux/Windows. OSX decomposes Unicode and there will be the same
> name path with different unicode string that appears the same on their
> terminal/etc on Linux/Windows.

I don't think this problem is any different from the simple conflict between
upper- and lower-case 'normalizing' that happens currently. Each OS has its own
standards and quirks which we have to put up with. It is a simple fact that
UTF-8 does NOT have a single preferred normalization form, and everything that
is valid has to be handled.

This is back to the question of case-insensitive comparisons, and whether even
that can be supported going forward. If different OSes 'normalise' a string for
their own purposes, can we be expected to provide different comparison rules for
each? Or is it something that has to be passed back up the chain for a library
to handle more generically?

Phar should not 'translate' anything. It is where these strings are used that
any additional processing should be handled.

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
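
To make the normalization point concrete, here is a minimal sketch in Python (chosen only for illustration; nothing here is Phar or PHP code). It shows one visible filename stored as two different but equally valid Unicode sequences, and a comparison done by the consumer of the names rather than by the archive. The file name and the helper names `same_name` and `same_name_ignoring_case` are hypothetical, and the case-insensitive variant is a simplification of full Unicode caseless matching.

```python
import unicodedata

# One visible name, two valid Unicode spellings (example name is made up).
nfc_name = "caf\u00e9.txt"    # precomposed "é" (NFC)
nfd_name = "cafe\u0301.txt"   # "e" + combining acute accent (NFD, the decomposed style)


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def _fold(name: str) -> str:
    """Normalize, case-fold, then normalize again (simplified caseless key)."""
    folded = unicodedata.normalize("NFC", name).casefold()
    return unicodedata.normalize("NFC", folded)


def same_name_ignoring_case(a: str, b: str) -> bool:
    """Case-insensitive comparison layered on top of normalization."""
    return _fold(a) == _fold(b)


if __name__ == "__main__":
    # Stored as-is, the two names are different strings and different bytes.
    print(nfc_name == nfd_name)
    print(nfc_name.encode("utf-8"), nfd_name.encode("utf-8"))
    # Only the comparison step decides that they name the same file.
    print(same_name(nfc_name, nfd_name))
    print(same_name_ignoring_case("CAFE\u0301.TXT", nfc_name))
```

In this sketch the stored names are never rewritten; the equivalence rule lives entirely in the comparison functions, which is the split of responsibilities argued for above.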