Newsgroups: php.internals
Xref: news.php.net php.internals:72578
Date: Fri, 14 Feb 2014 10:48:44 +0900
Subject: Re: [PHP-DEV] utf-8 filenames in phar files.
From: yohgaki@ohgaki.net (Yasuo Ohgaki)
To: Dan Ackroyd
Cc: "internals@lists.php.net"

Hi Dan,

On Fri, Feb 14, 2014 at 10:00 AM, Dan Ackroyd wrote:

> I'm not sure I understand you. Do you have any example filenames that
> I can test against to make sure that different filenames don't 'appear
> the same'?

You can create filenames that appear the same but have different byte
representations under NFC and NFD normalization. For instance,
"がぎぐげご.txt" will have a different byte pattern in each form, since
NFD decomposes 「が」 into 「か」 and 「゛」 (as the combining mark
U+3099), and so on.

Unicode filenames on Windows and Linux seem to use NFC, but that is a
coincidence: they only use the composed form of Unicode, i.e. they
don't compose intentionally. OSX decomposes intentionally. Decomposed
filenames will appear the same as composed ones on Windows and Linux,
so it is possible to have 2 files with semantically the same name.

NOTE: OSX's NFD differs a little from the Unicode standard.

Older subversion/git didn't take care of the normalization difference
and created multiple filenames that appear the same when a user works
on both OSX and Windows/Linux.

> Also, the filenames used in phar files are not exposed to the
> underlying system. They are held completely within the PHP phar file
> and shouldn't be affected by platform. The restriction on characters
> was caused by ext/phar explicitly rejecting utf-8 multibyte
> characters.

I don't use phar much. It's possible to use it as an archive, right?

https://php.net/phar.extractto

I think it's a great change even without normalization. It's better if
normalization is taken care of. To handle the normalization difference,
you may apply NFC normalization on OSX.
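
As a rough illustration of the NFC/NFD difference described above, here
is a minimal sketch. It uses Python's standard unicodedata module only
for demonstration; it is not how ext/phar works, and the helper name
find_nfc_collisions is hypothetical. It also assumes standard Unicode
NFD, which may differ slightly from what OSX actually writes to disk.
In PHP, the intl extension's Normalizer class should offer the same
normalization forms.

```python
import unicodedata

# The example filename from this message.
name = "がぎぐげご.txt"

nfc = unicodedata.normalize("NFC", name)  # composed form
nfd = unicodedata.normalize("NFD", name)  # decomposed form

# Both strings render the same, but code points and UTF-8 bytes differ.
print(len(nfc), len(nfc.encode("utf-8")))  # 9 19
print(len(nfd), len(nfd.encode("utf-8")))  # 14 34
print(nfc == nfd)                          # False


def find_nfc_collisions(filenames):
    """Group names that become identical after NFC normalization.

    Hypothetical helper for illustration only.
    """
    groups = {}
    for fname in filenames:
        key = unicodedata.normalize("NFC", fname)
        groups.setdefault(key, []).append(fname)
    # Keep only the keys that more than one original name maps to.
    return {key: names for key, names in groups.items() if len(names) > 1}


# The composed and decomposed names collide; "readme.txt" does not.
collisions = find_nfc_collisions([nfc, nfd, "readme.txt"])
print(len(collisions))  # 1
```

Normalizing every name to NFC before comparing or storing it is one way
to catch such look-alike entries; whether that belongs in the archive
format or in the tool creating the archive is a separate question.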

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net