Newsgroups: php.internals
Date: Sat, 15 Feb 2014 01:07:46 +0000
To: Yasuo Ohgaki
Cc: "internals@lists.php.net"
Subject: Re: [PHP-DEV] utf-8 filenames in phar files.
From: danack@basereality.com (Dan Ackroyd)

Yasuo wrote:
> File names in phar may differ by system

No. No they won't. They will be exactly as they are specified when
they are added by the user to the Phar archive.

cheers
Dan

On 14 February 2014 23:56, Yasuo Ohgaki wrote:
> Hi Dan,
>
> On Sat, Feb 15, 2014 at 1:11 AM, Dan Ackroyd wrote:
>>
>> That is not an issue as:
>>
>> i) Phar files produced on a windows machine should be identical to
>> those produced on a Linux or OSX box.
>>
>> ii) There is a test in the phar code, so that if you do have filenames
>> that are degenerate after normalising, the extraction throws an error.
>> e.g. for the files
>>
>> $filename1 = "Am\xC3\xA9lie.txt";
>> $filename2 = "Am\x65\xCC\x81lie.txt";
>>
>> If you add both to a phar archive and then attempt to extract them
>> both you get the error:
>>
>> "Cannot extract "Amélie.txt" to "output/Amélie.txt", path already
>> exists"
>
>
> I suppose there is no normalization code in phar, so your system(OS / file
> system) normalizes file name.
>
> Depending on system's normalization is not good.
>
> - File name could be NFC or NFD
> - File names in phar may differ by system
> - Systems that do not normalize Unicode actively exist
>
> I do see file name normalization issue on my Linux/Windows and OSX with git.
> (core.precomposeunicode=true is required for correct operation on OSX) I
> suggest to apply NFC normalization to avoid issue, like git.
>
> core.precomposeunicode
> This option is only used by Mac OS implementation of Git. When
> core.precomposeunicode=true, Git reverts the unicode decomposition of
> filenames done by Mac OS. This is useful when sharing a repository between
> Mac OS and Linux or Windows. (Git for Windows 1.7.10 or higher is needed, or
> Git under cygwin 1.7). When false, file names are handled fully transparent
> by Git, which is backward compatible with older versions of Git.
> http://git-scm.com/docs/git-config
>
> As Rowan pointed out, although ICU is detected by acinclude.m4 always, #if
> should be used for ICU/intl related code. (intl uses ICU, use intl = use
> ICU. I think it's better not to rely on intl. It may be disabled or can be
> DL module. There are systems without ICU also.)
>
> Regards,
>
> --
> Yasuo Ohgaki
> yohgaki@ohgaki.net
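
For readers following the NFC/NFD point, here is a minimal sketch of why the two filenames quoted above are different byte strings that still name the same visible file once normalized. It is not the check in the phar source. The helper name findNormalizationCollisions is hypothetical, and the sketch assumes ext/intl may or may not be loaded, which is the situation the thread describes. Without intl it only shows the byte-level difference.

```php
<?php
// The two names from the quoted message: same visible text, different bytes.
$filename1 = "Am\xC3\xA9lie.txt";     // U+00E9, precomposed form (NFC)
$filename2 = "Am\x65\xCC\x81lie.txt"; // "e" followed by U+0301, decomposed form (NFD)

/**
 * Hypothetical helper, not part of phar: group names by their NFC form and
 * return only the groups that contain more than one distinct byte sequence.
 * Returns an empty array when ext/intl (the Normalizer class) is unavailable.
 */
function findNormalizationCollisions(array $names): array
{
    if (!class_exists('Normalizer')) {
        return []; // assumption: without intl we cannot normalize at all
    }
    $groups = [];
    foreach ($names as $name) {
        $key = Normalizer::normalize($name, Normalizer::FORM_C);
        if ($key === false) {
            $key = $name; // not valid UTF-8: treat the raw bytes as the key
        }
        if (!in_array($name, $groups[$key] ?? [], true)) {
            $groups[$key][] = $name;
        }
    }
    // Keep only NFC forms reached by two or more different byte strings.
    return array_filter($groups, function (array $group): bool {
        return count($group) > 1;
    });
}

// Byte-level view: the strings differ in length and content.
printf("%d bytes: %s\n", strlen($filename1), bin2hex($filename1));
printf("%d bytes: %s\n", strlen($filename2), bin2hex($filename2));

// With intl loaded, both names fall into the same NFC group.
foreach (findNormalizationCollisions([$filename1, $filename2]) as $group) {
    printf("%d names share one NFC form\n", count($group));
}
```

A pre-check of this kind would report the clash at archive-build time on any platform, independent of whether the extracting file system normalizes names. Whether that belongs in phar itself is the question under discussion in the thread.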