Newsgroups: php.internals
Date: Sat, 15 Feb 2014 08:56:36 +0900
To: Dan Ackroyd
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] utf-8 filenames in phar files.
From: yohgaki@ohgaki.net (Yasuo Ohgaki)

Hi Dan,

On Sat, Feb 15, 2014 at 1:11 AM, Dan Ackroyd wrote:

> That is not an issue as:
>
> i) Phar files produced on a windows machine should be identical to
> those produced on a Linux or OSX box.
>
> ii) There is a test in the phar code, so that if you do have filenames
> that are degenerate after normalising, the extraction throws an error.
> e.g. for the files
>
> $filename1 = "Am\xC3\xA9lie.txt";
> $filename2 = "Am\x65\xCC\x81lie.txt";
>
> If you add both to a phar archive and then attempt to extract them
> both you get the error:
>
> "Cannot extract "Amélie.txt" to "output/Amélie.txt", path already
> exists"
>

I suppose there is no normalization code in phar, so it is your system
(OS / file system) that normalizes the file name. Relying on the
system's normalization is not a good idea:

 - File names could be NFC or NFD
 - File names in phar may differ by system
 - Systems that do not actively normalize Unicode exist

I do see the file name normalization issue with git on my Linux, Windows
and OS X machines (core.precomposeunicode=true is required for correct
operation on OS X).

I suggest applying NFC normalization to avoid the issue, as git does:

  core.precomposeunicode
      This option is only used by Mac OS implementation of Git. When
      core.precomposeunicode=true, Git reverts the unicode decomposition
      of filenames done by Mac OS. This is useful when sharing a
      repository between Mac OS and Linux or Windows. (Git for Windows
      1.7.10 or higher is needed, or Git under cygwin 1.7). When false,
      file names are handled fully transparent by Git, which is backward
      compatible with older versions of Git.
  http://git-scm.com/docs/git-config

As Rowan pointed out, even though acinclude.m4 always detects ICU, any
ICU/intl-related code should be wrapped in #if. (intl uses ICU, so using
intl means using ICU. I think it is better not to rely on intl: it may
be disabled or built as a shared (DL) module, and there are also systems
without ICU.)

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net
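The two proposals above (NFC-normalize entry names, and keep ICU-dependent code behind #if) could be combined roughly as follows. This is a minimal sketch, not phar code: `nfc_entry_name` is a hypothetical helper, `HAVE_ICU` is a hypothetical guard macro (the real name depends on the build system), and the fixed 1024-unit buffers are a simplification. The ICU branch uses ICU's C API (`u_strFromUTF8`, `unorm2_getNFCInstance`, `unorm2_normalize`, `u_strToUTF8`). Without ICU the name passes through unchanged, so the NFC and NFD spellings from the quoted example stay distinct byte strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifdef HAVE_ICU /* hypothetical guard; only compiled when ICU is available */
#include <unicode/unorm2.h>
#include <unicode/ustring.h>
#endif

/* Hypothetical helper: normalize a UTF-8 entry name to NFC before storing
 * or looking it up. Returns 0 on success, -1 on error or if out is too small. */
static int nfc_entry_name(const char *in, char *out, size_t outlen)
{
    assert(in != NULL && out != NULL && outlen > 0);
#ifdef HAVE_ICU
    UErrorCode err = U_ZERO_ERROR;
    UChar src[1024], dst[1024]; /* simplification: fixed-size UTF-16 buffers */
    int32_t srclen = 0, dstlen = 0, len8 = 0;
    const UNormalizer2 *nfc = unorm2_getNFCInstance(&err);

    /* UTF-8 -> UTF-16, the encoding ICU's normalizer operates on */
    u_strFromUTF8(src, 1024, &srclen, in, -1, &err);
    /* Compose to NFC, e.g. "e" + U+0301 becomes U+00E9 */
    dstlen = unorm2_normalize(nfc, src, srclen, dst, 1024, &err);
    /* UTF-16 -> UTF-8 for the caller; each call is a no-op once err fails */
    u_strToUTF8(out, (int32_t)outlen, &len8, dst, dstlen, &err);
    if (U_FAILURE(err) || (size_t)len8 >= outlen) {
        return -1; /* conversion error, or no room for the terminating NUL */
    }
    return 0;
#else
    /* No ICU: copy the name unchanged, so NFC and NFD spellings stay distinct */
    size_t n = strlen(in);
    if (n >= outlen) {
        return -1;
    }
    memcpy(out, in, n + 1);
    return 0;
#endif
}
```

With ICU enabled, both `"Am\xC3\xA9lie.txt"` (11 bytes) and `"Am\x65\xCC\x81lie.txt"` (12 bytes) would map to the same 11-byte NFC name, so the archive itself, rather than the extracting file system, decides that they collide.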