Newsgroups: php.internals
From: php@fleshgrinder.com (Fleshgrinder)
To: Anatol Belski, internals@lists.php.net, Rasmus Schultz
Reply-To: internals@lists.php.net
Date: Sun, 2 Apr 2017 10:09:05 +0200
Subject: Re: [PHP-DEV] Directory separators on Windows
References: <187eb0be-90b9-f7cd-b8bd-888915429796@fleshgrinder.com>

On 4/1/2017 6:15 PM, Anatol Belski wrote:
> Basically, it is the same as your points 8., 9. and 10. - it deals
> with the given path itself, so no symlinks, etc. In the snippet
> /a/b/../c it's parsed like follows
>
> - parse up to /a/b/../
> - scroll back to /a
> - append the remain so it becomes /a/c
>
> Similar process is with /a/./b would become /a/b and others. It is
> string traversing only. What is done with dirname() uses this
> approach. In general one can say - normalization is a path
> simplification, no drive access like realpath() does. For example, it
> lets to know the path itself would be correct before it comes to
> actual file operation, and not bother with I/O otherwise.
>

Your strategy works in these examples, but the example I gave was
different. Imagine that we have `/a/b/../c` which we would normalize to
`/a/c`. However, the `b` component is actually a symbolic link to `x/y`.
Hence, the real version of the path is `/a/x/c` and not `/a/c` as we
would have normalized it to.

On 4/1/2017 6:15 PM, Anatol Belski wrote:
> As mentioned in an earlier post, it might make sense to have flags to
> control the behavior. Maybe a signature like
>
> string canonicalize_path(string $path, int $flags = 0);
>
> The function OFC knows the current platform. Flags like
> PATH_TARGET_WINDOWS | PATH_UNIXIFY would control the path separator
> behaviors. Generally, regarding path without drive letter - on
> Windows I'd strongly advise to not to use it in configs, etc.
> because of multiple root issues mentioned already.
> But in principle,
> say one has same FS structure on different platforms and just wants
> to mirror it, that would be ok with flags like PATH_TARGET_LINUX |
> PATH_STRIP_DRIVE as Linux implies forward slashes. Or otherwise, fe
> the reverse case - generating a path on Linux that is to be used on
> Windows, flags might contain only PATH_TARGET_WINDOWS which would
> produce backslashes as system default. Maybe that's too much or
> unrelated, and only platform targets should be provided, dunno, just
> a mind game for now.
>

I hope you notice how this function is exploding in complexity. I beg
for classes, with clear responsibilities and small methods that do one
thing.

On 4/1/2017 6:15 PM, Anatol Belski wrote:
> These last 3 points, as well as above one, are canonicalization. Of
> course, in the imaginary function, it could be decoupled like
> PATH_NO_CANONIC if it's not wanted, or PATH_CANONICALIZE_ONLY to omit
> other conversions. It's only about to have the behaviors sensible. Fe
> possible other flags could be PATH_STRIP_TRAILING_SLASH,
> PATH_ALLOW_RELATIVE and other fine things. But by default, the
> function should do the default thing for the target platform, based
> on the current platform. Thus, producing NFD for Mac and NFC
> otherwise, backslash for Windows and forward slash otherwise, other
> thing that will for sure popup. As mentioned earlier, still this
> requires some re-implementations of the platform APIs, even we'd talk
> about slashes only - for ASCII paths I'm not sure we even can
> differentiate the UTF-8 encoding forms without involving yet another
> library, so this might be tricky. Simply exposing the part of
> realpath() processing might solve several things for one given
> platform, that's for sure. The initial case Rasmus reported was about
> crossplatform handling, but the topic is indeed slightly bigger than
> just path separators, so IMO the convenient way were to care about a
> crossplatform approach.
> I've no info, how badly such crossplatform
> path issues are indeed relevant, so it might be another story to
> investigate before one starts any implementation. At least, grouping
> some cases and thought, maybe as an RFC, could be good to track the
> topic.
>

I agree mostly:

- We should not call it canonicalization (I used the word too), but
  rather normalization. The former is used in other languages, where it
  means what realpath() does. This could be confusing.
- Leaving the stripping of the trailing separator to the user means that
  other users never know what they get, which is bad. The normalization
  should always use one strategy here.

--
Richard "Fleshgrinder" Fussenegger
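
PS: The symlink case from the first reply (`/a/b/../c` with `b` linking
to `x/y`) can be reproduced with a few lines. This is only a rough
sketch under some assumptions: `normalize_path_lexically()` is a
hypothetical name and not a proposed API, it handles only absolute
paths with `/` separators, and the demo needs a writable temp directory
and a platform where symlink() is permitted.

```php
<?php
// Hypothetical helper, not an existing PHP function: lexical normalization
// of an absolute path with "/" separators. It never touches the file system.
function normalize_path_lexically(string $path): string
{
    $stack = [];
    foreach (explode('/', $path) as $component) {
        if ($component === '' || $component === '.') {
            continue; // empty components and "." carry no information
        }
        if ($component === '..') {
            array_pop($stack); // ".." removes the previous component, if any
            continue;
        }
        $stack[] = $component;
    }
    return '/' . implode('/', $stack);
}

echo normalize_path_lexically('/a/b/../c'), PHP_EOL; // /a/c
echo normalize_path_lexically('/a/./b'), PHP_EOL;    // /a/b

// Rebuild the layout from the example in a scratch directory (assumed to be
// writable): a/x/y is a real directory and a/b is a symbolic link to x/y.
$root = sys_get_temp_dir() . '/normalize-demo-' . uniqid();
mkdir("$root/a/x/y", 0777, true);
mkdir("$root/a/x/c");
mkdir("$root/a/c");

// symlink() can fail, e.g. on Windows without the required privilege.
if (@symlink('x/y', "$root/a/b")) {
    echo normalize_path_lexically("$root/a/b/../c"), PHP_EOL; // ends in /a/c
    echo realpath("$root/a/b/../c"), PHP_EOL;                 // ends in /a/x/c
    @unlink("$root/a/b");
}

// Clean up the scratch directory, innermost directories first.
foreach (['/a/x/y', '/a/x/c', '/a/x', '/a/c', '/a', ''] as $dir) {
    @rmdir($root . $dir);
}
```

Both results name existing directories, but they are different
directories. That is why the string-only approach must not be confused
with what realpath() does.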