Today, I ran into a very hard-to-debug problem, in which paths (to SQL
files, in a database migration script) were kept in a map, persisted to a
JSON file, and this file was moved from a Windows to a Linux file-system -
because the paths on the Linux system had forward slashes, the files
appeared to be missing from the map.
Related questions are very commonly asked by Windows users, indicating that
this is a common problem:
http://stackoverflow.com/questions/14743548/php-on-windows-path-comes-up-with-backward-slash
http://stackoverflow.com/questions/5642785/php-a-good-way-to-universalize-paths-across-oss-slash-directions
http://stackoverflow.com/questions/6510468/is-there-a-way-to-force-php-on-windows-to-provide-paths-with-forward-slashes
The problem with the answers that are usually given (use
DIRECTORY_SEPARATOR, use str_replace(), etc.) is that by default you
automatically get cross-platform inconsistencies, and the workarounds end
up complicating code everywhere, and sometimes lead to other (sometimes
worse) portability problems.
The problem is worsened by functions like glob() and the SPL directory/file
traversal objects also producing inconsistent results.
Returning backslashes on Windows seems rather unnecessary in the first
place, since forward slashes work just fine?
Might I suggest changing this behavior, such that file-system paths are
consistently returned with a forward slash?
Though this is more likely to fix rather than create issues, this could be
a breaking change in some cases, so there should probably be an INI setting
that enables the old behavior.
Thoughts?
Returning backslashes on Windows seems rather unnecessary in the first
place, since forward slashes work just fine?
This may be true when using the paths within PHP, but is it true outside of it? If your JSON file had been read in by a .net application, or used to generate a DOS/NT batch file, wouldn't forward slashes there have been just as broken as backslashes on a Linux box?
Sadly, I fear this is like trying to automate line ending conversion - the more you try to avoid being platform-specific, the more awkward cases you introduce.
Regards,
--
Rowan Collins
[IMSoP]
On Thu, Mar 30, 2017 at 8:05 AM, Rowan Collins rowan.collins@gmail.com
wrote:
Returning backslashes on Windows seems rather unnecessary in the first
place, since forward slashes work just fine?
This may be true when using the paths within PHP, but is it true outside
of it? If your JSON file had been read in by a .net application, or used to
generate a DOS/NT batch file, wouldn't forward slashes there have been just
as broken as backslashes on a Linux box?
In my experience, forward slashes work just fine in .NET 4.0+ (haven't ever
used less than 4.0, so I won't claim to know), PowerShell and batch files.
Command prompt deals with it just fine.
Sadly, I fear this is like trying to automate line ending conversion - the
more you try to avoid being platform-specific, the more awkward cases you
introduce.
I tend to agree. It's really not that hard to handle in the application
itself, instead of relying on the language to perform some magic. We
generally know that magic features aren't so great, so let's not go adding
more.
Returning backslashes on Windows seems rather unnecessary in the first
place, since forward slashes work just fine?
It is true (works) only on Windows because PHP does the conversion
transparently for you.
It will fail miserably if your JSON strings are processed as paths by
other tools or languages that do not do this magic for you.
Cheers
Pierre
My first thought is UNC paths. On Windows, a file server share is
denoted by \\host\share. If you combine that with relative paths
produced from PHP, you end up in the dubious situation of
"\\host\share/path/to/file" <--- wat?
Overall, it smells of magic.
-Sara
Another option would be to create a function that converts all slashes in a
given input string to whatever the directory separator should be on that
platform. This way, devs wouldn't have to deal with bulky constants like
DIRECTORY_SEPARATOR cluttering up their code.
For example:
<?php
print convert_seperators( '/some\directory/' );
?>
The above would output "/some/directory/" on Linux and "\some\directory\" on
Windows.
--Kris
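A minimal userland sketch of the proposal might look like the following. The function does not exist in PHP; the name is taken from the message above, and the optional $separator parameter is an assumption added here only so that both platform results can be shown on one machine.

```php
<?php
// Hypothetical userland version of the proposed function. The $separator
// parameter is an assumption of this sketch, not part of the proposal.
function convert_seperators(string $path, string $separator = DIRECTORY_SEPARATOR): string
{
    // Replace both kinds of slash with the requested separator.
    return str_replace(['/', '\\'], $separator, $path);
}

echo convert_seperators('/some\\directory/', '/'), "\n";  // /some/directory/
echo convert_seperators('/some\\directory/', '\\'), "\n"; // \some\directory\
```

Called with one argument, the sketch falls back to DIRECTORY_SEPARATOR, which is the behavior described in the message.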
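+1
This could be used to convert a namespace to a file path in an autoloader:
<?php
// spl_autoload_register() replaces __autoload(), which was removed in PHP 8.0.
spl_autoload_register(function ($class) {
    // Assumes class files are named after the class, with a .php extension.
    include convert_seperators($class) . '.php';
});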
On Windows, it is what realpath does.
No, realpath() does not use the include_path.
Well, this is the opposite of what I'm asking for, and does not address the
case where paths have been persisted in a file or database and the data
gets accessed from a different OS.
I understand the reasons given for not changing this behavior in PHP
itself, so maybe we could have a standard function that normalizes paths to
forward slashes? e.g. basically:
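/**
 * Normalize a filesystem path.
 *
 * On Windows systems, replaces backslashes with forward slashes
 * and ensures drive-letter in upper-case.
 *
 * @param string $path
 *
 * @return string normalized path
 */
function normalize_path( $path ) {
    // A backslash must be escaped inside a single-quoted string.
    $path = str_replace('\\', '/', $path);

    // $path[1] replaces the $path{1} syntax, which was removed in PHP 8.0.
    return isset($path[1]) && $path[1] === ':'
        ? ucfirst($path)
        : $path;
}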
At least WordPress, Drupal and probably most major CMS and frameworks have
this function or something equivalent.
This function is too trivial to ship as a separate package, but at the same
time, it's too error-prone and repetitive for every framework/project to
implement (and test) for itself... In my opinion, it's common enough that
it ought to just be built-in?
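To illustrate the persisted-map case that started the thread, here is a small sketch that normalizes the keys before writing the JSON and again on lookup. The map contents and file names are made up for the example, and normalize_path() is a compact copy of the userland function above, not a built-in.

```php
<?php
// Compact copy of the userland normalize_path() idea; not a PHP built-in.
function normalize_path(string $path): string
{
    $path = str_replace('\\', '/', $path);
    return (isset($path[1]) && $path[1] === ':') ? ucfirst($path) : $path;
}

// Hypothetical migration map, as it might be built on Windows.
$applied = ['migrations\\001_init.sql' => true];

// Normalize the keys before persisting them.
$portable = [];
foreach ($applied as $path => $done) {
    $portable[normalize_path($path)] = $done;
}
$json = json_encode($portable);

// Later, possibly on Linux: normalize the lookup key the same way.
$restored = json_decode($json, true);
var_dump(isset($restored[normalize_path('migrations/001_init.sql')])); // bool(true)
```

The lookup only works because both sides go through the same normalization; data that was already persisted with backslashes would still need a one-off conversion.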
Also, ucfirst (or any case operation) is useless. realpath goes
further by solving ugly things like \\\\ or ////// (from code
concatenating paths without checking for a trailing /).
At least WordPress, Drupal and probably most major CMS and frameworks have
this function or something equivalent.
Now I remember why they have to do that.
realpath is not fully exposed in userland. virtual_file_ex should be
used and provide the option to validate the path or not. Right now
realpath will fail if the path does not exist. I would suggest
exposing this functionality/option; that would remove the need to
implement such things in userland.
PS: I discussed that a long time ago with Dmitry and forgot to implement
it. I take the blame for not having that in 7.x :)
Cheers,
Pierre
Also ucfirst is useless (or any case operations)
It's not useless: if you want a normalized path on Windows, it has to
include a drive letter, and the Windows FS isn't case-sensitive.
Right now realpath will fail if the path does not exist
I know, that's one reason I don't use it.
It kind of solves a different problem, e.g. it resolves ".." and "."
elements in paths. As a rule, I don't ever use relative paths, but it would
certainly be nice to have a realpath() that works for files that haven't
been created yet.
I don't think you can simply make realpath() also normalize the path, as
this would be a breaking change?
I guess an improved realpath() could be used internally as part of a
normalize_path() function, but it's not enough on its own, since the real
path will still have platform-specific directory separators, so a
normalize_path() function would still be useful if realpath() gets improved.
So to summarize, a normalize_path() function should:
1. Fully normalize to an absolute path with no platform-specific separators
2. Have corrected case (for files/dirs that do exist.)
3. Have normalized (upper-case) drive-letter on Windows
There are also network file-system paths on Windows with a different syntax
to consider? I don't know much about that...
1. This cannot be guaranteed by a normalization function, because the parts
the dots point to might not exist. Resolving them without knowing whether
we are dealing with a symbolic or hard link is impossible.
UNC paths work the same as normal paths; the only difference is their
prefix (e.g. \\ComputerName\). In other words, they can be treated
like a schemeless URL.
Verbatim paths are not supported by PHP anyway, hence they can be ignored.
--
Richard "Fleshgrinder" Fussenegger
Hi,
-----Original Message-----
From: Rasmus Schultz [mailto:rasmus@mindplay.dk]
Sent: Saturday, April 1, 2017 11:13 AM
To: Pierre Joye pierre.php@gmail.com
Cc: Kris Craig kris.craig@gmail.com; Sara Golemon pollita@php.net; PHP
internals internals@lists.php.net
Subject: Re: [PHP-DEV] Directory separators on Windows
So to summarize, a normalize_path() function should:
1. Fully normalize to an absolute path with no platform-specific separators
2. Have corrected case (for files/dirs that do exist.)
3. Have normalized (upper-case) drive-letter on Windows
1. Optionally, yes; otherwise it should use the platform default.
2. No, this kind of operation is pure parsing; no I/O-related checks are needed.
3. Irrelevant, but can be defined.
Other points I'd care about:
- The result should be correct for the target platform regardless of the actual platform, e.g. a Linux target path on Windows, or a Windows path on Mac, etc.
- Validation, particularly for reserved words and chars, and other platform aspects.
- Encodings have to be respected, or UTF-8 only; to be defined.
- It should probably be compatible with PHP stream wrapper namespaces.
Thanks
Anatol
1. How do you envision that? If the path is /a/b/../c where only /a
exists right now? It's unresolvable; assuming that ../ points to /a
is wrong if b/ is a symbolic link that points to /x/y.
2. Here I agree, casing cannot be decided without hitting the
filesystem. Some are case-sensitive, some insensitive, and others
configurable.
3. Does not matter for Windows itself, it is case-insensitive.
(I continue the numbering for the points you raised.)
4. How would we go about normalizing a Windows path to POSIX? C:\a is
not necessarily the same as /a, or should it produce C:/a?
5. ?
6. I vote for UTF-8 only. We already have locale-dependent filesystem
functions, which also makes them kind of weird to use, especially in
libraries. Another very important aspect to take care of on this point is
normalization forms. Filesystems generally store stuff as is, which means
that we can create two files with the same name, at least by the looks of
it, which are actually different ones. Think of ä, which can be a single
precomposed code point or an a followed by a combining diaeresis. It is
generally most advisable to stick to NFC, because that is also how users
usually produce those chars.
7. ? Just forward, I'd say.
8. Collapse multiple separators (e.g. a//b ~> a/b).
9. Resolve self-references, unless they are leading (e.g. a/./b ~> a/b
but ./a/b stays ./a/b).
10. Trim separators from the end (e.g. a/ ~> a).
--
Richard "Fleshgrinder" Fussenegger
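Points 8 to 10 are pure string operations and can be sketched in a few lines. The helper name simplify_path() is hypothetical, and the sketch assumes the separators have already been converted to forward slashes; it does not handle ".." or drive letters.

```php
<?php
// Hypothetical helper covering only points 8 to 10; assumes "/" separators.
function simplify_path(string $path): string
{
    $isAbsolute = ($path !== '' && $path[0] === '/');
    $kept = [];
    foreach (explode('/', $path) as $index => $part) {
        if ($part === '') {
            continue; // points 8 and 10: repeated or trailing separators
        }
        if ($part === '.' && $index > 0) {
            continue; // point 9: drop self-references unless leading
        }
        $kept[] = $part;
    }
    $result = implode('/', $kept);
    return $isAbsolute ? '/' . $result : $result;
}

echo simplify_path('a//b'), "\n";  // a/b
echo simplify_path('a/./b'), "\n"; // a/b
echo simplify_path('./a/b'), "\n"; // ./a/b
echo simplify_path('a/'), "\n";    // a
```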
10 thumbs up ;-)
But this really demonstrates how badly we need this function - I bet any
number of those points may or may not be covered by any number of
implementations in the wild.
It would be so nice to have this done "right", once and for all.
-----Original Message-----
From: Fleshgrinder [mailto:php@fleshgrinder.com]
Sent: Saturday, April 1, 2017 2:43 PM
To: Anatol Belski weltling@outlook.de; Rasmus Schultz
rasmus@mindplay.dk
Cc: PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] Directory separators on Windows
1. How do you envision that? If the path is /a/b/../c where only /a exists
right now? It's unresolvable, assuming that ../ points to /a is wrong if b/
is a symbolic link that points to /x/y.
2. Here I agree, casing cannot be decided without hitting the filesystem.
Some are case-sensitive, some insensitive, and others configurable.
Basically, it is the same as your points 8., 9. and 10.: it deals with the given path itself, so no symlinks, etc. The snippet /a/b/../c is parsed as follows:
- parse up to /a/b/../
- scroll back to /a
- append the remainder, so it becomes /a/c
The process is similar for /a/./b, which would become /a/b, and for others. It is string traversal only. What dirname() does uses this approach. In general one can say that normalization is path simplification, with no drive access like realpath() does. For example, it lets you know whether the path itself would be correct before it comes to the actual file operation, and not bother with I/O otherwise.
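The parsing steps above can be sketched as a small stack-based function. The name resolve_dots() is hypothetical, the sketch assumes forward-slash input, and because it never touches the filesystem it knows nothing about symbolic links.

```php
<?php
// Hypothetical helper: purely lexical, no filesystem access, so symbolic
// links are not taken into account. Assumes "/" separators.
function resolve_dots(string $path): string
{
    $isAbsolute = ($path !== '' && $path[0] === '/');
    $stack = [];
    foreach (explode('/', $path) as $part) {
        if ($part === '' || $part === '.') {
            continue; // skip empty parts and self-references
        }
        if ($part === '..') {
            if ($stack !== [] && end($stack) !== '..') {
                array_pop($stack); // "scroll back" one component
            } elseif (!$isAbsolute) {
                $stack[] = '..'; // keep leading ".." on relative paths
            }
            continue;
        }
        $stack[] = $part;
    }
    return ($isAbsolute ? '/' : '') . implode('/', $stack);
}

echo resolve_dots('/a/b/../c'), "\n"; // /a/c
echo resolve_dots('/a/./b'), "\n";    // /a/b
```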
3. Does not matter for Windows itself, it is case-insensitive.
(I continue the numbering for the points you raised.)
4. How would we go about normalizing a Windows path to POSIX? C:\a is not
necessarily the same as /a, or should it produce C:/a?
As mentioned in an earlier post, it might make sense to have flags to control the behavior. Maybe a signature like
string canonicalize_path(string $path, int $flags = 0);
The function of course knows the current platform. Flags like PATH_TARGET_WINDOWS | PATH_UNIXIFY would control the path separator behavior. Generally, regarding paths without a drive letter: on Windows I'd strongly advise against using them in configs etc., because of the multiple-root issues mentioned already. But in principle, say one has the same FS structure on different platforms and just wants to mirror it; that would be OK with flags like PATH_TARGET_LINUX | PATH_STRIP_DRIVE, as Linux implies forward slashes. Or, in the reverse case of generating a path on Linux that is to be used on Windows, the flags might contain only PATH_TARGET_WINDOWS, which would produce backslashes as the system default. Maybe that's too much or unrelated, and only platform targets should be provided. Dunno, just a mind game for now.
5. ?
6. I vote for UTF-8 only. We already have locale dependent filesystem functions,
which also makes them kind of weird to use, especially in libraries. Another very
important aspect to take care of this point is normalization forms. Filesystems
generally store stuff as is, that means that we can create two files with the same
name, at least by the looks of it, which are actually different ones. Think of ä
which can also be ä. It is generally most advisable to stick to NFC, because that
is also how users usually produce those chars.
Yeah, UTF-8 would probably be the simplest for a cross-platform implementation. Regarding the encoding variant, that's where more care would be needed. See e.g. https://github.com/aws/aws-cli/issues/1639; that's where we would care about PATH_TARGET_MAC-specific things. It's comparable to the situation where you want to escapeshell* something, but the result is invalid on another platform or possibly with another shell, which is how it currently works.
7. ? just forward I'd say.
8. Collapse multiple separators (e.g. a//b ~> a/b).
9. Resolve self-references, unless they are leading (e.g. a/./b ~> a/b but
./a/b stays ./a/b).
10. Trim separators from the end (e.g. a/ ~> a).
These last 3 points, as well as the one above, are canonicalization. Of course, in the imaginary function it could be decoupled, with something like PATH_NO_CANONIC if it's not wanted, or PATH_CANONICALIZE_ONLY to omit other conversions. It's only about making the behaviors sensible. Other possible flags could be PATH_STRIP_TRAILING_SLASH, PATH_ALLOW_RELATIVE and other fine things. But by default, the function should do the default thing for the target platform, based on the current platform: producing NFD for Mac and NFC otherwise, backslash for Windows and forward slash otherwise, and other things that will for sure pop up. As mentioned earlier, this still requires some re-implementation of the platform APIs, even if we talk about slashes only. For ASCII paths I'm not sure we can even differentiate the UTF-8 encoding forms without involving yet another library, so this might be tricky. Simply exposing part of the realpath() processing might solve several things for one given platform, that's for sure.
The initial case Rasmus reported was about cross-platform handling, but the topic is indeed slightly bigger than just path separators, so IMO the convenient way would be to aim for a cross-platform approach. I have no info on how relevant such cross-platform path issues really are, so that might be another thing to investigate before one starts any implementation. At least, grouping some cases and thoughts, maybe as an RFC, could be good for tracking the topic.
Thanks
Anatol
Basically, it is the same as your points 8., 9. and 10.: it deals
with the given path itself, so no symlinks, etc. The snippet
/a/b/../c is parsed as follows:
- parse up to /a/b/../
- scroll back to /a
- append the remainder, so it becomes /a/c
A similar process applies to /a/./b, which would become /a/b, and others. It is
string traversal only. What is done with dirname() uses this
approach. In general one can say that normalization is a path
simplification, with no drive access like realpath() does. For example, it
lets you know whether the path itself would be correct before it comes to
an actual file operation, and avoids I/O otherwise.
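The string traversal described above could be sketched roughly as follows. The function name resolve_dot_segments is hypothetical, the sketch assumes forward-slash paths without a drive letter, and unlike the rule about leading self-references quoted earlier it also drops a leading ./ for simplicity.

```php
<?php
// Hypothetical helper: resolves "." and ".." purely on the string,
// without any file-system access (so symbolic links are not followed).
function resolve_dot_segments(string $path): string
{
    $absolute = $path !== '' && $path[0] === '/';
    $stack = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // empty and self-referencing segments carry no information
        }
        if ($segment === '..') {
            if ($stack !== [] && end($stack) !== '..') {
                array_pop($stack);   // "scroll back" one component
            } elseif (!$absolute) {
                $stack[] = '..';     // keep a leading ".." of a relative path
            }
            // a ".." directly under the root is dropped: /.. ~> /
            continue;
        }
        $stack[] = $segment;
    }
    $result = implode('/', $stack);
    if ($absolute) {
        return '/' . $result;
    }
    return $result === '' ? '.' : $result;
}
```

For /a/b/../c this yields /a/c, regardless of whether b is a directory, a symbolic link, or does not exist at all.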
Your strategy works in these examples, but the example I gave was
different. Imagine that we have /a/b/../c which we would normalize to
/a/c. However, the b component is actually a symbolic link to x/y.
Hence, the real version of the path is /a/x/c and not /a/c as we
would have normalized it to.
As mentioned in an earlier post, it might make sense to have flags to
control the behavior. Maybe a signature like
string canonicalize_path(string $path, int $flags = 0);
The function OFC knows the current platform. Flags like
PATH_TARGET_WINDOWS | PATH_UNIXIFY would control the path separator
behaviors. Generally, regarding paths without a drive letter: on
Windows I'd strongly advise not to use them in configs, etc.
because of multiple root issues mentioned already. But in principle,
say one has same FS structure on different platforms and just wants
to mirror it, that would be ok with flags like PATH_TARGET_LINUX |
PATH_STRIP_DRIVE as Linux implies forward slashes. Or otherwise, fe
the reverse case - generating a path on Linux that is to be used on
Windows, flags might contain only PATH_TARGET_WINDOWS which would
produce backslashes as system default. Maybe that's too much or
unrelated, and only platform targets should be provided, dunno, just
a mind game for now.
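To make the mind game a bit more concrete, here is a rough sketch of how such flags might drive the separator handling. Neither canonicalize_path nor any of the PATH_* constants exist in PHP; the names are taken from the text above, the numeric values are arbitrary assumptions, and only the separator and drive-letter parts are sketched.

```php
<?php
// None of these constants or this function exist in PHP; the names come
// from the discussion above and the numeric values are arbitrary.
const PATH_TARGET_WINDOWS = 1;
const PATH_TARGET_LINUX   = 2;
const PATH_UNIXIFY        = 4;
const PATH_STRIP_DRIVE    = 8;

function canonicalize_path(string $path, int $flags = 0): string
{
    // Without an explicit target, fall back to the current platform.
    if (($flags & (PATH_TARGET_WINDOWS | PATH_TARGET_LINUX)) === 0) {
        $flags |= DIRECTORY_SEPARATOR === '\\' ? PATH_TARGET_WINDOWS : PATH_TARGET_LINUX;
    }

    if ($flags & PATH_STRIP_DRIVE) {
        // Drop a leading drive letter such as "C:".
        $path = preg_replace('~^[A-Za-z]:~', '', $path);
    }

    // Backslashes are only produced for a Windows target without PATH_UNIXIFY.
    // Note: turning "\" into "/" is lossy on Linux, where "\" is a legal
    // file name character; this sketch assumes the input is a Windows path.
    $windows = ($flags & PATH_TARGET_WINDOWS) !== 0 && ($flags & PATH_UNIXIFY) === 0;
    return $windows ? strtr($path, '/', '\\') : strtr($path, '\\', '/');
}
```

Under these assumptions, canonicalize_path('C:\\dir\\file.sql', PATH_TARGET_LINUX | PATH_STRIP_DRIVE) gives /dir/file.sql, and PATH_TARGET_WINDOWS alone turns a/b into a\b.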
I hope you notice how this function is exploding in complexity. I beg
for classes, with clear responsibilities and small methods that do one
thing.
These last 3 points, as well as above one, are canonicalization. Of
course, in the imaginary function, it could be decoupled like
PATH_NO_CANONIC if it's not wanted, or PATH_CANONICALIZE_ONLY to omit
other conversions. It's only about to have the behaviors sensible. Fe
possible other flags could be PATH_STRIP_TRAILING_SLASH,
PATH_ALLOW_RELATIVE and other fine things. But by default, the
function should do the default thing for the target platform, based
on the current platform. Thus, producing NFD for Mac and NFC
otherwise, backslash for Windows and forward slash otherwise, other
thing that will for sure popup. As mentioned earlier, still this
requires some re-implementations of the platform APIs, even we'd talk
about slashes only - for ASCII paths I'm not sure we even can
differentiate the UTF-8 encoding forms without involving yet another
library, so this might be tricky. Simply exposing the part of
realpath() processing might solve several things for one given
platform, that's for sure. The initial case Rasmus reported was about
crossplatform handling, but the topic is indeed slightly bigger than
just path separators, so IMO the convenient way were to care about a
crossplatform approach. I've no info, how badly such crossplatform
path issues are indeed relevant, so it might be another story to
investigate before one starts any implementation. At least, grouping
some cases and thought, maybe as an RFC, could be good to track the
topic.
I agree mostly:
- We should not call it canonicalization (I used the word too), but
  rather normalization. The former is used in other languages and means
  realpath there. This could be confusing.
- Leaving the stripping of the trailing separator to the user means that
  other users never know what they get, that is bad. The normalization
  should always use one strategy here.
--
Richard "Fleshgrinder" Fussenegger
Your strategy works in these examples, but the example I gave was
different. Imagine that we have /a/b/../c which we would normalize to
/a/c. However, the b component is actually a symbolic link to x/y.
Hence, the real version of the path is /a/x/c and not /a/c as we
would have normalized it to.
Both strategies are equally valid, as long as you know which is in use.
There are many common tools outside PHP which use both approaches, and
situations where you might actually want the string-based approach, even
if filesystem access is available.
See for instance this discussion of pwd:
http://unix.stackexchange.com/q/331208/70530 In summary, POSIX specifies
"-L" (logical) which uses $PWD as set by the shell as you navigate, and
"-P" (physical) which resolves backwards through the ".." links in the
file system.
The same is true for other operations - for instance, the below demo in
bash shows one interpretation in "ls" and the other in "cd".
/tmp/demo$ ls -lR
.:
drwxr-xr-x 2 vagrant vagrant 4096 Apr 2 18:21 foo
drwxr-xr-x 3 vagrant vagrant 4096 Apr 2 18:05 other
./foo:
lrwxrwxrwx 1 vagrant vagrant 21 Apr 2 18:21 bar -> /tmp/demo/other/thing
./other:
drwxr-xr-x 2 vagrant vagrant 4096 Apr 2 18:06 thing
/tmp/demo$ ls foo/bar/..
thing
/tmp/demo$ cd foo/bar/..
/tmp/demo/foo$ ls
bar
Regards,
--
Rowan Collins
[IMSoP]
I get your point, and I have to agree here.
normalize_path/Path::normalize would be the counterpart to
realpath/Path::canonicalize.
--
Richard "Fleshgrinder" Fussenegger
My first thought is UNC paths. On Windows a file server share is
denoted by \\host\share. If you combine that with relative paths
produced from PHP, you end up in the dubious situation of
"\\host\share/path/to/file" <--- wat?
Overall, it smells of magic.
-Sara
--
UNC pathing also works with forward slashes. For example, in powershell the
following is valid and works if your host is named UNC1 and you have admin
rights to the server.
//UNC1/C$/
--
The greatest dangers to liberty lurk in insidious encroachment by men of
zeal, well-meaning but without understanding. -- Justice Louis D. Brandeis
Thoughts?
Windows and paths is a complicated and lengthy story.
TL;DR all versions of Windows are able to deal with slashes, and we
could easily use slashes everywhere all the time.
History
The story why Windows is using the backslash might be of interest, read:
http://blogs.msdn.com/b/larryosterman/archive/2005/06/24/432386.aspx
This also explains that Windows IS supporting forward slashes since at
least the 1990s. However, there are programs that have significant
problems with it, but usually those are old or otherwise shitty programs.
There are various ways paths can be represented in Windows, the so
called path variants. There are 7 in total:
- Root
- Disk
- UNC
- Device Namespace
- Verbatim Disk
- Verbatim UNC
- Verbatim Device Namespace
Root
This works just like on Unix and can be either \ or /. It always
refers to the root directory of the current drive.
Home
PowerShell also supports the home short-hand ~ like Unix systems,
however, cmd.exe does not.
Disk
This is the one we all know. The drive letter comes first, followed by a
colon :, and then continues with the actual path.
C:\Folder\Resource
C:/Folder/Resource
UNC
Short for Universal Naming Convention or Uniform Naming
Convention; it allows one to refer to network paths or server shares.
\\ComputerName\SharedFolder\Resource
//ComputerName/SharedFolder/Resource
It also has an extended form for web resources:
\\ComputerName[@SSL][@Port]\SharedFolder\Resource
Device Namespace
This allows one to directly address special devices, or again the disks
themselves.
\\.\Device\Resource
//./Device/Resource
Verbatim *
The verbatim paths work exactly the same way as the respective normal
counterpart, the difference is that the slash to backslash conversion
does NOT happen auto-magically:
\\?\C:\Folder\Resource
\\?\Server\Share
\\?\UNC\Server\Share
https://en.wikipedia.org/wiki/Path_(computing)
I highly recommend you to have a look at Rust's path implementation, as
it takes care of all these things in a very intelligent manner. It is
also capable of dealing with all variants of paths in Windows, unlike
PHP which only supports a few:
https://doc.rust-lang.org/std/path/index.html
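As an illustration of how the variants listed above could be told apart by prefix alone, here is a minimal sketch. The function windows_path_variant is hypothetical, the "Relative" result is an addition for paths that match none of the listed variants, and treating every remaining \\?\ path as "Verbatim Device Namespace" is an assumption of this sketch.

```php
<?php
// Hypothetical classifier for the Windows path variants listed above.
// It inspects the prefix only and performs no file-system access.
function windows_path_variant(string $path): string
{
    // Verbatim prefixes use backslashes only; no slash conversion happens.
    if (strncmp($path, '\\\\?\\', 4) === 0) {
        $rest = substr($path, 4);
        if (strncasecmp($rest, 'UNC\\', 4) === 0) {
            return 'Verbatim UNC';
        }
        if (preg_match('~^[A-Za-z]:~', $rest) === 1) {
            return 'Verbatim Disk';
        }
        return 'Verbatim Device Namespace'; // assumption: everything else
    }

    // For the other variants both separators are accepted.
    $p = strtr($path, '\\', '/');
    if (strncmp($p, '//./', 4) === 0) {
        return 'Device Namespace';
    }
    if (strncmp($p, '//', 2) === 0) {
        return 'UNC';
    }
    if (preg_match('~^[A-Za-z]:~', $p) === 1) {
        return 'Disk';
    }
    if ($p !== '' && $p[0] === '/') {
        return 'Root';
    }
    return 'Relative'; // not one of the variants in the list above
}
```

The verbatim check comes first and looks at backslashes only, since the slash to backslash conversion does not happen for those paths.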
--
Richard "Fleshgrinder" Fussenegger
Hi,
Regarding the path variants support - it's not quite that way. PHP streams abstract many things, for both simplicity and security. The current state has grown historically on these two factors. As far as I can tell, the only things we don't support are drive-relative paths and several irrelevant prefixes like device UID.
While in general the info above is correct, things still stay platform dependent in many cases, even where supported in PHP. E.g. using "/" to access the drive root of course works, but might be surprisingly wrong if the CWD is changed to another drive. Well, that's the platform nuance: with DOS one can have multiple roots. In other cases, like UNC, links or lately the long path prefix, the handling with PHP streams is completely transparent to the consuming script.
A given case with a generated file is clearly the app responsibility. It is likely, that generated files moved between systems can cause arbitrary issues disregarding the actual platform. The mentioned case belongs to the same group, where I'd say there is no and cannot be a plausible general "fix". In addition to the EOL example by Rowan, another one of same could be escapeshell* functions. Taking in account also
- backward compatibility
- platform specific
- compatibility with dependency libs, especially where it's impossible to integrate PHP streams
- absence of the cross platform specifications, which is IMO the most of issue
Even if we'd abstract ourselves from the initial app responsibility case - there are the portability nuances that are not simply to clear away by just renaming 'a' to 'b'.
Regards
Anatol
Slow with the horses, we were only talking about backslash vs. slash,
not anything else. I only explained the various paths that are available
on Windows.
We could use slashes everywhere, because every platform that is still in
existence supports it. That's about it, we cannot do much more, well,
maybe some normalization (e.g. self-references like a/./b to a/b, or
removing multiple slashes a//b to a/b). That's about it. Any other
cross-platform issues are not solvable, and must be handled by applications.
A proper path abstraction would be awesome. Of course I would prefer an
object for it, but offering a path_canonicalize function as well for
starters is good too.
--
Richard "Fleshgrinder" Fussenegger
Slow with the horses, we were only talking about backslash vs. slash, not
anything else. I only explained the various paths that are available on Windows.
Well, there was slightly more in your msg, thus the response ?
We could use slashes everywhere, because every platform that is still in
existence supports it. That's about it, we cannot do much more, well, maybe
some normalization (e.g. self-references like a/./b to a/b, or removing
multiple slashes a//b to a/b). That's about it. Any other cross-platform issues
are not solvable, and must be handled by applications.
Path normalization and forward slash everywhere are two different things. Having forward slash just because it is supported - nope, it's more an issue and should not be done. The path can be used everywhere - in the script itself, passed to external prog, written into a file, etc. The suggested "always forward slash" will cause endless conversion back and forth, in both user space and internally. Please check the 7.1 related parts, or even earlier versions, we already have to do some conversions because of these and similar matters, doing yet more while introducing breakages for existing software doesn't sound necessary. Any individual case in the given app is what matters.
A proper path abstraction would be awesome. Of course I would prefer an
object for it, but offering a path_canonicalize function as well for starters is
good too.
Yep, a function to normalize path were doable. But again, the current implementations are platform dependent and use platform APIs. Such a function might need a re-implementations of those APIs, to produce results platform independently, that are valid on the target platform. Otherwise, more generalization doesn't look like having a base in absence of a consistent specs, at least I haven't seen any. Well, until someone takes it in the hand and files a draft to IETF ?
Regards
Anatol
Hey :)
Well, there was slightly more in your msg, thus the response ?
Not really:
Windows and paths is a complicated and lengthy story.
TL;DR all versions of Windows are able to deal with slashes, and we
could easily use slashes everywhere all the time.
The rest was under the heading "History".
Path normalization and forward slash everywhere are two different
things. Having forward slash just because it is supported - nope,
it's more an issue and should not be done. The path can be used
everywhere - in the script itself, passed to external prog, written
into a file, etc. The suggested "always forward slash" will cause
endless conversion back and forth, in both user space and internally.
Please check the 7.1 related parts, or even earlier versions, we
already have to do some conversions because of these and similar
matters, doing yet more while introducing breakages for existing
software doesn't sound necessary. Any individual case in the given
app is what matters.
$ php71 -a
php > echo dirname('C:\Folder/Resource\Resource');
C:\Folder/Resource
hmmm... just one example, this is what this whole discussion is about.
We are already super inconsistent. It seems as if this is not producing
any issues with PHP itself, as well as at least every extension I ever
interacted with.
Of course things are very different when it is about outputting paths
and forwarding them to other programs, which might be super shitty. (I
look at you protoc from Google, grrr.) However, that is something
where realpath/path_canonicalize/path_normalize would come into
play, and something I would leave to the applications. Choosing the
right situation where the path requires those actions is impossible.
We could also consistently convert paths to their native form. Hence,
above example would result in C:\Folder\Resource, or even
\\?\C:\Folder\Resource (verbatim path, no further fiddling allowed).
Yep, a function to normalize path were doable. But again, the current
implementations are platform dependent and use platform APIs. Such a
function might need a re-implementations of those APIs, to produce
results platform independently, that are valid on the target
platform. Otherwise, more generalization doesn't look like having a
base in absence of a consistent specs, at least I haven't seen any.
Well, until someone takes it in the hand and files a draft to IETF ?
Regards
Anatol
Both POSIX and Windows paths are well documented. However, it's not an
easy topic, that is for sure, and using slashes everywhere might be more
destructive than I anticipate.
--
Richard "Fleshgrinder" Fussenegger
$ php71 -a
php > echo dirname('C:\Folder/Resource\Resource');
C:\Folder/Resource
hmmm... just one example, this is what this whole discussion is about.
We are already super inconsistent. It seems as if this is not producing any issues
with PHP itself, as well as at least every extension I ever interacted with.
I can only link to this ?
http://git.php.net/?p=php-src.git;a=commitdiff;h=ec78507bd46a05f77dbde3fa4091ab4c91e61cad
the new implementation was consistent but had to be reverted in 7.1 partially, because of BC, even the use is inappropriate. Well, still normalization on Windows means having '\' in terms of the platform API used, but just as a show case. The dirname function itself is based on the PHP implementation, not a platform API. But also, it would produce same path with different separators on different platform, if normalized.
Of course things are very different when it is about outputting paths and
forwarding them to other programs, which might be super shitty. (I look at you
protoc from Google, grrr.) However, that is something where
realpath/path_canonicalize/path_normalize would come into play, and
something I would leave to the applications. Choosing the right situation where
the path requires those actions is impossible.
We could also consistently convert paths to their native form. Hence, above
example would result in C:\Folder\Resource, or even \\?\C:\Folder\Resource
(verbatim path, no further fiddling allowed).
Both POSIX and Windows paths are well documented. However, it's not an easy
topic, that is for sure, and using slashes everywhere might be more destructive
than I anticipate.
You're right, they both are documented. What is not defined is the cross platform handling. There are some documents, yes, like RFC 3986, or RFC 1738 and RFC 8089 which are still in the proposed state. However there is none I knew that would care about crossplatform nuances in full extent. Particularly an RFC defining all the possible behaviors of the file:// scheme is what were needed, I guess. Thus my conclusion is to take the path of less resistance, as what is not defined is not necessary good but also is not necessary broken. Yeah, it is complex, and particularly in PHP historically grown, and just touching the water surface might already produce some high waves.
The functions mentioned - of course, it were up to an application to decide what to use it in a particular situation, but not forcibly changing the core handling. Like in the snippet above, you would have currently to do dirname(realpath($path)), but that is also not crossplatform and won't work on a nonexistent file. So another function instead of realpath, like dirname(normalize_path($path, UNIXIFY_SLASH)) were in use. The implementation might be tricky in some parts, but in general doable.
Regards
Anatol
I can only link to this ?
http://git.php.net/?p=php-src.git;a=commitdiff;h=ec78507bd46a05f77dbde3fa4091ab4c91e61cad
the new implementation was consistent but had to be reverted in 7.1
partially, because of BC, even the use is inappropriate. Well, still
normalization on Windows means having '\' in terms of the platform
API used, but just as a show case. The dirname function itself is
based on the PHP implementation, not a platform API. But also, it
would produce same path with different separators on different
platform, if normalized.
A good example that showcases that we actually could normalize to
slashes, don't you think. :)
Besides, I still believe that it is very wrong of PHP to treat URIs/URLs
the same as paths. A path can be a URI, but a URI should only be a path
if it has the file:// scheme. The current approach just asks for
remote code inclusion, URL fopen anyone? Different story though.
You're right, they both are documented. What is not defined is the
cross platform handling. There are some documents, yes, like RFC
3986, or RFC 1738 and RFC 8089 which are still in the proposed state.
However there is none I knew that would care about crossplatform
nuances in full extent. Particularly an RFC defining all the possible
behaviors of the file:// scheme is what were needed, I guess. Thus my
conclusion is to take the path of less resistance, as what is not
defined is not necessary good but also is not necessary broken. Yeah,
it is complex, and particularly in PHP historically grown, and just
touching the water surface might already produce some high waves.
The functions mentioned - of course, it were up to an application to
decide what to use it in a particular situation, but not forcibly
changing the core handling. Like in the snippet above, you would have
currently to do dirname(realpath($path)), but that is also not
crossplatform and won't work on a nonexistent file. So another
function instead of realpath, like dirname(normalize_path($path,
UNIXIFY_SLASH)) were in use. The implementation might be tricky in
some parts, but in general doable.
Regards
Anatol
Well, RFC 8089 has many examples in its appendix regarding Windows. It's
true that they say that it is non-standard, however, it is how Windows
deals with it since IE4.
--
Richard "Fleshgrinder" Fussenegger
A good example that showcases that we actually could normalize to slashes,
don't you think. :)
Nope, actually the opposite. More as an illustration to what shouldn't be done, namely fixing in core what actually would belongs to an app. But for BC, it's another point.
Besides, I still believe that it is very wrong of PHP to treat URIs/URLs the same
as paths. A path can be a URI, but a URI should only be a path if it has the
file:// scheme. The current approach just asks for remote code inclusion, URL
fopen anyone? Different story though.
" A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource" they say. Fits perfectly with PHP streams.
Well, RFC 8089 has many examples in its appendix regarding Windows. It's true
that they say that it is non-standard, however, it is how Windows deals with it
since IE4.
https://blogs.msdn.microsoft.com/freeassociations/2005/05/19/the-bizarre-and-unhappy-story-of-file-urls/
Yeah, though that draft still ignores many Windows variants ☹
We went anyway a bit too deep in this complex matter. Probably a separate function is where the opinions could be joined.
Thanks
Anatol
" A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource" they
say. Fits perfectly with PHP streams.
The problem I was referring to is not a semantic one. The problem is that
the code cannot easily distinguish between local and remote files. Of
course there are functions for it again, but this would be better
expressed as part of the type system. I know that this is kind of alien
to the primitive obsessive world of PHP, but proper type systems can
help a lot to make code simpler.
That being said, it's totally off topic here. :P
Yeah, though that draft still ignores many Windows variants ☹
We went anyway a bit too deep in this complex matter. Probably a
separate function is where the opinions could be joined.
Thanks
Anatol
Agree, this is my last response on this here. :)
--
Richard "Fleshgrinder" Fussenegger