Hi!
Out of the urgent need to access files with a path longer than MAX_PATH on
Windows, I started some research.
At first I thought it might be a good idea to write my own stream wrapper
extension (e.g. file_long://.....).
Before I started, I tried to find out why those paths don't work in the
current PHP code.
According to [1] it is possible to use long paths if the path is prefixed
correctly, e.g.
\\?\
for a local file path, and
\\?\UNC\
for a UNC path.
I checked that fopen() and even open() do in fact work in C code with such
long paths when using the prefix.
So I bumped up MAXPATHLEN in php.h and tsrm_config_common.h to 32786 and
recompiled a fresh PHP 5.3.20.
Surprisingly, a PHP script using a long path (including the prefix) still
threw an error.
Tracing that error leads to
plain_wrapper.c:914 expand_filepath ->expand_filepath_ex ->virtual_file_ex
These are the lines that produce the error (tsrm_virtual_cwd.c:1255):
#ifdef TSRM_WIN32
    if (memchr(resolved_path, '*', path_length) ||
        memchr(resolved_path, '?', path_length)) {
        return 1;
    }
#endif
Since there's a '?' in the string from the long path prefix the
virtual_file_ex fails at this point.
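To make the failure concrete, here is a minimal, self-contained sketch of the same kind of check. The function names contains_wildcard() and contains_wildcard_after_prefix() are made up for illustration and are not part of PHP; the second one only shows where the '?' comes from and is not meant as a patch proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the quoted check: returns 1 if the
 * buffer contains '*' or '?', 0 otherwise. */
int contains_wildcard(const char *path, size_t len)
{
    assert(path != NULL);
    return memchr(path, '*', len) != NULL || memchr(path, '?', len) != NULL;
}

/* Hypothetical variant: skips a leading long path prefix (the four
 * characters backslash, backslash, '?', backslash) before scanning,
 * so only wildcards in the rest of the path are reported. */
int contains_wildcard_after_prefix(const char *path, size_t len)
{
    static const char prefix[] = "\\\\?\\";
    const size_t prefix_len = sizeof(prefix) - 1;

    assert(path != NULL);
    if (len >= prefix_len && memcmp(path, prefix, prefix_len) == 0) {
        path += prefix_len;
        len -= prefix_len;
    }
    return contains_wildcard(path, len);
}
```

With a prefixed path such as \\?\C:\dir\file.txt, the first function reports a wildcard purely because of the prefix, while the second one does not; a real wildcard later in the path is still caught by both.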
I did not quite understand the rationale behind this check.
Of course, both checked characters are invalid in a regular file path.
There seems to be some checking in tsrm_realpath_r() for paths like
\\?\Volume{62d1c3f8-83b9-11de-b108-806e6f6e6963}\foo
If I remove those memchr lines, everything magically works, e.g. fopen(),
file_get_contents(), file_put_contents(), unlink(), rmdir(), mkdir(), etc.
The only thing to do from userspace is to define the path as
$path = "\\\\?\\x:\\long_stuff.......\\.....\\......\\file.txt";
There are a few macros that get irritated (e.g. IS_ABSOLUTE_PATH,
IS_UNC_PATH) by the double double backslash in the path...
My questions here:
- What is the rationale behind the memchr checks for ? and *? Just
filtering invalid paths?
- Does allowing the "\\?\" prefix to bubble through the stream wrapper
layer (which effectively makes it usable) break anything?
- If not, is it possible to include this in PHP 5.3 or PHP 5.4?
It would indeed be nice if the "\\?\" prefix was not needed in userspace
and PHP did the work. But for now I would really like to see PHP
support long paths on Windows at all. To my mind the changes needed for
the prefix workaround are minimally invasive. Correct me if I'm
wrong :)
Any comment is much appreciated, if I can help implementing this "feature",
let me know.
Greetings
Nico
[1]
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
Any comment is much appreciated, if I can help implementing this "feature",
let me know.
I can't really comment on the feasibility, but from having seen many
people stumble upon problems with too long paths over the years (recent
framework development with namespace-itis + PSR-0 have led to some
really long paths) I can only encourage you to continue the research,
and I sure hope you get somewhere.
Cheers
--
Jordi Boggiano
@seldaek - http://nelm.io/jordi
hi,
There are a few macros that get irritated (e.g. IS_ABSOLUTE_PATH,
IS_UNC_PATH) by the double double backslash in the path...
Yes, we do not allow kernel paths, on purpose; see my comment below.
My questions here:
- What is the rationale behind the memchr checks for ? and *? Just
filtering invalid paths?
When a path gets resolved (symbolic link, junction and the likes)
there are many variations that need to be dealt with.
- Does allowing the "\\?\" prefix to bubble through the stream wrapper
layer (which effectively makes it usable) break anything?
- If not, is it possible to include this in php 5.3. or php 5.4?
No, not even 5.5 imo, or ever :)
It would be indeed nice if the "\\?\" prefix was not needed in userspace
and php would do the work. But just for now I really would like to see php
support for long paths on windows at all. To my mind the changes needed for
the prefix workaround are minimally invasive. Correct me, if I'm
wrong :)
I would never expose that prefix to userland; the consequences, and
how we would have to manage it, are way too complicated for a userland
scripting language (even in C apps it is not recommended).
A better solution, which I am working on for existing PHP versions (incl.
5.5, as I won't make it in time), is an extension which would override
the existing functions. The next major version (6) will support Unicode
filenames, which will solve the horrible 255-character limitation.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
functions. Next major version (6) will support unicode filenames,
which will solve the 255 chars horrible limitation.
I thought the Unicode effort was abandoned a long time ago. It sounds like
someone is still actively working on it?
- Martin
functions. Next major version (6) will support unicode filenames,
which will solve the 255 chars horrible limitation.
I thought the Unicode effort was abandoned long time ago. You sound like
someone is still actively working on it?
Not that Unicode, but the wide-char API on Windows :)
--
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
It would be indeed nice if the "\\?\" prefix was not needed in userspace
and php would do the work. But just for now I really would like to see php
support for long paths on windows at all. To my mind the changes needed for
the prefix workaround are minimally invasive. Correct me, if I'm
wrong :)
I would not ever expose that prefix to userland, the consequences and
how we have to manage it are way too complicated for a user land
scripting languages (even in C apps it is not recommended).
Is this about allowing the user to shoot him/herself in the foot, or could
adding this feature potentially break some existing functionality (e.g. a
new trick to bypass open_basedir, etc.)?
A better solution I work on for previous php version (incl. 5.5 as I
won't make it in time) is an extension which would override existing
functions. Next major version (6) will support unicode filenames,
which will solve the 255 chars horrible limitation.
By 6.0, do you mean the next PHP version after 5.5, or would adding this to
the core require a major version for some reason?
--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
hi,
is this about allowing the user to shoot him/herself in the foot, or could
adding this feature potentially break some existing functionality (e.g. a
new trick to bypass open_basedir, etc.)?
All of them, as the paths are passed straight to the kernel APIs, without
any of the system checks done in the higher-level APIs (POSIX or Win32).
A better solution I work on for previous php version (incl. 5.5 as I
won't make it in time) is an extension which would override existing
functions. Next major version (6) will support unicode filenames,
which will solve the 255 chars horrible limitation.
by 6.0 you mean php next after 5.5, or adding this to the core would require
a major version for some reason?
I mean 6, as the changes are rather big. It could be possible in 5.6,
but I would rather target the next major version.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
On 2013.01.08. 6:48, "Pierre Joye" pierre.php@gmail.com wrote:
thanks for the clarification.
Hi!
hi,
is this about allowing the user to shoot him/herself in the foot, or could
adding this feature potentially break some existing functionality (e.g. a
new trick to bypass open_basedir, etc.)?
All of them, as the paths are passed right to the kernel APIs, without
any system checks like in the higher level APIs (posix or win32).
I just checked that, without any modification, the userspace functions
unlink()
mkdir()
rmdir()
rename()
do work with "\\?\"-prefixed paths, because there is no call to
expand_filepath(), i.e. the paths are passed through directly.
So the only function we would need to modify is php_plain_files_stream_opener...
What do you think about adding:
    zval **ctx_opt = NULL;
    ...
    /* basedir checks */
    ...
    if (context) {
        if (SUCCESS == php_stream_context_get_option(context, "file",
                "assume_realpath", &ctx_opt)
            && Z_TYPE_PP(ctx_opt) == IS_BOOL
            && Z_LVAL_PP(ctx_opt) == 1) {
            options |= STREAM_ASSUME_REALPATH;
        }
    }
Usually STREAM_ASSUME_REALPATH is set by _php_stream_open_wrapper_ex when
USE_PATH is set and the path was resolved successfully. Unfortunately, the
following zend_resolve_path fails as before, because of the '?' in the path.
With the context option above, I could do the following:
<?php
$path = '\\\\?\\y:\\dummy_folder\\dummy.txt'; // yields \\?\y:\dummy_folder\dummy.txt
$options = array( "file" => array( "assume_realpath" => true ) );
$ctx = stream_context_create( $options );
$fh = fopen( $path, "r", false, $ctx );
Which works without the need to touch MAXPATHLEN by just using the already
implemented skip of expand_filepath().
Don't get me wrong, I'm not trying to bend PHP to allow the usage of this
ugly workaround. But as this is quite well documented and supported on
Windows, I'd just like to ask whether adding the above context option is a
feasible way to prevent PHP from interfering with the prefix.
As of now, there don't seem to be any stream context options for file://.
The question here is whether it is OK to expose a stream option telling PHP
"do not interfere with my paths, I'm doing this myself". To my mind, since
it already works for anything not involving php_plain_files_stream_opener,
this could be OK.
There are already a lot of stream context options for other wrappers. And
most of them do very lowlevel adjustments...
Yes, there are better ways of doing this, but these future implementations
will take a lot of time and will not be available within the next months.
Of course I'm trying to prevent finding myself compiling php for production
use incorporating just such a little change...
I currently don't see a way how I could introduce the mentioned context
option inside an extension without duplicating a lot of code. Any pointers
are welcome...
What do you think?
Greetings
Nico
hi,
What do you think?
As I stated earlier, doing so is like opening Pandora's box. I would
rather go with mounted directories and the like to reduce the length of
the path, as long as that is possible.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Hi!
hi,
On Tue, Jan 8, 2013 at 2:24 PM, Nicolai Scheer nicolai.scheer@gmail.com
wrote:
What do you think?
As I stated earlier, doing so is like opening the pandora box. I would
rather go with mounted directory and the likes to reduce the length of
the path, as long as it is possible.
I agree, but what about Pandora and the other file functions like unlink()
etc.? :)
They currently do not prevent such long and prefixed paths...
And to my mind it is OK to let the user open the box (a little?) when he is
doing so on purpose.
Unfortunately, mounting directories is too inflexible for our use case...
Furthermore, we only need to read files, and that's the only function
currently not allowing the prefix workaround :(
Greetings
Nico
I agree, but what about pandora and the other file functions like
unlink() etc.? :)
They currently do not prevent such long and prefixed paths...
A bug then, which should be fixed. Yes, you don't want to hear that but... :-)
And to my mind it is ok to let the user open the box (a little?) when he is
doing so on purpose.
Unfortunately mounting directories is too inflexible for our use case...
How so? It can easily be automated for shared hosts and the like.
Furthermore we only need to read files, and that's the only function
currently not allowing the prefix workaround :(
--
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Hi!
On Tue, Jan 8, 2013 at 2:38 PM, Nicolai Scheer nicolai.scheer@gmail.com
wrote:
I agree, but what about pandora and the other file functions like
unlink() etc.? :)
They currently do not prevent such long and prefixed paths...
A bug then, should be fixed. Yes, you don't want to hear that but... :-)
Could have guessed the answer ;)
And to my mind it is ok to let the user open the box (a little?) when he is
doing so on purpose.
Unfortunately mounting directories is too inflexible for our use case...
How so? can be easily automated for shared hosts and the likes.
We're developing a file system indexer. There's a huge folder structure and
we scan folders in parallel. The users' mount points are way down the
structure (subfolders for department, team etc.). If the users reach
MAX_PATH, the scanner, which uses the whole path from the root, exceeds
MAX_PATH.
We'd need to adjust a lot of code in order to be able to work on mounts
instead of the direct path, and it's not trivial to parallelize this. Of
course it could be done, but I searched for an easier way...
Do you have any advice how I can move lill' pandora to an extension? Of
course I might copy the complete simple_file_wrapper, but I'd rather not do
it that way... I did not find any "add standard stream context
option"-stuff in the API...
Thanks!
Greetings
Nico
Hi again!
Do you have any advice how I can move lill' pandora to an extension? Of
course I might copy the complete simple_file_wrapper, but I'd rather not do
it that way... I did not find any "add standard stream context
option"-stuff in the API...
I finished writing a small extension this evening, which does what I
want :)
I just saved a copy of the php_plain_files_wrapper struct, deregistered the
file wrapper during my MINIT, injected my own functions into the struct and
registered it again.
My own functions just wrap the original ones (using the saved original
function pointers), adjust STREAM_ASSUME_REALPATH and add the prefix if
necessary.
Adjustments only kick in for paths exceeding MAX_PATH, so if all paths are
within the usually allowed bounds, everything is back to original.
Quick and dirty, but it works very well and is a feasible way for our
project... until there's a better way :)
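For anyone who wants to try something similar, the "add the prefix if necessary" step might look roughly like the sketch below. This is not the code from the extension; add_long_path_prefix() and SKETCH_MAX_PATH are hypothetical names, and the sketch assumes the input is already an absolute path using backslashes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* MAX_PATH on Windows is 260 and includes the terminating NUL. */
#define SKETCH_MAX_PATH 260

/* Hypothetical helper: copies path into out, adding the long path prefix
 * only when the path does not fit into MAX_PATH and is not prefixed yet.
 * Returns 0 on success, -1 if out is too small. */
int add_long_path_prefix(const char *path, char *out, size_t out_size)
{
    const char *prefix = "";
    const char *rest = path;
    int written;

    assert(path != NULL && out != NULL);

    if (strlen(path) >= SKETCH_MAX_PATH && strncmp(path, "\\\\?\\", 4) != 0) {
        if (strncmp(path, "\\\\", 2) == 0) {
            /* UNC: \\server\share\... becomes \\?\UNC\server\share\... */
            prefix = "\\\\?\\UNC\\";
            rest = path + 2;
        } else {
            /* Local: x:\... becomes \\?\x:\... */
            prefix = "\\\\?\\";
        }
    }

    written = snprintf(out, out_size, "%s%s", prefix, rest);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

Short paths come back unchanged, so nothing differs from stock behaviour within the usual bounds. Note that Windows does not normalize prefixed paths, so "." and ".." components or forward slashes would have to be resolved before the prefix is added.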
Greetings
Nico
On Tue, Jan 8, 2013 at 10:06 PM, Nicolai Scheer
nicolai.scheer@gmail.com wrote:
nice :) that's exactly what I meant earlier: much easier than
implementing a wrapper, and safer than allowing special paths. Now be
sure not to mess with the libs :)
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org