Hi,
I'm writing to ask for a review on the following:
https://gist.github.com/weltling/29779b61db26c62b5ab0
It is quite far from being even near to be ready. It is an approach for the
streams refactoring. After some time playing around, I think that the ideas
I was initially approaching are to the big part not applicable to our code
base in its current state. For one, "isolate the actual concrete low level
implementations by platforms" - it's probably the goal 1. that has to be
done before anything else. However creating an isolated layer of structs for
every platform (libuv alike) most likely is not doable at the current stage,
if at all (and whether makes sense).
So in this patch, I've started to create an I/O layer in win32, with the
goal to create a framework global to the whole PHP. This serves as the first
unavoidable step for the separation by platform. Ideally, we should have all
the abstracted low level I/O code for platforms concentrated at one place,
while currently all possible code parts of win32 I/O are spread over all the
code base - Zend, TSRM, ext/standard, win32.
In the current state it is hard to both maintain and to develop these pieces
of our codebase. So sounds like a gross refactoring which might last for
some longer time. For the Windows part it'll bring a huge functional
advantage of using the native win32 APIs which means support for multibyte
and long filepath and improved security, later probably overlapped I/O. For
the overall codebase it'll mean simpler and better readable code, so more
stability and possibility to extend with new modern features more easily.
Practically in this patch, with the approach I've described, I've
implemented several helpers and routines replacements for Windows that use
the native APIs. Particularly PHP's fopen and mkdir there support multibyte
filepaths and implementing the long path support is then quite easy. For
that to work cross platform, I define macros like php_ioutil_open,
php_ioutil_mkdir, etc. for the low level POSIX surrogates. There, I used
several parts from libuv to not to do everything from scratch. Also I've
started to pull the win32 related code out from zend_virtual_cwd.c, but
that's just a start. The plus of such approach is also that the work can be
done incrementally, moving the functions one by one, as the unported code
will stay compatible with function signatures and functionality.
Please let me know what you think about this approach as it's a very early
stage which would be good to take the corrections.
Thanks
Anatol
Hi,
At current state, I see this not as a whole stream layer refactoring,
but as a low-level replacement of POSIX layer on Windows.
I see only few "visible" changes:
- The layer first checks, if file names are valid UTF-8 strings, and
use UTF-8 names or fall back to system locale. - The patch allows to use long file names (longer than MAX_PATH) using
"//?/******<utf8-absolute-long-path>" - Files now are opened in FILE_SHARE_READ | FILE_SHARE_WRITE |
FILE_SHARE_DELETE mode (I'm not sure about intended better POSIX
compliance and possible side effects).
For me, the approach looks a bit inconsistent, especially the attempt to
use the same string as UTF-8 and then using some code page.
Thanks. Dmitry.
Hi,
I'm writing to ask for a review on the following:
https://gist.github.com/weltling/29779b61db26c62b5ab0
It is quite far from being even near to be ready. It is an approach for the
streams refactoring. After some time playing around, I think that the ideas
I was initially approaching are to the big part not applicable to our code
base in its current state. For one, "isolate the actual concrete low level
implementations by platforms" - it's probably the goal 1. that has to be
done before anything else. However creating an isolated layer of structs for
every platform (libuv alike) most likely is not doable at the current stage,
if at all (and whether makes sense).So in this patch, I've started to create an I/O layer in win32, with the
goal to create a framework global to the whole PHP. This serves as the first
unavoidable step for the separation by platform. Ideally, we should have all
the abstracted low level I/O code for platforms concentrated at one place,
while currently all possible code parts of win32 I/O are spread over all the
code base - Zend, TSRM, ext/standard, win32.In the current state it is hard to both maintain and to develop these pieces
of our codebase. So sounds like a gross refactoring which might last for
some longer time. For the Windows part it'll bring a huge functional
advantage of using the native win32 APIs which means support for multibyte
and long filepath and improved security, later probably overlapped I/O. For
the overall codebase it'll mean simpler and better readable code, so more
stability and possibility to extend with new modern features more easily.Practically in this patch, with the approach I've described, I've
implemented several helpers and routines replacements for Windows that use
the native APIs. Particularly PHP's fopen and mkdir there support multibyte
filepaths and implementing the long path support is then quite easy. For
that to work cross platform, I define macros like php_ioutil_open,
php_ioutil_mkdir, etc. for the low level POSIX surrogates. There, I used
several parts from libuv to not to do everything from scratch. Also I've
started to pull the win32 related code out from zend_virtual_cwd.c, but
that's just a start. The plus of such approach is also that the work can be
done incrementally, moving the functions one by one, as the unported code
will stay compatible with function signatures and functionality.Please let me know what you think about this approach as it's a very early
stage which would be good to take the corrections.Thanks
Anatol
Hi Dmitry,
Many thanks for the comments.
-----Original Message-----
From: Dmitry Stogov [mailto:dmitry@zend.com]
Sent: Monday, February 1, 2016 10:23 AM
To: Anatol Belski anatol.php@belski.net; internals@lists.php.net
Cc: 'Pierre Joye' pierre.php@gmail.com; 'Xinchen Hui'
laruence@gmail.com; 'Nikita Popov' nikita.ppv@gmail.com
Subject: [PHP-DEV] Re: Streams and I/O refactoring approachHi,
At current state, I see this not as a whole stream layer refactoring, but
as a low-
level replacement of POSIX layer on Windows.
I see only few "visible" changes:
Basically it's the result of analyzing the code of particularly APR, libuv
and several other projects. While libuv works with UTF-8 paths only, APR
retains a wider compatibility. Please check the code here for an example
https://github.com/apache/apr/blob/trunk/file_io/win32/open.c#L321
- The layer first checks, if file names are valid UTF-8 strings, and use
UTF-8
names or fall back to system locale.
Yep, it is done for the fallback, and also the non-unicode part is
conditional, so can be turned off on compilation time with
PHP_WIN32_IOUTIL_ANSI_COMPAT_MODE set to 0. For an example how it is done in
APR
https://github.com/apache/apr/blob/trunk/file_io/win32/open.c#L406
Currently it would make sense to keep this fallback, so the current
scripts/codes using non-unicode would keep to work.
- The patch allows to use long file names (longer than MAX_PATH) using
"//?/******<utf8-absolute-long-path>"
Yep, to do is to prepend \?\ transparently for the user. But even without
\?\ multibyte paths are supported.
- Files now are opened in FILE_SHARE_READ | FILE_SHARE_WRITE |
FILE_SHARE_DELETE mode (I'm not sure about intended better POSIX
compliance and possible side effects).
This is done in most projects, fe
https://github.com/apache/apr/blob/trunk/file_io/win32/open.c#L346
But we can remove it if there's something wrong.
For me, the approach looks a bit inconsistent, especially the attempt to
use the
same string as UTF-8 and then using some code page.
The only difference of my approach to the other projects is that I suggest
to keep the POSIX compatible APIs, instead of having a whole scruct
describing a filesystem object, fe
https://ci.apache.org/projects/httpd/trunk/doxygen/structapr__file__t.html
It could be done later, but for now keeping POSIX APIs would be IMHO the
simplest solution. And as I've stated, it is just a preparation to have
everything at the same base, otherwise further improvements are quite
tedious.
Regards
Anatol
Hi,
-----Original Message-----
From: Anatol Belski [mailto:anatol.php@belski.net]
Sent: Monday, February 1, 2016 11:03 AM
To: 'Dmitry Stogov' dmitry@zend.com; internals@lists.php.net
Cc: 'Pierre Joye' pierre.php@gmail.com; 'Xinchen Hui'
laruence@gmail.com; 'Nikita Popov' nikita.ppv@gmail.com
Subject: RE: [PHP-DEV] Re: Streams and I/O refactoring approachHi Dmitry,
Many thanks for the comments.
-----Original Message-----
From: Dmitry Stogov [mailto:dmitry@zend.com]
Sent: Monday, February 1, 2016 10:23 AM
To: Anatol Belski anatol.php@belski.net; internals@lists.php.net
Cc: 'Pierre Joye' pierre.php@gmail.com; 'Xinchen Hui'
laruence@gmail.com; 'Nikita Popov' nikita.ppv@gmail.com
Subject: [PHP-DEV] Re: Streams and I/O refactoring approachHi,
At current state, I see this not as a whole stream layer refactoring,
but
as a low-
level replacement of POSIX layer on Windows.
I see only few "visible" changes:Basically it's the result of analyzing the code of particularly APR, libuv
and
several other projects. While libuv works with UTF-8 paths only, APR
retains a
wider compatibility. Please check the code here for an examplehttps://github.com/apache/apr/blob/trunk/file_io/win32/open.c#L321
- The layer first checks, if file names are valid UTF-8 strings, and
use
UTF-8
names or fall back to system locale.
Yep, it is done for the fallback, and also the non-unicode part is
conditional, so
can be turned off on compilation time with
PHP_WIN32_IOUTIL_ANSI_COMPAT_MODE set to 0. For an example how it is
done in APRhttps://github.com/apache/apr/blob/trunk/file_io/win32/open.c#L406
Currently it would make sense to keep this fallback, so the current
scripts/codes
using non-unicode would keep to work.
- The patch allows to use long file names (longer than MAX_PATH)
using "//?/******<utf8-absolute-long-path>"
Yep, to do is to prepend \?\ transparently for the user. But even without
\?
multibyte paths are supported.
- Files now are opened in FILE_SHARE_READ | FILE_SHARE_WRITE |
FILE_SHARE_DELETE mode (I'm not sure about intended better POSIX
compliance and possible side effects).This is done in most projects, fe
https://github.com/apache/apr/blob/trunk/file_io/win32/open.c#L346
But we can remove it if there's something wrong.
For me, the approach looks a bit inconsistent, especially the attempt
to
use the
same string as UTF-8 and then using some code page.The only difference of my approach to the other projects is that I suggest
to
keep the POSIX compatible APIs, instead of having a whole scruct
describing a
filesystem object, fehttps://ci.apache.org/projects/httpd/trunk/doxygen/structapr__file__t.html
It could be done later, but for now keeping POSIX APIs would be IMHO the
simplest solution. And as I've stated, it is just a preparation to have
everything at
the same base, otherwise further improvements are quite tedious.
After some discussions and rethinking this approach, I came to conclusion
that creating a layer with POSIX compatible signatures for low level IO is
most likely not a good idea. I won't continue on this idea and rather going
for an RFC to target better APIs and refactoring for the low level IO layer.
Though probably it were worth to reuse the parts of this patch to add the
missing features in the current code base.
Thanks.
Anatol