Hi,
the pull request https://github.com/php/php-src/pull/500 fixing the bug
#50333 is ready to review. Manual tests done so far on linux and windows
in TS and NTS mode with CLI and Apache show no regression. The performance
tests are to be done yet.
Regards
Anatol
Hi,
I don't have strong opinion about the patch.
I thought malloc() -> emalloc() change might improve PHP performance in
general, but unfortunately it doesn't.
On the other hand the patch is quite big and introduces source level
incompatibility.
The patch is not complete. At least it misses this chunk:
--- a/sapi/cgi/cgi_main.c
+++ b/sapi/cgi/cgi_main.c
@@ -1396,7 +1396,7 @@ static void init_request_info(fcgi_request *request
TSRMLS_DC)
} else {
SG(request_info).request_uri =
env_script_name;
}
-
free(real_path);
-
efree(real_path); } } else { /* pre 4.3 behaviour, shouldn't be used but
provides BC */
It's also much better to use do_alloca() instead of tsrm_do_alloca() (the
patch changed them to less efficient emalloc()).
May be if you change it, we would see improvement :)
As I said, currently, the patch doesn't significantly affect performance of
non-thread-safe build on Linux.
master patched improvement Blog (req/sec) 106.1 105.1 -0.94% drupal
(req/sec) 1660.5 1668.6 0.49% fw (req/sec) 231.7 227.499 -1.81% hello
(req/sec) 11828.8 11980.5 1.28% qdig (req/sec) 470 477.3 1.55% typo3
(req/sec) 580.1 579.3 -0.14% wordpress (req/sec) 185.9 188.5 1.40% xoops
(req/sec) 130 131.2 0.92% scrum (req/sec) 185.199 185 -0.11% ZF1 Hello
(req/sec) 1154.7 1155.2 0.04% ZF2 Test (req/sec) 248.7 250.7 0.80%
Thanks. Dmitry.
Hi,
the pull request https://github.com/php/php-src/pull/500 fixing the bug
#50333 is ready to review. Manual tests done so far on linux and windows
in TS and NTS mode with CLI and Apache show no regression. The performance
tests are to be done yet.Regards
Anatol
Hi Dmitry,
Hi,
I don't have strong opinion about the patch.
I thought malloc() -> emalloc() change might improve PHP performance in
general, but unfortunately it doesn't. On the other hand the patch is quite
big and introduces source level incompatibility.The patch is not complete. At least it misses this chunk:
--- a/sapi/cgi/cgi_main.c
+++ b/sapi/cgi/cgi_main.c
@@ -1396,7 +1396,7 @@ static void init_request_info(fcgi_request *request
TSRMLS_DC)
} else {
SG(request_info).request_uri =
env_script_name; }
free(real_path);
efree(real_path);
}
} else {
/* pre 4.3 behaviour, shouldn't be used but
provides BC */It's also much better to use do_alloca() instead of tsrm_do_alloca() (the
patch changed them to less efficient emalloc()). May be if you change it,
we would see improvement :)As I said, currently, the patch doesn't significantly affect performance
of non-thread-safe build on Linux.master patched improvement Blog (req/sec) 106.1 105.1 -0.94% drupal
(req/sec) 1660.5 1668.6 0.49% fw (req/sec) 231.7 227.499 -1.81% hello
(req/sec) 11828.8 11980.5 1.28% qdig (req/sec) 470 477.3 1.55% typo3
(req/sec) 580.1 579.3 -0.14% wordpress (req/sec) 185.9 188.5 1.40% xoops
(req/sec) 130 131.2 0.92% scrum (req/sec) 185.199 185 -0.11% ZF1 Hello
(req/sec) 1154.7 1155.2 0.04% ZF2 Test (req/sec) 248.7 250.7 0.80%
Thanks. Dmitry.Hi,
the pull request https://github.com/php/php-src/pull/500 fixing the bug
#50333 is ready to review. Manual tests done so far on linux and
windows in TS and NTS mode with CLI and Apache show no regression. The
performance tests are to be done yet.Regards
Anatol
thanks for the comments, the CGI part is fixed in the same patch.
While investigating on how to go further with your suggestion, I've yet a
couple of open questions.
The reason for moving files into Zend, as I can see from the original
patch is, that the tsrm_init has to happen after Zend memory manager init
in order to use the e* function family. It has to be true also for the
do_alloca() case as when the stack size was exceeded, it'll fallback to
emalloc(). For that reason, I'd rather implement your suggestion into the
existing patch as it's quicker just to see if it'd make sense at all. If
it would, tsrm files can be moved back into TSRM/ and one can look for
magic to initialize it in the right order.
It'd of course make sense to convert everything for do_alloca() usage,
just a replace for tsrm_do_alloca would already work, but ... there are
some places in the original codelike
http://lxr.php.net/xref/PHP_TRUNK/TSRM/tsrm_virtual_cwd.c#493 where malloc
is used. Or virtual_file_ex which is using realloc(), with do_alloca that
memory should be probably just left unfreed on the stack? Do you generally
think i should touch those places, would it be ok to leave them alone for
now for the tests? As otherwise it might need signature change of some
functions, it's about use_heap when the stack size is exceeded to avoid
memory leaks.
When I look here http://lxr.php.net/xref/PHP_TRUNK/Zend/zend.h#200 , only
some GNU based platform profits from alloca() on both TS and NTS. Some
others are only active with NTS only. Is there a particular reason for
that? I think that macros could be refactored at least for the Windows
part, as this page
http://msdn.microsoft.com/en-us/library/5471dc8s(v=vs.110).aspx deprecates
_alloca nowadays.
Also wondering why you was doing NTS perf tests, as the ticket is about
improvement for multi-threaded envs. Nevertheless the overall improvement
were even better of course :)
Regards
Anatol
Hi Anatol,
First of all PHP is mainly used as NTS on Linux (LAMP stack)
It's the reason I tested it first, to check if your patch affects the
performance that is really important for users.
I still think that tsrm_do_alloca() have to be replaced by do_alloca().
They have exactly the same logic, except that the first one fallback to
malloc() and the second one to emalloc(). It must not be affected by
malloc()s and realloc()s if you changed them into emalloc() and erealloc().
May be I missing something...
I'm not sure about alloca() usage on Windows, but you may change zend.h if
you like.
BTW: I don't think we should replace alloca() by _malloca(), because
do_alloca() already does the same in portable way and uses per-request heap.
Thanks. Dmitry.
Hi Dmitry,
Hi,
I don't have strong opinion about the patch.
I thought malloc() -> emalloc() change might improve PHP performance in
general, but unfortunately it doesn't. On the other hand the patch is
quite
big and introduces source level incompatibility.The patch is not complete. At least it misses this chunk:
--- a/sapi/cgi/cgi_main.c
+++ b/sapi/cgi/cgi_main.c
@@ -1396,7 +1396,7 @@ static void init_request_info(fcgi_request *request
TSRMLS_DC)
} else {
SG(request_info).request_uri =
env_script_name; }
free(real_path);
efree(real_path);
}
} else {
/* pre 4.3 behaviour, shouldn't be used but
provides BC */It's also much better to use do_alloca() instead of tsrm_do_alloca() (the
patch changed them to less efficient emalloc()). May be if you change
it,
we would see improvement :)As I said, currently, the patch doesn't significantly affect performance
of non-thread-safe build on Linux.master patched improvement Blog (req/sec) 106.1 105.1 -0.94% drupal
(req/sec) 1660.5 1668.6 0.49% fw (req/sec) 231.7 227.499 -1.81% hello
(req/sec) 11828.8 11980.5 1.28% qdig (req/sec) 470 477.3 1.55% typo3
(req/sec) 580.1 579.3 -0.14% wordpress (req/sec) 185.9 188.5 1.40%
xoops
(req/sec) 130 131.2 0.92% scrum (req/sec) 185.199 185 -0.11% ZF1 Hello
(req/sec) 1154.7 1155.2 0.04% ZF2 Test (req/sec) 248.7 250.7 0.80%
Thanks. Dmitry.Hi,
the pull request https://github.com/php/php-src/pull/500 fixing the bug
#50333 is ready to review. Manual tests done so far on linux and
windows in TS and NTS mode with CLI and Apache show no regression. The
performance tests are to be done yet.Regards
Anatol
thanks for the comments, the CGI part is fixed in the same patch.
While investigating on how to go further with your suggestion, I've yet a
couple of open questions.The reason for moving files into Zend, as I can see from the original
patch is, that the tsrm_init has to happen after Zend memory manager init
in order to use the e* function family. It has to be true also for the
do_alloca() case as when the stack size was exceeded, it'll fallback to
emalloc(). For that reason, I'd rather implement your suggestion into the
existing patch as it's quicker just to see if it'd make sense at all. If
it would, tsrm files can be moved back into TSRM/ and one can look for
magic to initialize it in the right order.It'd of course make sense to convert everything for do_alloca() usage,
just a replace for tsrm_do_alloca would already work, but ... there are
some places in the original codelike
http://lxr.php.net/xref/PHP_TRUNK/TSRM/tsrm_virtual_cwd.c#493 where malloc
is used. Or virtual_file_ex which is using realloc(), with do_alloca that
memory should be probably just left unfreed on the stack? Do you generally
think i should touch those places, would it be ok to leave them alone for
now for the tests? As otherwise it might need signature change of some
functions, it's about use_heap when the stack size is exceeded to avoid
memory leaks.When I look here http://lxr.php.net/xref/PHP_TRUNK/Zend/zend.h#200 , only
some GNU based platform profits from alloca() on both TS and NTS. Some
others are only active with NTS only. Is there a particular reason for
that? I think that macros could be refactored at least for the Windows
part, as this page
http://msdn.microsoft.com/en-us/library/5471dc8s(v=vs.110).aspx deprecates
_alloca nowadays.Also wondering why you was doing NTS perf tests, as the ticket is about
improvement for multi-threaded envs. Nevertheless the overall improvement
were even better of course :)Regards
Anatol
Hi Dmitry,
Hi Anatol,
First of all PHP is mainly used as NTS on Linux (LAMP stack)
It's the reason I tested it first, to check if your patch affects the
performance that is really important for users.I still think that tsrm_do_alloca() have to be replaced by do_alloca().
They have exactly the same logic, except that the first one fallback to
malloc() and the second one to emalloc(). It must not be affected by
malloc()s and realloc()s if you changed them into emalloc() and
erealloc(). May be I missing something...I'm not sure about alloca() usage on Windows, but you may change zend.h
if you like. BTW: I don't think we should replace alloca() by _malloca(),
because do_alloca() already does the same in portable way and uses
per-request heap.
last week I've updated the patch according to your recommendations, now we
have the performance test results
http://windows.php.net/downloads/snaps/ostc/pftt/perf/results-20131025-MasterVanilla-Master50333-3820.html
http://windows.php.net/downloads/snaps/ostc/pftt/perf/results-20131025-MasterVanilla-Master50333-7795.html
Also enabled the stack usage in both TS/NTS for windows. It shows still no
improvement as one can see. Though, there are some functions, like
virtual_file_ex(), which use emalloc() only in the patch, but called very
often within the virtual cwd API. Changing them would require signature
changes os one can efree() in case of the fallback situation, but still
it's doubtful that would tell some bigger difference in performance. Do
you think it's worth a try?
Regards
Anatol
Hi Anatol,
Thank you for update.
I'm surprised you don't see speed difference even with ZTS :(
I didn't get what you propose to change in virtual_file_ex().
I'll do a more careful patch review later on this week.
Thaks. Dmitry.
Hi Dmitry,
Hi Anatol,
First of all PHP is mainly used as NTS on Linux (LAMP stack)
It's the reason I tested it first, to check if your patch affects the
performance that is really important for users.I still think that tsrm_do_alloca() have to be replaced by do_alloca().
They have exactly the same logic, except that the first one fallback to
malloc() and the second one to emalloc(). It must not be affected by
malloc()s and realloc()s if you changed them into emalloc() and
erealloc(). May be I missing something...I'm not sure about alloca() usage on Windows, but you may change zend.h
if you like. BTW: I don't think we should replace alloca() by _malloca(),
because do_alloca() already does the same in portable way and uses
per-request heap.last week I've updated the patch according to your recommendations, now we
have the performance test resultsAlso enabled the stack usage in both TS/NTS for windows. It shows still no
improvement as one can see. Though, there are some functions, like
virtual_file_ex(), which use emalloc() only in the patch, but called very
often within the virtual cwd API. Changing them would require signature
changes os one can efree() in case of the fallback situation, but still
it's doubtful that would tell some bigger difference in performance. Do
you think it's worth a try?Regards
Anatol
Hi Dmitry,
On Tue, 2013-10-29 at 11:45 +0400, Dmitry Stogov wrote:
Hi Anatol,
Thank you for update.
I'm surprised you don't see speed difference even with ZTS :(
I didn't get what you propose to change in virtual_file_ex().
I'll do a more careful patch review later on this week.
the story is that
- the original patch had replacements for all tsrm_do_alloca() and
malloc() to emaloc() - after that, i've turned the places originally having tsrm_do_alloca()
to do_alloca()
Just as we discussed, and that's the current state of the patch. Still,
there are many places using the e*() family of functions, like
https://github.com/weltling/php-src/blob/bug50333/Zend/zend_virtual_cwd.c#L151
https://github.com/weltling/php-src/blob/bug50333/Zend/zend_virtual_cwd.c#L1340
Alone for that two I find 21 occurrences of each elsewhere in the file.
As they are used very often in other filesystem functions like
virtual_copy(), virtual_rename(), etc. That's why I thought it could
make sense to turn any emalloc() usage into do_alloca(). But as the
do_alloca(size, use_heap) requires that use_heap argument to determine
the fallback situation, the signatures in at least that two cases for
virtual_file_ex() or CWD_STATE_COPY() will need to be extended, so the
correct freeing can happen. A simple snippet in pseudo code, just to
illustrate the idea
currently:
some_function()
{
CWD_STATE_COPY(&new_state, &CWDG(cwd));
virtual_file_ex(&new_state, path, NULL, CWD_REALPATH TSRMLS_CC);
CWD_STATE_FREE(&new_state);
}
should be like:
some_function()
{
ALLOCA_FLAG(use_heap)
CWD_STATE_COPY(&new_state, &CWDG(cwd), use_heap);
virtual_file_ex(&new_state, path, NULL, CWD_REALPATH, use_heap
TSRMLS_CC);
CWD_STATE_FREE(&new_state, use_heap);
}
That's of course much deeper change and would affect more code outside
just the scope of that one file, still that's manageable. I'd expect
some more functions needing that, so then we'd have do_alloca() usage
everywhere and drop all e*() invocations. Maybe then we see some
effect :)
Regards
Anatol
I don't think it makes sense to invest into it right now.
Lets finish with existing patch.
Also, alloca() allocates data on CPU stack, so such data is automatically
freed when function returns.
In case you change CWD_STATE_COPY() to use alloca(), the "state" couldn't
be returned from the function.
I'm not sure if it'll work for all use cases.
Thanks. Dmitry.
Hi Dmitry,
Hi Anatol,
Thank you for update.
I'm surprised you don't see speed difference even with ZTS :(
I didn't get what you propose to change in virtual_file_ex().
I'll do a more careful patch review later on this week.
the story is that
- the original patch had replacements for all tsrm_do_alloca() and
malloc() to emaloc()- after that, i've turned the places originally having tsrm_do_alloca()
to do_alloca()Just as we discussed, and that's the current state of the patch. Still,
there are many places using the e*() family of functions, likehttps://github.com/weltling/php-src/blob/bug50333/Zend/zend_virtual_cwd.c#L151
https://github.com/weltling/php-src/blob/bug50333/Zend/zend_virtual_cwd.c#L1340
Alone for that two I find 21 occurrences of each elsewhere in the file.
As they are used very often in other filesystem functions like
virtual_copy(), virtual_rename(), etc. That's why I thought it could
make sense to turn any emalloc() usage into do_alloca(). But as the
do_alloca(size, use_heap) requires that use_heap argument to determine
the fallback situation, the signatures in at least that two cases for
virtual_file_ex() or CWD_STATE_COPY() will need to be extended, so the
correct freeing can happen. A simple snippet in pseudo code, just to
illustrate the ideacurrently:
some_function()
{
CWD_STATE_COPY(&new_state, &CWDG(cwd));
virtual_file_ex(&new_state, path, NULL, CWD_REALPATH TSRMLS_CC);
CWD_STATE_FREE(&new_state);
}should be like:
some_function()
{
ALLOCA_FLAG(use_heap)
CWD_STATE_COPY(&new_state, &CWDG(cwd), use_heap);
virtual_file_ex(&new_state, path, NULL, CWD_REALPATH, use_heap
TSRMLS_CC);
CWD_STATE_FREE(&new_state, use_heap);
}That's of course much deeper change and would affect more code outside
just the scope of that one file, still that's manageable. I'd expect
some more functions needing that, so then we'd have do_alloca() usage
everywhere and drop all e*() invocations. Maybe then we see some
effect :)Regards
Anatol
Hi Anatol,
I've posted few comments at https://github.com/php/php-src/pull/500/files
Otherwise, the patch looks fine.
I think it may be accepted even if it doesn't make visible improvement.
Of course, when the small issues described in comment are fixed.
There were also a minor merging conflict in ext/opcache/ZendAccelerator.c
(it's not a problem at all).
I hope you testd ZTS PHP with patch well, because I mainly care about
non-ZTS builds.
Thanks. Dmitry.
I don't think it makes sense to invest into it right now.
Lets finish with existing patch.Also, alloca() allocates data on CPU stack, so such data is automatically
freed when function returns.
In case you change CWD_STATE_COPY() to use alloca(), the "state" couldn't
be returned from the function.
I'm not sure if it'll work for all use cases.Thanks. Dmitry.
Hi Dmitry,
Hi Anatol,
Thank you for update.
I'm surprised you don't see speed difference even with ZTS :(
I didn't get what you propose to change in virtual_file_ex().
I'll do a more careful patch review later on this week.
the story is that
- the original patch had replacements for all tsrm_do_alloca() and
malloc() to emaloc()- after that, i've turned the places originally having tsrm_do_alloca()
to do_alloca()Just as we discussed, and that's the current state of the patch. Still,
there are many places using the e*() family of functions, likehttps://github.com/weltling/php-src/blob/bug50333/Zend/zend_virtual_cwd.c#L151
https://github.com/weltling/php-src/blob/bug50333/Zend/zend_virtual_cwd.c#L1340
Alone for that two I find 21 occurrences of each elsewhere in the file.
As they are used very often in other filesystem functions like
virtual_copy(), virtual_rename(), etc. That's why I thought it could
make sense to turn any emalloc() usage into do_alloca(). But as the
do_alloca(size, use_heap) requires that use_heap argument to determine
the fallback situation, the signatures in at least that two cases for
virtual_file_ex() or CWD_STATE_COPY() will need to be extended, so the
correct freeing can happen. A simple snippet in pseudo code, just to
illustrate the ideacurrently:
some_function()
{
CWD_STATE_COPY(&new_state, &CWDG(cwd));
virtual_file_ex(&new_state, path, NULL, CWD_REALPATH TSRMLS_CC);
CWD_STATE_FREE(&new_state);
}should be like:
some_function()
{
ALLOCA_FLAG(use_heap)
CWD_STATE_COPY(&new_state, &CWDG(cwd), use_heap);
virtual_file_ex(&new_state, path, NULL, CWD_REALPATH, use_heap
TSRMLS_CC);
CWD_STATE_FREE(&new_state, use_heap);
}That's of course much deeper change and would affect more code outside
just the scope of that one file, still that's manageable. I'd expect
some more functions needing that, so then we'd have do_alloca() usage
everywhere and drop all e*() invocations. Maybe then we see some
effect :)Regards
Anatol
Hi Dmitry,
On Wed, 2013-10-30 at 16:00 +0400, Dmitry Stogov wrote:
Hi Anatol,
I've posted few comments at https://github.com/php/php-src/pull/500/files
Otherwise, the patch looks fine.
I think it may be accepted even if it doesn't make visible improvement.
Of course, when the small issues described in comment are fixed.There were also a minor merging conflict in ext/opcache/ZendAccelerator.c
(it's not a problem at all).I hope you testd ZTS PHP with patch well, because I mainly care about
non-ZTS builds.Thanks. Dmitry.
thanks for the time you spend on this. The remaining fixes are on their
way, they'll be tested well functionally+performance. I'll write back as
soon as all that is done.
What just comes in my mind, the original idea was invented on Solaris,
so maybe there's something specific to that platform what makes it
faster there. In the ticked is also mentioned "scalability on
multi-threaded/multi-core systems", so maybe some more special
conditions are needed. Maybe like some particular CPU cores count
(currently we do it with 2 or 4 cores) or specific server load level.
I'm going to play more with that.
Best regards
Anatol
Hi Dmitry,
Hi Anatol,
I've posted few comments at https://github.com/php/php-src/pull/500/files
Otherwise, the patch looks fine.
I think it may be accepted even if it doesn't make visible improvement.
Of course, when the small issues described in comment are fixed.There were also a minor merging conflict in ext/opcache/ZendAccelerator.c
(it's not a problem at all).I hope you testd ZTS PHP with patch well, because I mainly care about
non-ZTS builds.Thanks. Dmitry.
I've fixed the patch according to your last comments. Just a note about
virtual_cwd_activate() being called twice. That is needed for the threaded
environment, once for SAPI startup and then per request in
zend_activate(), as that calls belong to different threads.
The performance has at least no regressions, here's the report
The functional tests have also no regressions from recent master.
Regards
Anatol
Hi Anatol,
I don't see any technical problems with the patch.
Thanks. Dmitry.
Hi Dmitry,
Hi Anatol,
I've posted few comments at
https://github.com/php/php-src/pull/500/files
Otherwise, the patch looks fine.
I think it may be accepted even if it doesn't make visible improvement.
Of course, when the small issues described in comment are fixed.There were also a minor merging conflict in ext/opcache/ZendAccelerator.c
(it's not a problem at all).I hope you testd ZTS PHP with patch well, because I mainly care about
non-ZTS builds.Thanks. Dmitry.
I've fixed the patch according to your last comments. Just a note about
virtual_cwd_activate() being called twice. That is needed for the threaded
environment, once for SAPI startup and then per request in
zend_activate(), as that calls belong to different threads.The performance has at least no regressions, here's the report
The functional tests have also no regressions from recent master.
Regards
Anatol