ZTS is always going to be slower than non-ZTS.
Yes, but it can be faster than it is. PHP uses hardly any shared
resources (compiled regexes?), so why is it so slow? Thread-safe syscalls?
I don't think so.
There's no need for rewriting TSRM, it's roughly as fast as it can be.
Things change...
Now it's AD 2003 and gcc has the __thread keyword. I suppose that if we used
it, we could get rid of all this TSRMLS*. I know PHP is not for Linux
only, but we could confine it to the TSRM implementation.
I'm sure the compiler will optimize it better if the __thread keyword is
used instead of TSRMLS*.
Regards,
Wojtek
At 10:20 PM 3/24/2003 +0100, wmeler@wp-sa.pl wrote:
ZTS is always going to be slower than non-ZTS.
Yes, but it can be faster than it is. PHP uses hardly any shared
resources (compiled regexes?), so why is it so slow? Thread-safe syscalls?
I don't think so.
Hmm, it's not worth arguing with you if you're so sure that PHP doesn't
need any per-thread resources. It does...
There's no need for rewriting TSRM, it's roughly as fast as it can be.
Things change...
Now it's AD 2003 and gcc has the __thread keyword. I suppose that if we used
it, we could get rid of all this TSRMLS*. I know PHP is not for Linux
only, but we could confine it to the TSRM implementation.
I'm sure the compiler will optimize it better if the __thread keyword is
used instead of TSRMLS*.
I'm not sure how __thread works, but the TSRM implementation already uses
thread-local storage to cache the per-thread objects. For all I know,
__thread does the same.
Andi
At 10:20 PM 3/24/2003 +0100, wmeler@wp-sa.pl wrote:
ZTS is always going to be slower than non-ZTS.
Yes, but it can be faster than it is. PHP uses hardly any shared
resources (compiled regexes?), so why is it so slow? Thread-safe syscalls?
I don't think so.
Hmm, it's not worth arguing with you if you're so sure that PHP doesn't
need any per-thread resources. It does...
You misunderstood me (my English isn't perfect and my vocabulary has
gaps). PHP uses almost exclusively per-thread resources: they are global,
but only within a thread. It doesn't use process-global resources (the only
one I found is the compiled regex cache), so it doesn't need to synchronize
on them. There are no lock-contention problems in PHP.
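To make the distinction concrete, here is a minimal C sketch, assuming GCC's `__thread` extension and POSIX threads. All names (`thread_globals_t`, `regex_cache`, `serve_request`, `cache_lookup`, `count_on_new_thread`) are hypothetical and not PHP's; the point is only that per-thread state needs no locking, while a process-global cache, such as a compiled-regex cache, must serialize every access.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical per-thread state: every thread gets its own copy,
 * so reads and writes need no synchronization. */
typedef struct {
    long requests_served;
} thread_globals_t;

static __thread thread_globals_t tg;

/* Hypothetical process-global cache shared by all threads
 * (standing in for a compiled-regex cache): every access takes the lock. */
#define CACHE_SLOTS 16
#define PATTERN_MAX 64

static struct {
    pthread_mutex_t lock;
    char patterns[CACHE_SLOTS][PATTERN_MAX];
    int used;
} regex_cache = { PTHREAD_MUTEX_INITIALIZER, {{0}}, 0 };

void serve_request(void)
{
    tg.requests_served++;            /* thread-private: no lock needed */
}

long requests_served(void)
{
    return tg.requests_served;
}

/* Returns the slot for pattern, inserting it if absent; -1 if full or too long. */
int cache_lookup(const char *pattern)
{
    if (strlen(pattern) >= PATTERN_MAX)
        return -1;
    int slot = -1;
    pthread_mutex_lock(&regex_cache.lock);       /* shared: must serialize */
    for (int i = 0; i < regex_cache.used; i++) {
        if (strcmp(regex_cache.patterns[i], pattern) == 0) {
            slot = i;
            break;
        }
    }
    if (slot < 0 && regex_cache.used < CACHE_SLOTS) {
        slot = regex_cache.used++;
        strcpy(regex_cache.patterns[slot], pattern);  /* length checked above */
    }
    pthread_mutex_unlock(&regex_cache.lock);
    return slot;
}

static void *worker(void *out)
{
    serve_request();
    *(long *)out = requests_served();   /* sees only this thread's counter */
    return NULL;
}

/* Serves one request on a fresh thread and returns that thread's count. */
long count_on_new_thread(void)
{
    pthread_t t;
    long n = -1;
    if (pthread_create(&t, NULL, worker, &n) != 0)
        return -1;
    pthread_join(t, NULL);
    return n;
}
```

Each thread sees its own `tg.requests_served`, so `count_on_new_thread()` returns 1 however many requests the calling thread has served, while every thread that touches the cache contends for `regex_cache.lock`.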
I'm not sure how __thread works, but the TSRM implementation already uses
thread-local storage to cache the per-thread objects. For all I know,
__thread does the same.
As Sascha said, NPTL is the future for multi-threaded applications, so I think
TSRM needs a review.
Regards,
Wojtek
At 13:20 24/03/2003, wmeler@wp-sa.pl wrote:
ZTS is always going to be slower than non-ZTS.
Yes, but it can be faster than it is. PHP uses hardly any shared
resources (compiled regexes?), so why is it so slow?
It uses lots of globals. These aren't shared resources, but in ZTS mode
they take more time to access than they do in non-ZTS mode. I mentioned
the issues involved in my previous email.
Thread-safe syscalls?
I don't think so.
That's one of the reasons, I would guess. The libc memory manager is not
very efficient when multiple threads access it at once; it has locks
(which is one of the reasons we implemented our own in ZE2). It's
definitely one of the reasons, and as you know, performance penalties
accumulate.
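A per-thread allocator avoids taking a lock on the fast path. The following is a deliberately simplified bump-arena sketch, assuming GCC's `__thread`; `arena_alloc` and `arena_reset` are hypothetical names, and this is not how the ZE2 memory manager is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread arena: each thread carves small allocations
 * out of its own block, so no lock is taken after the first malloc(). */
#define ARENA_SIZE (64 * 1024)

typedef struct {
    unsigned char *base;
    size_t         used;
} arena_t;

static __thread arena_t arena;

void *arena_alloc(size_t n)
{
    /* Round up so offsets stay 16-byte aligned (assuming malloc's block is). */
    n = (n + 15) & ~(size_t)15;
    if (arena.base == NULL) {
        arena.base = malloc(ARENA_SIZE);   /* one locked libc call per thread */
        if (arena.base == NULL)
            return NULL;
        arena.used = 0;
    }
    if (n > ARENA_SIZE - arena.used)
        return NULL;                       /* sketch: no growth */
    void *p = arena.base + arena.used;
    arena.used += n;
    return p;
}

/* Releases everything at once, e.g. at the end of a request. */
void arena_reset(void)
{
    free(arena.base);
    arena.base = NULL;
    arena.used = 0;
}
```

Only the first allocation on each thread goes through libc's malloc(); later ones are a pointer bump on thread-private data. A real allocator would also need growth, per-size free lists and large-block handling.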
There's no need for rewriting TSRM, it's roughly as fast as it can be.
Things change...
Now it's AD 2003 and gcc has the __thread keyword. I suppose that if we used
it, we could get rid of all this TSRMLS*. I know PHP is not for Linux
only, but we could confine it to the TSRM implementation.
If __thread is anything like TLS under Windows (which would be my guess),
then we can't use it directly. We're already using pthread_setspecific, so
we're extremely quick with fetches as it is. As I said, I also doubt very
much that our performance penalty is solely due to fetches; I think it is
mostly due to other issues, which __thread will not alter in any way. You're
more than welcome to try implementing a __thread-based solution in
place of the pthread_setspecific solution and see if it makes any
difference. If it does, we can investigate further in that direction and
see if it's usable.
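For reference, here is a minimal sketch of the POSIX TLS-key pattern (`pthread_key_create`, `pthread_getspecific`, `pthread_setspecific`) used to cache a per-thread globals block. The struct and function names are hypothetical, and this is not TSRM's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread globals block, reached through a POSIX TLS key. */
typedef struct {
    int counter;
} my_globals_t;

static pthread_key_t  globals_key;
static pthread_once_t globals_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() runs as the destructor when a thread with a non-NULL value exits. */
    pthread_key_create(&globals_key, free);
}

/* Returns this thread's globals, allocating them on first use. */
my_globals_t *get_globals(void)
{
    pthread_once(&globals_once, make_key);
    my_globals_t *g = pthread_getspecific(globals_key);
    if (g == NULL) {
        g = calloc(1, sizeof *g);
        if (g != NULL)
            pthread_setspecific(globals_key, g);
    }
    return g;
}

static void *bump_twice(void *out)
{
    get_globals()->counter += 2;
    *(int *)out = get_globals()->counter;   /* this thread's copy only */
    return NULL;
}

/* Runs bump_twice on a fresh thread and returns that thread's counter. */
int counter_on_new_thread(void)
{
    pthread_t t;
    int n = -1;
    if (pthread_create(&t, NULL, bump_twice, &n) != 0)
        return -1;
    pthread_join(t, NULL);
    return n;
}
```

Every access still goes through a `pthread_getspecific()` call, which is the per-fetch cost being debated in this thread.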
I'm sure the compiler will optimize it better if the __thread keyword is
used instead of TSRMLS*.
Probably not much more than pthread_setspecific. I wouldn't be surprised
if __thread is built around that, actually.
Zeev
If __thread is anything like TLS under Windows (which would be my guess),
then we can't use it directly. We're already using pthread_setspecific, so
we're extremely quick with fetches as it is. As I said, I also doubt very
much that our performance penalty is solely due to fetches; I think it is
mostly due to other issues, which __thread will not alter in any way. You're
more than welcome to try implementing a __thread-based solution in
place of the pthread_setspecific solution and see if it makes any
difference. If it does, we can investigate further in that direction and
see if it's usable.
I suppose __thread is similar to __declspec(thread). I really don't
know which is faster, but using
    __thread / __declspec(thread) struct {..} thread_globals;
and accessing it in code as
    thread_globals.variable
(which probably relies on the CPU's MMU, depending on the compiler and libc)
is probably faster than
    (((type) (*((void ***) tsrm_ls))[TSRM_UNSHUFFLE_RSRC_ID(id)])->element)
which cannot be optimized well because id is a global variable and could
change between function calls.
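The two access patterns being compared can be sketched side by side. All names here are hypothetical: `UNSHUFFLE_ID` is only modeled on `TSRM_UNSHUFFLE_RSRC_ID`, and the table layout imitates the shape of the quoted macro rather than TSRM's real data structures. How the compiler addresses a `__thread` variable depends on the platform, ABI and libc.

```c
#include <assert.h>

typedef struct {
    int variable;
} module_globals_t;

/* Pattern 1: a __thread object; the compiler emits the address
 * computation itself (the mechanism is platform/ABI dependent). */
static __thread module_globals_t thread_globals;

int  get_direct(void)  { return thread_globals.variable; }
void set_direct(int v) { thread_globals.variable = v; }

/* Pattern 2: each thread owns a table of pointers, and a module finds its
 * globals through a resource id assigned at run time (hypothetical names). */
#define MAX_RESOURCES 8
#define UNSHUFFLE_ID(id) ((id) - 1)   /* modeled on TSRM_UNSHUFFLE_RSRC_ID */

int module_id;   /* ordinary global: may change across opaque calls */

int get_indirect(void *tsrm_ls)
{
    void **table = *(void ***) tsrm_ls;               /* this thread's table */
    module_globals_t *g = table[UNSHUFFLE_ID(module_id)];
    return g->variable;
}
```

In the second pattern every access loads the table pointer, the global `module_id` and then the slot, and because `module_id` is an ordinary global the compiler generally has to reload it after any call it cannot see into.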
I won't argue; you have more experience. I'll try to implement the TSRM
macros, but I don't have time now. I'll do it in July (if nobody has
implemented it by then).
Regards,
Wojtek
Wojtek Meler wrote:
If __thread is anything like TLS under Windows (which would be my guess),
then we can't use it directly. We're already using pthread_setspecific, so
we're extremely quick with fetches as it is. As I said, I also doubt very
much that our performance penalty is solely due to fetches; I think it is
mostly due to other issues, which __thread will not alter in any way. You're
more than welcome to try implementing a __thread-based solution in
place of the pthread_setspecific solution and see if it makes any
difference. If it does, we can investigate further in that direction and
see if it's usable.
I suppose __thread is similar to __declspec(thread). I really don't
know which is faster, but using
    __thread / __declspec(thread) struct {..} thread_globals;
The reason it is not used on Windows is that it does not (or at least,
when the threading work was started, it did not) work correctly with
dynamically loaded DLLs. At the time, MS documented this well and
recommended using TLS to get around the limitations of __declspec(thread).
Shane