Hi,
Currently, the way globals work forces us to pass a thread-local-storage pointer
across function calls, which involves some overhead. Also, not all functions
get the pointer as an argument, so they need to use TSRMLS_FETCH(), which is
slow. For instance, emalloc() involves a TSRMLS_FETCH(). Another overhead is
accessing globals, which goes through multiple pointers in different locations.
The following patch caches the address of each global in a native TLS variable
so that accessing a global is as simple as global_name->member. This removes the
need to pass the TLS pointer across function calls, so the two
major overheads of ZTS builds are avoided.
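To make the difference concrete, here is a minimal sketch of the two access styles described above. It is not taken from the patch: foo_globals_t, lookup_via_table(), bind_foo_globals() and demo_cached_access() are hypothetical names, and the one-slot table is a simplification of what TSRM really keeps per thread. It assumes a compiler that supports the __thread extension (GCC or Clang).

```c
#include <stddef.h>

/* Hypothetical globals structure (not from PHP). */
typedef struct {
    int counter;
} foo_globals_t;

/* Unpatched style, simplified: the per-thread table is passed in (or
 * fetched), then indexed, then dereferenced on every access. */
static foo_globals_t *lookup_via_table(void **thread_table, size_t slot)
{
    return (foo_globals_t *) thread_table[slot];
}

/* Patched style, simplified: the address is cached once per thread in a
 * native TLS variable. */
static __thread foo_globals_t *foo_globals;

static void bind_foo_globals(void **thread_table, size_t slot)
{
    foo_globals = lookup_via_table(thread_table, slot);
}

/* No TLS pointer parameter is needed here: global_name->member. */
static int bump_counter(void)
{
    return ++foo_globals->counter;
}

/* Sets up a one-slot table for the calling thread and uses the cache. */
int demo_cached_access(void)
{
    static __thread foo_globals_t storage;
    static __thread void *table[1];

    storage.counter = 41;
    table[0] = &storage;
    bind_foo_globals(table, 0);
    return bump_counter();
}
```

After the bind step, bump_counter() needs neither a pointer argument nor a fetch call, which is the saving the paragraph above describes.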
Globals can optionally be declared statically, which speeds things up a bit.
Results in bench.php:
non-ZTS: 3.7s
ZTS unpatched: 5.2s
ZTS patched: 4.0s
ZTS patched and static globals: 3.8s
The patch introduces two new macros to declare globals, TSRMG_D() (declare) and
TSRMG_DH() (declare, for headers), which replace the current "ts_rsrc_id
foo_global_id". These macros declare the global id, plus the __thread pointer
to the global storage.
ts_allocate_id() now takes one more callback function as an argument, to bind
the global pointer to its storage. This callback is declared in TSRMG_DH().
As all TSRMLS_* macros now do nothing, ts_resource(0) must be called
explicitly at least once in each thread to initialize its storage. A new
TSRMLS_INIT() macro has been added for this purpose.
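As a rough illustration of the declaration scheme just described, the sketch below shows one way such macros and a bind callback could fit together. Everything here is an assumption made for illustration: the SKETCH_* macros, sketch_allocate_id(), sketch_bind_fn and foo_globals_t are hypothetical, and the real TSRMG_D(), TSRMG_DH() and ts_allocate_id() in the patch may look quite different.

```c
#include <stddef.h>
#include <stdlib.h>

typedef int ts_rsrc_id;
typedef void (*sketch_bind_fn)(void *storage);

/* Header-side declaration: the id, the per-thread pointer and the
 * bind callback (hypothetical stand-in for TSRMG_DH). */
#define SKETCH_TSRMG_DH(type, name)      \
    extern ts_rsrc_id name##_id;         \
    extern __thread type *name;          \
    void name##_bind(void *storage)

/* Definition-side counterpart (hypothetical stand-in for TSRMG_D). */
#define SKETCH_TSRMG_D(type, name)       \
    ts_rsrc_id name##_id;                \
    __thread type *name;                 \
    void name##_bind(void *storage) { name = (type *) storage; }

typedef struct {
    int value;
} foo_globals_t;

SKETCH_TSRMG_DH(foo_globals_t, foo_globals);
SKETCH_TSRMG_D(foo_globals_t, foo_globals)

static ts_rsrc_id sketch_next_id = 1;

/* Toy allocator: reserves zeroed storage for the calling thread only and
 * lets the callback bind the __thread pointer to it. A real allocator
 * would do this for every thread and free the storage at thread exit. */
ts_rsrc_id sketch_allocate_id(ts_rsrc_id *id, size_t size, sketch_bind_fn bind)
{
    void *storage = calloc(1, size);

    if (storage == NULL) {
        return 0;
    }
    *id = sketch_next_id++;
    bind(storage);
    return *id;
}

/* Allocates the globals, then accesses them as global_name->member. */
int sketch_demo(void)
{
    if (!sketch_allocate_id(&foo_globals_id, sizeof(foo_globals_t),
                            foo_globals_bind)) {
        return -1;
    }
    foo_globals->value = 7;
    return foo_globals->value;
}
```

The bind step in this sketch runs only in the calling thread, which is why an explicit per-thread initialization call is still needed once the TSRMLS_* macros expand to nothing.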
All of this is disabled by default. --with-tsrm-__thread-tls enables the
features of the patch, and --with-tsrm-full-__thread-tls enables static
declaration of globals.
It has been tested on Linux, compiled with --disable-all, in CLI and a bit in
Apache2 with the worker MPM. Known issues:
- Declaring globals statically (--with-tsrm-full-__thread-tls) causes trouble
with dlopen(): Apache won't load the module at runtime (it works with
just --with-tsrm-__thread-tls).
- The patch assumes that all resources are ts_allocate_id()'ed before any
other thread calls ts_allocate_id() or ts_resource_ex(), which is possibly not
the case.
The patch needs some tweaks and is not meant to be included in any branch as is,
but I would like to have some comments on it.
The patch: http://arnaud.lb.s3.amazonaws.com/__thread-tls.patch
Regards,
Arnaud
Hi Arnaud,
The patch looks very interesting.
I think it may be committed to HEAD in the near future.
I don't have time to look into all the details at the moment.
Could you explain why --with-tsrm-full-__thread-tls doesn't work with
dlopen() while --with-tsrm-__thread-tls does?
Did you test the patch with DSO extensions?
It would be interesting to try the same idea on Windows with VC.
Thanks. Dmitry.
Hi,
On Monday 18 August 2008 19:46:46 Dmitry Stogov wrote:
> Could you explain why --with-tsrm-full-__thread-tls doesn't work with
> dlopen() while --with-tsrm-__thread-tls does?
That's due to the way TLS works internally. Actually, I need to read further on
that.
> Did you test the patch with DSO extensions?
I will, but I guess they will behave like any other shared library
dlopen()ed by Apache.
> It would be interesting to try the same idea on Windows with VC.
I will try that too.
Regards,
Arnaud
Hi Arnaud,
Arnaud Le Blanc wrote:
> > Could you explain why --with-tsrm-full-__thread-tls doesn't work with
> > dlopen() while --with-tsrm-__thread-tls does?
>
> That's due to the way TLS works internally. Actually, I need to read further
> on that.
I don't see a big difference between --with-tsrm-full-__thread-tls and
--with-tsrm-__thread-tls from the TLS point of view (maybe I'm missing it), so I
don't understand why one works and the other doesn't.
The patch looks more complicated to me than it should be. I would prefer to
have only --with-tsrm-full-__thread-tls, if it works, as the patch would
be simpler and PHP faster.
Another simple solution, which you have probably already tested, is to use
only a global __thread-ed tsrm_ls (and not pass/fetch it), but still
access thread globals in the same way:
((type *) tsrm_ls[global_module_id])->global_field
Anyway, you did a great job. I would like to see this idea implemented in
HEAD. It is a little bit late for 5.3 :(
Thanks. Dmitry.
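The alternative suggested in this message, a single __thread-ed table pointer that is indexed on every access, could look roughly like the sketch below. It is only an illustration under stated assumptions: SKETCH_G(), bar_globals_t, BAR_GLOBALS_ID and sketch_indexed_access() are hypothetical names, and the table setup is reduced to one slot owned by the calling thread.

```c
/* The only native TLS variable: this thread's table of globals. */
static __thread void **tsrm_ls;

/* Indexed access: load the TLS pointer, index the table, dereference. */
#define SKETCH_G(id, type, field) (((type *) tsrm_ls[(id)])->field)

/* Hypothetical globals structure and resource id. */
typedef struct {
    int depth;
} bar_globals_t;

enum { BAR_GLOBALS_ID = 0 };

/* Builds a one-slot table for the calling thread, then reads and writes
 * a member through the indexed macro. */
int sketch_indexed_access(void)
{
    static __thread bar_globals_t storage;
    static __thread void *table[1];

    table[0] = &storage;
    tsrm_ls = table;

    SKETCH_G(BAR_GLOBALS_ID, bar_globals_t, depth) = 3;
    return SKETCH_G(BAR_GLOBALS_ID, bar_globals_t, depth) + 1;
}
```

Compared with caching one pointer per global, this keeps a single TLS variable but pays for an extra index and dereference on each access.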
The original one, which only caches the TLS index, is bound to php.exe and
does not cross .dll boundaries (if I understand your question
correctly). The new one does, and therefore will most likely crash on
Windows.
Andi
-----Original Message-----
From: Dmitry Stogov
Sent: Tuesday, August 19, 2008 12:23 AM
To: Arnaud Le Blanc
Cc: PHP Development; Stas Malyshev; Andi Gutmans
Subject: Re: [PHP-DEV] [PATCH] ZTS as fast as non-ZTS
Hi,
On Tuesday 19 August 2008 09:22:46 Dmitry Stogov wrote:
> I don't see a big difference between --with-tsrm-full-__thread-tls and
> --with-tsrm-__thread-tls from the TLS point of view (maybe I'm missing it),
> so I don't understand why one works and the other doesn't.
Unfortunately, neither was expected to work with dlopen() :(
http://marc.info/?l=php-internals&m=121912220705964&w=2
It works when using another TLS model (which requires PIC).
This model is less efficient, but it still improves performance: PIC-patched is
faster than non-PIC-unpatched, and the improvement is even greater against
PIC-unpatched.
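The message does not name the TLS models involved. As an assumption, the sketch below uses the models GCC and Clang expose on ELF platforms through the tls_model attribute (the same choice can be made for a whole file with -ftls-model=). The variable and function names are hypothetical.

```c
/* Assumption: GCC or Clang on an ELF platform such as Linux. */

/* initial-exec: the variable's offset from the thread pointer is fixed
 * when the object is loaded. Access is cheap, but the object needs room
 * in the static TLS area, which is limited for dlopen()ed libraries. */
static __thread int fast_counter
    __attribute__((tls_model("initial-exec")));

/* global-dynamic: the most general model. It works in any shared object,
 * including dlopen()ed ones, but an access may go through a runtime
 * helper call, so it is slower. */
static __thread int portable_counter
    __attribute__((tls_model("global-dynamic")));

/* Each function increments the calling thread's own copy. */
int bump_fast(void)
{
    return ++fast_counter;
}

int bump_portable(void)
{
    return ++portable_counter;
}
```

GCC's documented default is initial-exec without -fpic and global-dynamic with -fpic, which is consistent with the remark that the model that works under dlopen() requires PIC.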
> The patch looks more complicated to me than it should be. I would prefer to
> have only --with-tsrm-full-__thread-tls, if it works, as the patch would
> be simpler and PHP faster.
>
> Another simple solution, which you have probably already tested, is to use
> only a global __thread-ed tsrm_ls (and not pass/fetch it), but still
> access thread globals in the same way:
> ((type *) tsrm_ls[global_module_id])->global_field
Yes, I tested it and it gave 4.7s on bench.php (vs. 3.8s with the current patch).
Regards,
Arnaud
Arnaud Le Blanc wrote:
> Unfortunately, neither was expected to work with dlopen() :(
> http://marc.info/?l=php-internals&m=121912220705964&w=2
> It works when using another TLS model (which requires PIC).
> This model is less efficient, but it still improves performance: PIC-patched
> is faster than non-PIC-unpatched, and the improvement is even greater against
> PIC-unpatched.
This is what I was afraid of.
> > Another simple solution, which you have probably already tested, is to use
> > only a global __thread-ed tsrm_ls (and not pass/fetch it), but still
> > access thread globals in the same way:
> > ((type *) tsrm_ls[global_module_id])->global_field
>
> Yes, I tested it and it gave 4.7s on bench.php (vs. 3.8s with the current
> patch).
Does this model have the same issues with dlopen()?
(We would have only one __thread-ed variable.)
Thanks. Dmitry.
Hi,
On Thursday 21 August 2008 09:37:12 Dmitry Stogov wrote:
> > Yes, I tested it and it gave 4.7s on bench.php (vs. 3.8s with the current
> > patch).
>
> Does this model have the same issues with dlopen()?
> (We would have only one __thread-ed variable.)
Yes and no. By reading the loader/linker code in glibc, I saw that it
reserves an amount of memory specifically for the dlopen() case. This amount of
memory is too small for --with-tsrm-full-__thread-tls, but several times
larger than needed for --with-tsrm-__thread-tls. So the memory found by
dlopen() is not random, and I think we can expect that to work on Linux.
However, things are different on Windows and FreeBSD. I will post an RFC about
that to clarify things.
Regards,
Arnaud
Hi!
> The following patch caches the address of each global in a native TLS
> variable so that accessing a global is as simple as global_name->member.
> This removes the need to pass the TLS pointer across function calls, so the
> two major overheads of ZTS builds are avoided.
I think it would be great to use __thread there. But if we have
working __thread, why not have real globals use it, without all that
TSRMG stuff? Having 3 different variants of TSRM support seems excessive.
Now, the question is whether we can reliably detect that we have working
__thread - or, in other words, are there compilers which would accept __thread
but do not implement it correctly, and can those be identified
automatically?
If we use static declaration with __thread, then as far as I can see
there is no need for separate IDs and all the complications following from
that.
> - Declaring globals statically (--with-tsrm-full-__thread-tls) causes trouble
> with dlopen(): Apache won't load the module at runtime (it works with
> just --with-tsrm-__thread-tls).
What is the problem there, could you elaborate?
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
Hi,
On Monday 18 August 2008 22:26:20 Stanislav Malyshev wrote:
> I think it would be great to use __thread there. But if we have
> working __thread, why not have real globals use it, without all that
> TSRMG stuff? Having 3 different variants of TSRM support seems excessive.
I agree with you, but TSRM actually does more than just allocating and
storing globals. For instance, it keeps track of constructors and destructors
so that it can call them automatically when a thread starts or stops. It
also allows retrieving the globals of another thread, etc.
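The bookkeeping mentioned here can be pictured with the small sketch below: a registry of constructors and destructors that is replayed whenever a thread starts or stops. This is a simplified illustration, not TSRM's code. All sketch_* names are hypothetical, the registry is fixed-size, and there is no locking, which a real implementation would need.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void (*sketch_ctor)(void *globals);
typedef void (*sketch_dtor)(void *globals);

typedef struct {
    size_t size;
    sketch_ctor ctor;
    sketch_dtor dtor;
} sketch_resource;

#define SKETCH_MAX_RESOURCES 8

/* Process-wide registry of resources (no locking in this sketch). */
static sketch_resource registry[SKETCH_MAX_RESOURCES];
static int registry_count;

/* Per-thread storage, one slot per registered resource. */
static __thread void *thread_table[SKETCH_MAX_RESOURCES];

/* Records a resource and returns its slot, or -1 if the registry is full. */
int sketch_register(size_t size, sketch_ctor ctor, sketch_dtor dtor)
{
    if (registry_count == SKETCH_MAX_RESOURCES) {
        return -1;
    }
    registry[registry_count] = (sketch_resource){ size, ctor, dtor };
    return registry_count++;
}

/* Thread start: allocate every registered resource and run its ctor. */
int sketch_thread_startup(void)
{
    for (int i = 0; i < registry_count; i++) {
        thread_table[i] = calloc(1, registry[i].size);
        if (thread_table[i] == NULL) {
            return -1;
        }
        if (registry[i].ctor) {
            registry[i].ctor(thread_table[i]);
        }
    }
    return 0;
}

/* Thread stop: run dtors in reverse order and release the storage. */
void sketch_thread_shutdown(void)
{
    for (int i = registry_count - 1; i >= 0; i--) {
        if (thread_table[i] == NULL) {
            continue;
        }
        if (registry[i].dtor) {
            registry[i].dtor(thread_table[i]);
        }
        free(thread_table[i]);
        thread_table[i] = NULL;
    }
}

static int live_count;
static void count_ctor(void *g) { *(int *) g = 5; live_count++; }
static void count_dtor(void *g) { (void) g; live_count--; }

/* Registers one resource and runs a full start/stop cycle for this thread. */
int sketch_lifecycle_demo(void)
{
    int id = sketch_register(sizeof(int), count_ctor, count_dtor);

    if (id < 0 || sketch_thread_startup() != 0) {
        return -1;
    }
    int value = *(int *) thread_table[id];
    sketch_thread_shutdown();
    return value + live_count;
}
```

Native __thread variables can replace the lookup path, but not this registry: something still has to remember which constructors and destructors to run for each thread.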
> Now, the question is whether we can reliably detect that we have working
> __thread - or, in other words, are there compilers which would accept
> __thread but do not implement it correctly, and can those be identified
> automatically?
For that, I checked how glibc decides whether to use __thread. It
just tries to compile a source file, as I do in the patch. But as TLS handling
needs the help of the libc itself, and glibc knows it can handle TLS, I
guess we will need to write a more complete test to check that it works, and
that it works as we expect.
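A more complete check could go beyond compiling and actually run code in two threads. The function below is one possible sketch of such a probe, assuming POSIX threads are available. tls_probe, probe_thread() and thread_tls_works() are hypothetical names, and a configure script would need to wrap this in a small program and run it.

```c
#include <pthread.h>

/* Every thread should start with its own copy, initialized to 1. */
static __thread int tls_probe = 1;

static void *probe_thread(void *arg)
{
    int *seen = arg;

    /* A new thread must see the initializer, not the main thread's value. */
    *seen = tls_probe;
    /* This write must not leak into the creating thread's copy. */
    tls_probe = 99;
    return NULL;
}

/* Returns 1 if __thread gives each thread a separate copy, 0 otherwise. */
int thread_tls_works(void)
{
    pthread_t thread;
    int seen_in_thread = -1;

    tls_probe = 42;
    if (pthread_create(&thread, NULL, probe_thread, &seen_in_thread) != 0) {
        return 0;
    }
    if (pthread_join(thread, NULL) != 0) {
        return 0;
    }
    return seen_in_thread == 1 && tls_probe == 42;
}
```

This only covers TLS in the main program. Checking the dlopen() case discussed elsewhere in this thread would additionally require building and loading a shared object.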
Regards,
Arnaud
Hi!
> I agree with you, but TSRM actually does more than just allocate and store
> globals. For instance, it keeps track of constructors and destructors so that
> it can call them automatically when a thread starts or stops. It also allows
> retrieving the globals of another thread, etc.
OK, so we still need to register ctors/dtors, but we don't need to
access globals in a roundabout way anymore.
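As an illustration of the difference, here is a simplified sketch contrasting the two access paths. It only models the idea: DEMO_G_OLD, DEMO_G_NEW, demo_bind and demo_globals are hypothetical names, and the table layout is an assumption, not the actual TSRM macros.

```c
#include <stddef.h>

typedef struct { int precision; } demo_globals;

/* Roundabout access (simplified model of the unpatched scheme): the caller
 * carries a per-thread table pointer and indexes it with a resource id. */
#define DEMO_G_OLD(tls_table, id, member) \
    (((demo_globals *)(tls_table)[(id)])->member)

/* Direct access: a thread-local pointer that is bound once per thread. */
__thread demo_globals *demo_globals_ptr;
#define DEMO_G_NEW(member) (demo_globals_ptr->member)

/* Storage standing in for one thread's copy of the globals. */
demo_globals demo_storage = { 14 };

/* Bind callback: point the thread-local pointer at this thread's storage. */
void demo_bind(void *p) { demo_globals_ptr = p; }

/* Old style: needs the table pointer passed down through every call. */
int read_old(void **tls_table, int id)
{
    return DEMO_G_OLD(tls_table, id, precision);
}

/* New style: no extra argument, one thread-local load plus a member access. */
int read_new(void)
{
    return DEMO_G_NEW(precision);
}
```

The registry still has to call something like demo_bind() for each new thread, which is where the constructors and destructors remain involved.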
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
Hi Arnaud,
I remember that at the time we looked at thread local storage and there
were some real issues with it. I can't remember what they were, as it was
about 7+ years ago.
I will ask Zeev if he remembers, and if not I will search my archives (I don't
have the years prior to 2007 indexed :'( ).
Andi
Hi,
Yes, I have looked into the issue with --with-tsrm-full-__thread-tls, and there
are indeed some problems.

When building non-PIC code, the TLS model used is a static model, which does
not allow modules to be loaded at run-time. glibc's dlopen() sometimes allows
such code to be loaded at run-time when it finds some free space left in the
static TLS area; that is why --with-tsrm-__thread-tls works, but it is not
expected to always work.

So a non-PIC build is expected to work only when the server/application links
the PHP module directly, or when using LD_PRELOAD, etc.

Building PIC code allows the use of a dynamic TLS model, which allows modules
to be loaded at run-time, but it is less efficient (4.8s in bench.php: still
faster than the unpatched version, even a non-PIC one, but slower than the
static model).
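For reference, GCC and Clang let the TLS model be chosen per variable with the tls_model attribute (or for a whole compilation unit with -ftls-model=). The sketch below only shows the syntax for a static model ("initial-exec") and a dynamic one ("global-dynamic"); the variable and function names are hypothetical and do not come from the patch.

```c
/* Hypothetical globals struct; the names are for illustration only. */
typedef struct { int error_reporting; } demo_globals;

/* Static model: cheapest access, but a shared object using it can only be
 * dlopen()ed while the loader still has spare static TLS space. */
__thread demo_globals *demo_ptr_ie __attribute__((tls_model("initial-exec")));

/* Dynamic model: an access may go through a run-time lookup, which is
 * slower but does not depend on spare static TLS space. */
__thread demo_globals *demo_ptr_gd __attribute__((tls_model("global-dynamic")));

demo_globals storage_a, storage_b;

/* Bind both thread-local pointers for the calling thread and read back
 * through them; the result is the sum of the two stored values. */
int demo_bind_and_read(void)
{
    storage_a.error_reporting = 1;
    storage_b.error_reporting = 2;
    demo_ptr_ie = &storage_a;
    demo_ptr_gd = &storage_b;
    return demo_ptr_ie->error_reporting + demo_ptr_gd->error_reporting;
}
```

Both variables behave identically from the C side; the difference is only in the code the compiler emits for each access and in what the dynamic loader must provide.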
Regards,
Arnaud
I do not know exactly how this will behave with other web servers. I maintain
the module for Sun Java System Web Server (NSAPI), which normally uses pthreads
(on Solaris and Linux) but not on AIX (this is why PHP as an NSAPI module does
not work on AIX).

The correct way for this web server would be to use the macros/functions from
the NSAPI library to handle threads. This is implemented in TSRM [see
#ifdef NSAPI] but never used, because when compiling PHP the "NSAPI" defines
are known only to the NSAPI module, not to the whole source tree, so TSRM uses
pthreads. The CLI version of PHP built in parallel would not work either if it
were compiled with NSAPI threads instead of pthreads.

For newer SJSWS servers this is not a problem, as long as PHP's NSAPI module
always uses pthreads; only AIX is broken (and has been for years).

So: might there be issues with this server, too? I would like to test it, but
I may eventually need to add code to sapi/nsapi; perhaps you can help me :)
Uwe Schindler
thetaphi@php.net - http://www.php.net
NSAPI SAPI developer
Bremen, Germany
Hi,
On Tuesday 19 August 2008 09:52:24 Uwe Schindler wrote:
> So: may there be issues with this server, too? I would like to test it, too,
> but I need to eventually add code to sapi/nsapi, you may help me :)
I think nothing will need to change in the SAPIs for this to work, as long as
the platform supports __thread. I think the patch should work as-is on Linux
(with PHP compiled with --with-pic). Solaris seems to have an implementation
very close to the one used on Linux, so it should work on Solaris too.
Regards,
Arnaud
OK, I checked with Zeev. It seems there are some significant limitations, at
least on Windows, including that it doesn't work with LoadLibrary()
(which is our bread and butter).
There may also be some size limitations on thread local storage.
In any case, this is Windows-only feedback and we may find additional
limitations/compatibility issues on other platforms.
As there are clear benefits to an increased use of TLS (we already use
it for index caching today) I definitely suggest to revisit this issue
and try and figure out for the variety of platforms whether there's some
middle ground that we can make work. It sounds like a non-trivial
project though.
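One possible middle ground is the key-based TLS API, which is resolved at run-time and so does not depend on how the module was loaded. The sketch below uses POSIX pthread keys (on Windows the analogous calls are TlsAlloc()/TlsGetValue()); it is only an illustration under that assumption, and demo_globals and demo_get_globals are hypothetical names, not existing TSRM functions.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct { int request_count; } demo_globals;

static pthread_key_t  demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

/* Destructor run by the thread library when a thread exits. */
static void demo_free(void *p) { free(p); }

/* Create the key exactly once per process (errors ignored in this sketch). */
static void demo_make_key(void)
{
    (void)pthread_key_create(&demo_key, demo_free);
}

/* Return the calling thread's globals, allocating them on first use.
 * Returns NULL if allocation or binding fails. */
demo_globals *demo_get_globals(void)
{
    demo_globals *g;

    pthread_once(&demo_key_once, demo_make_key);
    g = pthread_getspecific(demo_key);
    if (g == NULL) {
        g = calloc(1, sizeof(*g));
        if (g != NULL && pthread_setspecific(demo_key, g) != 0) {
            free(g);
            g = NULL;
        }
    }
    return g;
}
```

Each access costs a function call instead of a plain __thread load, which is the trade-off being weighed in this thread.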
Arnaud, are you set up to also play around with Windows and possibly some
other OSes to look into this further?
Andi
Hi,
On Tuesday 19 August 2008 18:22:44 Andi Gutmans wrote:
> OK checked with Zeev. It seems there are some significant limitations at
> least on Windows including that it doesn't work with LoadLibrary()
> (which is our bread and butter).

OK, so there are the same limitations as on Linux with dlopen() :(

> There may also be some size limitations on thread local storage.
> In any case, this is Windows-only feedback and we may find additional
> limitations/compatibility issues on other platforms.
>
> As there are clear benefits to an increased use of TLS (we already use
> it for index caching today) I definitely suggest to revisit this issue
> and try and figure out for the variety of platforms whether there's some
> middle ground that we can make work. It sounds like a non-trivial
> project though.
> Arnaud, are you setup to also play around with Windows and possibly some
> other OSes to look into this further?

Yes, I will try at least with Windows (XP, so no IIS) and FreeBSD.
Regards,
Arnaud
2008/8/20 Arnaud Le Blanc arnaud.lb@gmail.com:
>> Arnaud, are you setup to also play around with Windows and possibly some
>> other OSes to look into this further?
>
> Yes, I will try at least with Windows (XP, so no IIS) and FreeBSD.
Windows XP Professional does have IIS; all you need is your CD, and then you
can install it from Add/Remove Programs. It's IIS 5.1, which should be
good enough.
Kalle Sommer Nielsen
Hi,
On Wednesday 20 August 2008 21:51:05 Kalle Sommer Nielsen wrote:
2008/8/20 Arnaud Le Blanc arnaud.lb@gmail.com:
Hi,
On Tuesday 19 August 2008 18:22:44 Andi Gutmans wrote:
OK checked with Zeev. It seems there are some significant limitations at
least on Windows including that it doesn't work with LoadLibrary()
(which is our bread and butter).Ok, so there is the same limitations than on Linux with dlopen() :(
There may also be some size limitations on thread local storage.
In any case, this is Windows-only feedback and we may find additional
limitations/compatibility issues on other platforms.As there are clear benefits to an increased use of TLS (we already use
it for index caching today) I definitely suggest to revisit this issue
and try and figure out for the variety of platforms whether there's some
middle ground that we can make work. It sounds like a non-trivial
project though.
Arnaud, are you setup to also play around with Windows and possibly some
other OSes to look into this further?
Yes, I will try at least with Windows (XP, so no IIS) and FreeBSD.
Windows XP Professional has IIS, all you need is your CD and then
install it from the Add/Remove programs. It's IIS 5.1, which should be
good enough.
Thanks, I didn't know that. I did not have a Professional version of Windows
but I will try to get one.
Andi
-----Original Message-----
From: Arnaud Le Blanc [mailto:arnaud.lb@gmail.com]
Sent: Monday, August 18, 2008 10:00 PM
To: Andi Gutmans
Cc: PHP Development
Subject: Re: [PHP-DEV] [PATCH] ZTS as fast as non-ZTS
Hi,
Yes, I have looked into the issue with --with-tsrm-full-__thread-tls and there
are indeed some issues.
When building PIC code, the TLS model used is a static model, which does not
allow modules to be loaded at run-time. glibc's dlopen() sometimes allows such
code to be loaded at runtime when it finds some free memory, which is why
--with-tsrm-__thread-tls works, but it is not expected to always work.
So when building PIC code, this is expected to work only when the
server/application links the PHP module, or when using LD_PRELOAD, etc.
Building non-PIC code allows the use of a dynamic TLS model, which allows
modules to be loaded at run-time, but it is less efficient (4.8s in bench.php,
still faster than the unpatched version, even non-PIC, but less efficient).
Regards,
Arnaud
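For readers who have not met TLS access models before: GCC lets the model be chosen per variable with the tls_model attribute (or for a whole build with -ftls-model=). The sketch below only shows that syntax. The variable and function names are hypothetical, and nothing here is claimed to match what the patch or the PHP build actually selects.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-thread counters, present only to show the attribute. */

/* Static model: the offset from the thread pointer is fixed at load time. */
static __thread int hits_initial_exec
    __attribute__((tls_model("initial-exec")));

/* Dynamic model: the address is resolved at run time, which costs more
 * per access but does not depend on the static TLS block. */
static __thread int hits_global_dynamic
    __attribute__((tls_model("global-dynamic")));

static void *count_hits(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        hits_initial_exec++;
        hits_global_dynamic++;
    }
    /* Both counters start at 0 in a fresh thread. */
    *(int *)arg = hits_initial_exec + hits_global_dynamic;
    return NULL;
}

/* Returns 1 if the worker's copies were independent of the caller's. */
int tls_model_demo(void)
{
    pthread_t t;
    int n = 4;
    hits_initial_exec = 100; /* the calling thread's own copy */
    pthread_create(&t, NULL, count_hits, &n);
    pthread_join(t, NULL);
    return n == 8 && hits_initial_exec == 100;
}
```

Inside a single executable both variables behave the same. The difference shows up in the generated access sequence and, for a shared object, in whether loading it with dlopen() depends on spare room in the static TLS block, which matches the "when it finds some free memory" behaviour described above.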
On Tuesday 19 August 2008 06:18:51 Andi Gutmans wrote:
Hi Arnaud,
I remember that at the time we looked at thread local storage and there
were some real issues with it. I can't remember what as it was about 7+
years ago.
I will ask Zeev if he remembers and if not search my archives (don't
have years prior to 2007 indexed :'( ).
Andi
-----Original Message-----
From: Arnaud Le Blanc [mailto:arnaud.lb@gmail.com]
Sent: Saturday, August 16, 2008 7:19 PM
To: PHP Development
Subject: [PHP-DEV] [PATCH] ZTS as fast as non-ZTS
Regards,
Arnaud
Hi,
Just as a clarification, it is Johannes and my understanding that this
is not targeted for inclusion into 5.3.0.
regards,
Lukas
I think that's a fair assumption given there's a lot more research that has to be done on this topic.
Andi
-----Original Message-----
From: Lukas Kahwe Smith [mailto:mls@pooteeweet.org]
Sent: Wednesday, August 20, 2008 9:40 AM
To: Arnaud Le Blanc
Cc: PHP Development; Johannes Schlüter
Subject: Re: [PHP-DEV] [PATCH] ZTS as fast as non-ZTS