When running a benchmarking workload on PHP that was configured with
multi-threading support (--enable-maintainer-zts) I noticed that
pthread_get_specific is invoked many times during the processing of a
request. PHP code invokes TSRMLS_FETCH() (which ends up invoking
ts_resource_ex) in a number of places.
Caller/callee data from Sun Studio's collector/analyzer showed the following:
Attr. Excl. Incl. Name
User CPU User CPU User CPU
sec. sec. sec.
178.105 185.460 363.564 _emalloc
96.568 114.320 210.888 _efree
27.960 89.232 343.901 _zval_ptr_dtor
19.544 6.685 26.228 php_body_write_wrapper
6.925 36.806 224.617 _zval_dtor_func
4.263 2.902 7.165 safe_free_zval_ptr_rel
4.163 11.898 16.061 zend_get_parameters_ex
4.013 14.690 174.682 my_copy_zval
3.963 6.775 10.738 _erealloc
3.502 12.399 978.444 apc_copy_function_for_execution
2.732 4.143 9.647 do_inherit_method_check
2.592 21.565 225.137 _zval_copy_ctor_func
0.881 22.095 106.535 virtual_file_ex
0.600 1.961 6.855 list_entry_destructor
0.470 1.301 24.397 zend_file_handle_dtor
0.410 1.781 2.712 zend_function_dtor
0.270 0.350 0.620 convert_to_array
0.220 0.991 15.831 apc_search_paths
0.150 0.490 3.362 zend_register_resource
0.140 1.581 10.137 zend_alter_ini_entry
0.130 4.833 9023.272 php5_execute
0.110 0.500 3.502 zend_ini_long
0.070 0.530 0.600 _zend_bailout
0.050 0.320 4.513 zend_error
0.040 0.690 3.913 php_error_cb
0.040 0.560 2.852 zend_alter_ini_entry_ex
-
3.202 584.369 php_request_shutdown
274.252 274.252 357.910 *ts_resource_ex
83.659 84.749 84.749 pthread_getspecific
Propagating the value of TSRMLS_CC will avoid the overhead of having
to invoke tsrm_tls_get/pthread_get_specific. The following patch
(against trunk) does this:
http://bitbucket.org/arvi/arviq/src/tip/svn-TSRM-patch.txt
The above patch is mostly the result of making zend_alloc.[ch] TSRMLS_C-aware.
While modifying the APIs to add propagate TSRMLS_C, I didn't quite
know the best/preferred way to make the changes so I tried to use the
following guidelines:
(1) If the number of occurrences of the API to be modified are
relatively small, then make the change directly.
e.g.
-int fcgi_accept_request(fcgi_request *req);
-int fcgi_finish_request(fcgi_request *req, int force_close);
+int fcgi_accept_request(fcgi_request *req TSRMLS_DC);
+int fcgi_finish_request(fcgi_request *req, int force_close TSRMLS_DC);
(2) If the number of occurrences is large, then rename the function
(prefix with _) and add a macro that passes TSRMLS_C
e.g.
-ZEND_API void zend_hash_destroy(HashTable *ht);
+ZEND_API void _zend_hash_destroy(HashTable *ht TSRMLS_DC);
+#define zend_hash_destroy(ht) _zend_hash_destroy((ht) TSRMLS_CC)
(3) For functions that have varargs, pass TSRMLS_C as the last but one parameter
e.g.
-ZEND_API int zend_get_parameters(int ht, int param_count, ...);
+ZEND_API int zend_get_parameters(int ht TSRMLS_DC, int param_count, ...);
(4) For single argument functions that have varags, pass TSRMLS_C as
the first parameter
e.g.
-ZEND_API int zend_get_parameters_ex(int param_count, ...) /* {{{ /
+ZEND_API int zend_get_parameters_ex(TSRMLS_DC1 int param_count, ...) / {{{ */
where
+#define TSRMLS_DC1 TSRMLS_D,
+#define TSRMLS_CC1 TSRMLS_C,
Is (2) an acceptable thing to do or should I avoid renaming functions
and do a mongo search-replace?
With respect to (4), I wasn't sure how else to tackle varargs
functions that take a single parameter. Any direction on what I should
do for (3), (4) would be much appreciated.
Out of curiosity, I counted the number of times
thread_resources = tsrm_tls_get();
in ts_resource_ex is invoked for a simple command line invocation of
hello.php (<?php echo "Hello World"?>).
Before patch: 22125 times
After patch: 119 times
Most of this is probably in one-time initialization stuff. I hope to
have my trunk tree hooked up to my benchmarking setup soon and then
I'll have more meaningful numbers.
Thanks,
Arvi
hi Arvind!
Yes, to get rid of TSRMLS_FETCH is a very good thing as it is horribly
slow (as you noticed). However it can't be done in 5.3 as it will
break the ABI.
About the patch itself, as it will break ABI anyway, I would go wit
the solution 1) only. That's cleaner and consistent with what we
usually do.
thanks for your work :)
When running a benchmarking workload on PHP that was configured with
multi-threading support (--enable-maintainer-zts) I noticed that
pthread_get_specific is invoked many times during the processing of a
request. PHP code invokes TSRMLS_FETCH() (which ends up invoking
ts_resource_ex) in a number of places.
(1) If the number of occurrences of the API to be modified are
relatively small, then make the change directly.
e.g.
-int fcgi_accept_request(fcgi_request *req);
-int fcgi_finish_request(fcgi_request *req, int force_close);
+int fcgi_accept_request(fcgi_request *req TSRMLS_DC);
+int fcgi_finish_request(fcgi_request *req, int force_close TSRMLS_DC);
Cheers,
Pierre
Hi Pierre,
slow (as you noticed). However it can't be done in 5.3 as it will
break the ABI.
I noticed that a recent fix (http://bugs.php.net/bug.php?id=49936)
added TSRMLS_DC to an API in 5.3.
http://svn.php.net/viewvc/php/php-src/branches/PHP_5_3/main/streams/php_stream_context.h?r1=290796&r2=290795&pathrev=290796
Wouldn't that need to be reverted as it breaks the ABI?
About the patch itself, as it will break ABI anyway, I would go wit
the solution 1) only. That's cleaner and consistent with what we
usually do.
Ok, I'll get rid of the _ prefix diffs. Do my approaches (3) (4) for
varargs look ok?
Thanks for your review.
arvi
2009/11/30 Arvind Srinivasan yoarvi@gmail.com
Hi Pierre,
slow (as you noticed). However it can't be done in 5.3 as it will
break the ABI.I noticed that a recent fix (http://bugs.php.net/bug.php?id=49936)
added TSRMLS_DC to an API in 5.3.Wouldn't that need to be reverted as it breaks the ABI?
It was reverted.
--
Regards,
Felipe Pena
It was reverted.
ah..i missed that. sorry.
Hi Pierre,
slow (as you noticed). However it can't be done in 5.3 as it will
break the ABI.I noticed that a recent fix (http://bugs.php.net/bug.php?id=49936)
added TSRMLS_DC to an API in 5.3.
http://svn.php.net/viewvc/php/php-src/branches/PHP_5_3/main/streams/php_stream_context.h?r1=290796&r2=290795&pathrev=290796Wouldn't that need to be reverted as it breaks the ABI?
How about the idea below:
you can introduce new functions/macros with TSRMLSx propagated in, in v5.3
without changing existing API exposed to extensions, and make core using new
API.
Old functions/macros can be marked as deprecated. This will preseve ABI in
v5.3 :)
As of v6, you don't have to preserve ABI, so you can do replace the
functions with new ones at once (introduce new, remove old, get all parts
using new).
Hmm and the diff will be big...
BTW, did you try to avoid pthread_get_specific() calls now and forever?
Something like just one pointer in static TLS would do.
http://people.redhat.com/drepper/tls.pdf
http://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html
Since static TLS storage is quite a limited resource under some OSes, I
think it will be wise to use only one thread-specific pointer in TLS and
this pointer will point to the thread-specific array of pointers (not in TLS
but in regular heap) and these pointers will point to all thread-globals
like CG(), EG(), etc. The index of the array will be well-known id returned
by ts_allocate_id.
These arrays should be big enough to allow new globals to be added in
run-time without re-allocating all the arrays in all currently running
threads (see how ts_allocate_id() extends the arrays with pointers one by
one, locking all threads).
Structure and dereferencing will be exactly the same as currently
implemented in TSRM.h, only storage will be declared global with __thread
prefix and all related calls like TlsAlloc(), pthread_key_create would be
removed.
-jv
Hi,
(1) If the number of occurrences of the API to be modified are
relatively small, then make the change directly.
e.g.
-int fcgi_accept_request(fcgi_request *req);
-int fcgi_finish_request(fcgi_request *req, int force_close);
+int fcgi_accept_request(fcgi_request *req TSRMLS_DC);
+int fcgi_finish_request(fcgi_request *req, int force_close TSRMLS_DC);
right
(2) If the number of occurrences is large, then rename the function
(prefix with _) and add a macro that passes TSRMLS_C
e.g.
-ZEND_API void zend_hash_destroy(HashTable *ht);
+ZEND_API void _zend_hash_destroy(HashTable *ht TSRMLS_DC);
+#define zend_hash_destroy(ht) _zend_hash_destroy((ht) TSRMLS_CC)
I'm not too happy about hiding the TSRM macro with another macro as it
makes reading the code harder. When writing code I want to spot easily
whether my function needs TSRM or not.
When adding such a macro that should be done for making the code better
readable, not worse.
(3) For functions that have varargs, pass TSRMLS_C as the last but one parameter
e.g.
-ZEND_API int zend_get_parameters(int ht, int param_count, ...);
+ZEND_API int zend_get_parameters(int ht TSRMLS_DC, int param_count, ...);
right.
(4) For single argument functions that have varags, pass TSRMLS_C as
the first parameter
e.g.
-ZEND_API int zend_get_parameters_ex(int param_count, ...) /* {{{ /
+ZEND_API int zend_get_parameters_ex(TSRMLS_DC1 int param_count, ...) / {{{ */where
+#define TSRMLS_DC1 TSRMLS_D,
+#define TSRMLS_CC1 TSRMLS_C,
Are there more cases of this? - If zend_get_parameters is the only case
I'd keep the TSRMLS_FETCH in it. zend_get_parameters is deprecated and
shouldn't be used anymore. But adding a new macro makes it harder for
new people to get started with the code.
johannes
hi johannes,
I'm not too happy about hiding the TSRM macro with another macro as it
makes reading the code harder. When writing code I want to spot easily
whether my function needs TSRM or not.When adding such a macro that should be done for making the code better
readable, not worse.
I'm currently working on getting rid of this change and instead
explicitly passing TSRMLS_C.
where
+#define TSRMLS_DC1 TSRMLS_D,
+#define TSRMLS_CC1 TSRMLS_C,Are there more cases of this? - If zend_get_parameters is the only case
multi_convert_to_long_ex
multi_convert_to_double_ex
multi_convert_to_string_ex
are the other use cases of this.
thanks,
arvi
(2) If the number of occurrences is large, then rename the function
(prefix with _) and add a macro that passes TSRMLS_C
e.g.
-ZEND_API void zend_hash_destroy(HashTable *ht);
+ZEND_API void _zend_hash_destroy(HashTable *ht TSRMLS_DC);
+#define zend_hash_destroy(ht) _zend_hash_destroy((ht) TSRMLS_CC)
With the exception of the APIs mentioned below, I've undone this
change and instead done a global search/replace of all the functions
that needed to pass TSRMLS_C as a parameter.
The only cases that I couldn't quite do this for are:
convert_to_long
convert_to_double
convert_to_null
convert_to_boolean
convert_to_object
convert_to_array
in zend_operators.h
The problem is that currently convert_to_string is already a macro
that calls _convert_to_string passing 2 parameters whereas the above
are functions that take only 1 parameter. conv_object_to_type expects
all these APIs to have the same signature.
(4) For single argument functions that have varags, pass TSRMLS_C as
the first parameter
e.g.
-ZEND_API int zend_get_parameters_ex(int param_count, ...) /* {{{ /
+ZEND_API int zend_get_parameters_ex(TSRMLS_DC1 int param_count, ...) / {{{ */where
+#define TSRMLS_DC1 TSRMLS_D,
+#define TSRMLS_CC1 TSRMLS_C,Are there more cases of this? - If zend_get_parameters is the only case
I'd keep the TSRMLS_FETCH in it. zend_get_parameters is deprecated and
shouldn't be used anymore. But adding a new macro makes it harder for
new people to get started with the code.
I've backed out this change as well and instead used TSRMLS_FETCH() in
those APIs that used TSRMLS_DC1/TSRMLS_CC1 (in my previous diff). I'll
revisit those if they show up as performance hotspots.
http://bitbucket.org/arvi/arviq/src/37f1d6b045b0/svn-TSRM-patch.txt
contains the updated diff against trunk.
I've compiled and run 'make test' on this after configuring it with
the default module set and with/without --enable-maintainer-zts.
Thanks,
Arvi