Today I got the error from bug #25916 several times on our webserver.
Looking through the code I found out the following:
- It does NOT depend on whether a parameter is passed to get_browser().
- It happens sometimes when the server is very heavily loaded; the homepage of
  the domain uses the get_browser() function and is the most visited page.
So it must be a multithreading issue (NSAPI is a multithreaded webserver).
And I have an idea:
Line 257 uses:
    zend_hash_apply_with_arguments(&browser_hash, (apply_func_args_t)
        browser_reg_compare, 2, lookup_browser_name, &found_browser_entry);
This is the only function in this context in zend_hash.c which uses the
recursion protection with
    #define HASH_PROTECT_RECURSION(ht) \
        if ((ht)->bApplyProtection) { \
            if ((ht)->nApplyCount++ >= 3) { \
                zend_error(E_ERROR, "Nesting level too deep - recursive dependency?"); \
            } \
        }
The browser hashtable is a global variable in browscap.c and can be used by
more than one call to get_browser(), even at the same time. So if one
zend_hash_apply_with_arguments() call is still working on the hashtable and a
second and third thread try to do the same, you will get the error, because
(ht)->nApplyCount++ keeps increasing...
This evening I will try to put a mutex at the beginning of get_browser() to
prevent more than one thread from running there at the same time. But as I
see it, this zend_hash_apply function is used very often; could there be other
effects wherever a global variable is a hashtable?
Only one question: Is there a special PHP way to use mutexes? I am not
familiar with Zend programming (I only do SAPI...)
Uwe Schindler
thetaphi@php.net - http://www.php.net
NSAPI SAPI developer
Erlangen, Germany
We have thread-safe hashes in php5; browscap should probably
use one of those there. If you want to roll your own protection,
take a look at tsrm_mutex_lock() and tsrm_mutex_unlock() and how
they are used in ext/yaz.
--Wez.
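For readers who have not used this approach, here is a minimal sketch of serializing the lookup with a mutex. It uses POSIX threads as a stand-in, on the assumption that tsrm_mutex_lock() and tsrm_mutex_unlock() behave like an ordinary lock/unlock pair. The names toy_table, guarded_lookup and run_guarded_demo are hypothetical and are not taken from browscap.c or ext/yaz.

```c
#include <pthread.h>

#define MAX_WORKERS 16
#define LOOKUPS_PER_WORKER 10000

/* Toy stand-in for the global browser table: only the nesting counter
 * matters for this illustration. */
typedef struct {
    int apply_count;  /* plays the role of nApplyCount */
    int max_seen;     /* highest apply_count observed so far */
} toy_table;

static toy_table browser_table = { 0, 0 };
static pthread_mutex_t browser_lock = PTHREAD_MUTEX_INITIALIZER;

/* One guarded "lookup": the lock ensures at most one thread is inside. */
static void guarded_lookup(void)
{
    pthread_mutex_lock(&browser_lock);
    browser_table.apply_count++;
    if (browser_table.apply_count > browser_table.max_seen) {
        browser_table.max_seen = browser_table.apply_count;
    }
    /* ... the scan over the table would run here ... */
    browser_table.apply_count--;
    pthread_mutex_unlock(&browser_lock);
}

static void *worker(void *arg)
{
    int i;
    (void)arg;
    for (i = 0; i < LOOKUPS_PER_WORKER; i++) {
        guarded_lookup();
    }
    return NULL;
}

/* Runs up to MAX_WORKERS threads and returns the highest nesting level
 * that any of them observed while holding the lock. */
int run_guarded_demo(int nthreads)
{
    pthread_t threads[MAX_WORKERS];
    int started = 0;
    int i;

    if (nthreads > MAX_WORKERS) {
        nthreads = MAX_WORKERS;
    }
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[started], NULL, worker, NULL) == 0) {
            started++;
        }
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    return browser_table.max_seen;
}
```

Calling run_guarded_demo(6) starts six workers. Because every lookup holds the lock, the counter should never be seen above 1; the price is that lookups no longer run in parallel.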
----- Original Message -----
From: "Uwe Schindler" uwe@thetaphi.de
To: jay@php.net; internals@lists.php.net
Sent: Wednesday, December 03, 2003 5:06 PM
Subject: [PHP-DEV] browscap and nesting level too deep (bug #25916)
Hello Uwe,
Wednesday, December 3, 2003, 6:06:13 PM, you wrote:
This evening I will try to put a mutex at the beginning of get_browser() to
prevent more than one thread from running there at the same time. But as I
see it, this zend_hash_apply function is used very often; could there be other
effects wherever a global variable is a hashtable?
Only one question: Is there a special PHP way to use mutexes? I am not
familiar with Zend programming (I only do SAPI...)
Why not simply use external iteration? You don't add to or modify the browser
list, right?
--
Best regards,
Marcus mailto:helly@php.net
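To illustrate what external iteration means here, the following is a small self-contained sketch, not the browscap code. Each caller keeps its own cursor on the stack and only reads the shared table, so no counter or position inside the table is written during a lookup. The table contents, the simplified prefix matching and the names toy_entry, toy_match and toy_lookup are assumptions for illustration; in the Zend API the equivalent would presumably be the zend_hash_*_ex() iteration functions with a caller-owned HashPosition.

```c
#include <stddef.h>
#include <string.h>

/* Read-only toy table; a trailing '*' means "prefix match" in this sketch.
 * The real browscap patterns are matched differently. */
typedef struct {
    const char *pattern;
    const char *browser;
} toy_entry;

static const toy_entry toy_browsers[] = {
    { "Mozilla/4.0 (compatible; MSIE 6.0*", "IE 6.0" },
    { "Mozilla/5.0*",                       "Mozilla 5" },
    { "*",                                  "Default Browser" },
};

static int toy_match(const char *pattern, const char *agent)
{
    size_t len = strlen(pattern);

    if (len > 0 && pattern[len - 1] == '*') {
        /* Compare everything before the trailing '*'. */
        return strncmp(pattern, agent, len - 1) == 0;
    }
    return strcmp(pattern, agent) == 0;
}

/* The cursor `pos` is local to the caller, so concurrent lookups do not
 * share any mutable state. Returns NULL if nothing matches. */
const char *toy_lookup(const char *agent)
{
    size_t count = sizeof(toy_browsers) / sizeof(toy_browsers[0]);
    size_t pos;

    for (pos = 0; pos < count; pos++) {
        if (toy_match(toy_browsers[pos].pattern, agent)) {
            return toy_browsers[pos].browser;
        }
    }
    return NULL;
}
```

This only stays safe as long as the table really is never modified after start-up, which is the condition Marcus asks about.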
This evening I will try to put a mutex at the beginning of get_browser()
to prevent more than one thread from running there at the same time. But as
I see it, this zend_hash_apply function is used very often; could there be
other effects wherever a global variable is a hashtable?
Only one question: Is there a special PHP way to use mutexes? I am not
familiar with Zend programming (I only do SAPI...)
Did you take a look at zend_ts_hash (only in php5) and tsrm_mutex_*()?
I'm not quite sure whether the facility has been tested enough and is
really thread-safe, though.
Moriyoshi
One solution (the patch is attached; if nobody has anything against it I
will apply it):
I switch off the recursion protection for the browscap hash in
zend_hash_init_ex, because this hash has nothing recursive in it and is
not modified after it is created.
Uwe
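As a rough model of what the patch changes, the sketch below mimics the check from the macro quoted earlier in the thread with a protection flag that can be switched off at init time. It is a toy under stated assumptions, not the attached patch: toy_hash, toy_hash_init, toy_apply_begin and toy_overlapping_failures are hypothetical names, and the overlapping callers are simulated in a single thread.

```c
/* Toy model of the two HashTable fields involved. */
typedef struct {
    int apply_protection;  /* plays the role of bApplyProtection */
    int apply_count;       /* plays the role of nApplyCount */
} toy_hash;

/* Stand-in for choosing the protection flag when the table is created. */
void toy_hash_init(toy_hash *ht, int apply_protection)
{
    ht->apply_protection = apply_protection;
    ht->apply_count = 0;
}

/* Same shape as the quoted macro: returns 0 where the real code would
 * raise "Nesting level too deep", 1 otherwise. */
int toy_apply_begin(toy_hash *ht)
{
    if (ht->apply_protection) {
        if (ht->apply_count++ >= 3) {
            return 0;
        }
    }
    return 1;
}

/* Simulates `callers` traversals that all start before any of them
 * finishes, and returns how many were refused. */
int toy_overlapping_failures(toy_hash *ht, int callers)
{
    int failed = 0;
    int i;

    for (i = 0; i < callers; i++) {
        if (!toy_apply_begin(ht)) {
            failed++;
        }
    }
    return failed;
}
```

With the flag set, six overlapping callers give three refusals in this model; with the flag cleared the counter is never touched and nobody is refused. That is only acceptable because the table is read-only after creation, as the message says.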
Uwe Schindler wrote:
One solution (the patch is attached; if nobody has anything against it I
will apply it):
I switch off the recursion protection for the browscap hash in
zend_hash_init_ex, because this hash has nothing recursive in it and is
not modified after it is created.
Uwe
That will probably do it. I'm going to try and reproduce this with and
without the patch today on our Solaris box and I'll see what I get. I've
been swamped recently and haven't been able to give this a good look. (The
browscap extension really needs to be gutted, but at least it works, for
the most part...)
Going to go give this a try now...
J
On my Solaris box the fix does it. I tested it by hammering the same PHP
script using get_browser() with and without the patch. Without the patch it
gets this error. With the patch the script works :)
The big problem with this bug is that when the error happens the first time
(3 threads using get_browser()), the thread which produces the error does
not reset the recursion counter (because of the error), and after all threads
have finished it is left at 1. When 2 more threads then use get_browser() at
the same time, the second one gets the error and at the end the counter is
left at 2. And then all further calls fail...
In my special case today, when the error occurred, it was:
168.221.143.68 www.pangaea.de - [03/Dec/2003:17:10:09 +0100] "GET / HTTP/1.1" 200 1985 "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pangaea" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
168.221.143.68 www.pangaea.de - [03/Dec/2003:17:10:09 +0100] "GET / HTTP/1.1" 200 1985 "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pangaea" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
168.221.143.68 www.pangaea.de - [03/Dec/2003:17:10:09 +0100] "GET / HTTP/1.1" 200 1985 "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pangaea" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
168.221.143.68 www.pangaea.de - [03/Dec/2003:17:10:09 +0100] "GET / HTTP/1.1" 200 150 "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pangaea" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
168.221.143.68 www.pangaea.de - [03/Dec/2003:17:10:09 +0100] "GET / HTTP/1.1" 200 150 "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pangaea" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
168.221.143.68 www.pangaea.de - [03/Dec/2003:17:10:09 +0100] "GET / HTTP/1.1" 200 150 "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=pangaea" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
6 threads accessed get_browser() at the same time - 3 of them failed
(content-length=150... and visible in the error log), counter left at 3 ->
from this time on get_browser() failed to work.
I think I should apply the patch to 4.3 and 5 (same problem there) and
close the bug.
Uwe
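The counter arithmetic described in this message can be replayed with a small single-threaded model. This is a sketch under the assumption stated above, namely that a caller which hits the error never decrements the counter again; toy_guard, toy_enter, toy_leave and toy_burst are hypothetical names, and real thread interleavings are replaced by a fixed order (all callers enter, then the successful ones leave).

```c
#define TOY_MAX_CALLERS 64

/* Only the counter matters; it plays the role of nApplyCount. */
typedef struct {
    int count;
} toy_guard;

/* Mirrors `nApplyCount++ >= 3`: the counter is incremented even when
 * the caller is refused. Returns 1 on success, 0 on "too deep". */
int toy_enter(toy_guard *g)
{
    return g->count++ < 3;
}

/* Only reached by callers that were not refused. */
void toy_leave(toy_guard *g)
{
    g->count--;
}

/* `callers` overlapping calls: everyone enters first, then the ones
 * that got in leave again. Returns the number of refused callers. */
int toy_burst(toy_guard *g, int callers)
{
    int entered[TOY_MAX_CALLERS];
    int failures = 0;
    int i;

    if (callers > TOY_MAX_CALLERS) {
        callers = TOY_MAX_CALLERS;
    }
    for (i = 0; i < callers; i++) {
        entered[i] = toy_enter(g);
        if (!entered[i]) {
            failures++;
        }
    }
    for (i = 0; i < callers; i++) {
        if (entered[i]) {
            toy_leave(g);  /* refused callers never get here */
        }
    }
    return failures;
}
```

In this model a burst of six callers yields three failures and leaves the counter at 3, after which even a single, non-overlapping call is refused, which matches the log excerpt above.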
At 19:20 03.12.2003, Jay Smith wrote:
That will probably do it. I'm going to try and reproduce this with and
without the patch today on our Solaris box and I'll see what I get. I've
been swamped recently and haven't been able to give this a good look. (The
browscap extension really needs to be gutted, but at least it works, for
the most part...)
Going to go give this a try now...
J
Glad to see it's finally working. I was never able to get those errors, but
it probably had something to do with the thread pooling you had configured.
And seeing as it's working now, I guess I don't need to bother trying to
get them.
J
Uwe,
I'm having problems reproducing the error. I tried hammering a get_browser()
function with apachebench a couple of times with the options (-n 1000 -c
- and they all returned the same content length, so I'm assuming there
was no error in any of them.
I'm using iPlanet-WebServer-Enterprise/6.0 on Solaris 8 SPARC. Nothing
strange about the set up, I think.
What was the configuration line you used? I'll try to get my set up as close
as possible and go from there.
J