Really odd PHP problem

21 years ago by Russ Garrett

OK, first of all I apologise for not posting this in the "right place",
but this is an unreproducible bug (the worst kind...), and I need some
educated guesses as to what is causing it. This thing has me at my wits'
end...

The situation is this: Apache on our main dynamic web server keeps on
suddenly eating all 3+GB of available virtual memory. When this happens,
the server stops responding to all requests, and basically freezes until
the kernel OOM killer gets around to killing enough httpd processes so
we can get in to kill the rest and restart it. This happens anywhere from
every couple of minutes to every couple of hours - there's no pattern to
it. You can see an attractive graph of the occurrence here:

http://static.last.fm/phpbug/mem.gif

As you can see, the memory usage shoots up suddenly - it doesn't appear
to be a conventional memory leak. This is accompanied by a similar spike
in the number of Apache processes - right up to the MaxClients limit.

We've been running PHP with debug support enabled for the last couple of
days, and we've noticed that a series of errors is always logged just
before the spike in memory usage. A log snippet is available here (note
that these errors carry on for several pages - although I suspect the
first one is the only relevant one - this is only the first page or so):

http://static.last.fm/phpbug/log.txt

The error doesn't just happen with that script; however, the initial
error always occurs in the same place (zend_variables.c:44).

This machine serves around 500,000 hits daily, and 99% of them are
PHP-parsed. It's running Debian 3.0 with backported kernel 2.6.7. The
bug manifests itself with both Apache 1.3 and 2, and both PHP 4.3.8 and
4.3.9RC2. Compile options are as follows:

'./configure' '--with-apxs2=/web/apache2/bin/apxs' '--without-mysql' \
'--with-zlib-dir=/usr' '--enable-gd-native-ttf' '--with-gettext' \
'--enable-mbstring' '--with-pgsql=/usr/local/pgsql' '--enable-sysvmsg' \
'--with-gd' '--with-jpeg-dir=/usr' '--enable-debug'

The only third-party module we're using is Turck mmcache - removing it
is kind of difficult since running without any cache brings the machine
to its knees :).

Sorry about the length of this message, I had to fit all the details
in... I'd appreciate it if you have any suggestions at all, this is
really annoying me now. It's problems like this with PHP which make me
consider moving to Java ;)... Anyhow.

Thanks in advance,

Russ Garrett
russ@last.fm

21 years ago by Rasmus Lerdorf

The only third-party module we're using is Turck mmcache - removing it
is kind of difficult since running without any cache brings the machine
to its knees :).

But you should be able to trivially replace it with pecl/apc, as the
performance of the two is very similar, and that would at least eliminate
one variable.

-Rasmus

21 years ago by Rasmus Lerdorf

This machine serves around 500,000 hits daily, and 99% of them are
PHP-parsed.

By the way, that is not a lot of hits: it averages out to fewer than 6
requests per second. I tend to get worried when my servers can't do at
least 80-100 requests/second. And you certainly shouldn't need an opcode
cache to do 6 req/sec. What exactly do these PHP scripts of yours do?
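
Rasmus's figure of under 6 requests per second is just the daily hit count divided by the 86,400 seconds in a day. A minimal C sketch of that arithmetic (the function name is illustrative, not from any library); keep in mind it yields only the mean rate, and peak-hour traffic will be noticeably higher.

```c
#include <assert.h>

/* 24 hours * 60 minutes * 60 seconds */
#define SECONDS_PER_DAY 86400.0

/* Mean request rate implied by a daily hit count. This is only the
 * average: real traffic is bursty and peak-hour rates run higher. */
double avg_requests_per_second(double hits_per_day)
{
    assert(hits_per_day >= 0.0);
    return hits_per_day / SECONDS_PER_DAY;
}
```

For example, avg_requests_per_second(500000.0) comes out at roughly 5.8, and the corrected figure of about 2 million daily hits given later in the thread works out to roughly 23 requests per second.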

-Rasmus

21 years ago by Zeev Suraski

Is your server really unusable w/o a compiled code cache? If not, try
removing it and see whether the problem persists. One of the problems with
most opcode caches is that a crash bug in PHP or one of its modules can
end up resulting in a full server crash.

I have to say, though, that it doesn't look that way to me. At first
glance, it appears to be the standard 'spiraling crash'. What that
basically means is:

1. For whatever reason, the number of Apache processes rises (typically
   due to increased end-user load, but sometimes also because some
   administration script is being run, the database slows down, etc.).
2. The machine hits the swap threshold, which slows it down much more
   (typically by at least an order of magnitude).
3. Because of the slowdown, the larger pool of Apache processes quickly
   becomes saturated (each one takes longer to serve its request), and
   with new requests still flowing in, the number of Apache processes
   increases even more.
4. More swap is needed for the additional Apache processes, and a
   hopeless spiral begins, typically ending only when the server dies.
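
One way to put numbers on this spiral (our framing, not Zeev's) is Little's law: the average number of requests in flight equals the arrival rate times the average time each request takes. Assuming a prefork-style setup where each in-flight request ties up one Apache process, that product is roughly the number of busy children. A minimal C sketch; the function names and the example figures are illustrative assumptions.

```c
#include <assert.h>

/* Little's law: average requests in flight = arrival rate (req/s)
 * times average time each request spends in the server (s).
 * Assuming one in-flight request ties up one Apache child (prefork),
 * this approximates how many children are busy at once. */
double busy_processes(double arrival_rate, double seconds_per_request)
{
    assert(arrival_rate >= 0.0 && seconds_per_request >= 0.0);
    return arrival_rate * seconds_per_request;
}

/* Nonzero if that steady-state demand exceeds the MaxClients cap,
 * at which point new requests queue and latency climbs further. */
int exceeds_max_clients(double arrival_rate, double seconds_per_request,
                        int max_clients)
{
    return busy_processes(arrival_rate, seconds_per_request)
           > (double)max_clients;
}
```

At an assumed 25 requests/second and 0.5 s per request, about 12.5 children are busy. If swapping makes each request ten times slower (5 s), the same traffic needs 125 children, past a MaxClients of 100, and every extra child pushes the machine further into swap.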

If that's indeed what happens on your system (and it happens to almost
everyone, sooner or later), then your system has a MaxClients value that
isn't backed by its CPU power and, more importantly, its available memory.
You need to either decrease that number or add more memory.

Generally, your machine should have enough memory to run Apache at
MaxClients without hitting swap. You can test this by setting
StartServers to the same number as MaxClients, and then hitting some of
your PHP-based pages with a high-concurrency ab (ApacheBench) run.
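
Zeev's rule of thumb is a multiplication: MaxClients times the peak resident size of one child, plus whatever the OS and other daemons need, must fit in physical RAM. A minimal C sketch of the inverse calculation; the function name, the 256 MB reservation and the per-child sizes in the example are assumptions, and the real per-child size should be measured (for example with ps or top) during the ab run described above. Because children share some pages, summing their resident sizes overstates real usage, so this errs on the safe side.

```c
#include <assert.h>

/* Largest MaxClients whose worst case (every child at its peak
 * resident size) still fits in RAM without touching swap.
 * reserved_mb covers the OS, page cache and any other daemons. */
long max_clients_for_ram(long ram_mb, long reserved_mb, long per_child_mb)
{
    assert(per_child_mb > 0);
    if (ram_mb <= reserved_mb)
        return 0;               /* nothing left over for Apache */
    return (ram_mb - reserved_mb) / per_child_mb;   /* rounds down */
}
```

With 2048 MB of RAM, 256 MB reserved and 20 MB children, max_clients_for_ram(2048, 256, 20) returns 89; with 30 MB children it drops to 59.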

It might be possible that the crash is somehow related, especially if it
corrupts the compiled code cache and results in frequent crashes of Apache
processes, which will cause Apache to fork more and more processes that can
be the initial slowdown trigger, but still, a properly configured server
should not die out of memory because of that.

Zeev

At 22:35 05/09/2004, Russ Garrett wrote:

[snip]

21 years ago by Russ Garrett

Thanks for all the prompt responses, most appreciated.

Firstly, I forgot to add a fairly crucial subdomain to my hits estimate
(I'm half asleep today). All added in, it's closer to 2 million dynamic
hits per day, which makes my numbers a little more reasonable...

I doubt the spiralling-crash theory because sometimes the server will
run fine for hours, using less than 1GB of the 2GB of RAM, and then
suddenly die. It tends to die as frequently during off-peak times as it
does during peak times. Plus, we're running at a modest MaxClients
setting of 100, which with dual 2.4 Xeons and 2GB of RAM should be more
than reasonable.

I tend to agree that it may be a case of the crash causing the opcode
cache to be corrupted, and causing the rest of the Apache processes to hang.

APC doesn't seem to work at all as a DSO, I'll try it statically later.

We can't run without an opcode cache: I just tried it and it completely
maxes out the CPU and pushes the load average over 100. We run two fairly
heavy Smarty-based sites.

Cheers,

Russ

21 years ago by Rasmus Lerdorf

APC doesn't seem to work at all as a DSO, I'll try it statically later.

I run it on thousands of servers as a DSO. What are you seeing that would
make you think this?

Also, if you are running PHP as a DSO and pushing your CPU, you might want
to compile it non-PIC. Use this patch and reconfigure/recompile:

http://lerdorf.com/non-pic.txt

-Rasmus

21 years ago by Russ Garrett

Rasmus Lerdorf wrote:

APC doesn't seem to work at all as a DSO, I'll try it statically later.

I run it on thousands of servers as a DSO. What are you seeing that would
make you think this?

I didn't really want to hang around with the site offline to find out.
Load shot up and I couldn't get a page out of Apache at all. It may well
have been due to the debug build of PHP, although it appeared to load OK.

I've just installed the 30-day trial of Zend Performance Suite, so I
shall see if that fixes it.

Also, if you are running PHP as a DSO and pushing your CPU you might want
to compile it non-pic. Use this patch and reconfigure/recompile:
http://lerdorf.com/non-pic.txt

Noted. Thanks.

Russ

21 years ago by Russ Garrett

OK, the situation seems a lot more stable with Zend Accelerator instead
of mmcache, and we're regularly getting quite a few "checksum failed"
errors in the logs, which does tend to indicate that shared memory
corruption was (and still is) happening. But now I don't have to restart
the damn thing every hour, at least for the duration of the 30-day trial ;).

However, now that we've eliminated this problem, another has become
obvious: there does seem to be a small amount of memory leaking, likely
due to the crashes which are still occurring (i.e. those detailed
here: http://static.last.fm/phpbug/log.txt).

This results in some Apache children taking up 200MB+ of RAM and
lingering there, not serving any requests, until they're killed or the
server is restarted. Regrettably I can't be more specific, because the
place in our code where the crashes happen is random (the location in
PHP always appears to be zend_variables.c line 44).

Since the httpd processes appear to just hang, the Apache
MaxRequestsPerChild setting is useless against this.

Thanks so much for your help so far, it is most appreciated. We seem
to be ridiculously unlucky when it comes to these sorts of things...

Cheers,

Russ

21 years ago by Rasmus Lerdorf

You are going to have to narrow this down further for us to have any
chance to help you. Put your stuff on a development server and hit your
various pages looking for that error or the request that causes your httpd
to grow to 200M (use Apache1, not Apache2 for this). Or if all else fails
replay the log to it to recreate the situation. Then replay it slower
without an opcode cache and get it down to a single script and then a
specific part of that script.
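
Replaying a production access log against a development box mostly means pulling the request path out of each log line and re-issuing it. A minimal C sketch of the parsing half, assuming Apache's Common or Combined Log Format; the function name is hypothetical, sending the requests (and any pacing to replay it slower) is left out, and request lines containing escaped quote characters are not handled. Only GET-style requests can be replayed faithfully this way, because access logs do not record POST bodies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the request path (e.g. "/index.php?id=3") from one access-log
 * line into out. Returns 1 on success, 0 if the line has no quoted
 * "METHOD path PROTOCOL" field or the path does not fit in out. */
int extract_request_path(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL);
    const char *open = strchr(line, '"');        /* opening quote */
    if (open == NULL)
        return 0;
    const char *close = strchr(open + 1, '"');   /* closing quote */
    if (close == NULL)
        return 0;
    /* The method ends at the first space inside the quotes. */
    const char *sp = memchr(open + 1, ' ', (size_t)(close - open - 1));
    if (sp == NULL)
        return 0;                                /* e.g. a bare "-" */
    const char *path = sp + 1;
    const char *end = path;
    while (end < close && *end != ' ')           /* stop before protocol */
        end++;
    size_t len = (size_t)(end - path);
    if (len == 0 || len >= out_size)
        return 0;
    memcpy(out, path, len);
    out[len] = '\0';
    return 1;
}
```

Each extracted path can then be fed to whatever HTTP client drives the development server, one request at a time when narrowing down the offending script.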

-Rasmus

21 years ago by Xuefer

I can confirm it: it's a problem with the cache. mmcache is rather complex
and NOT stable; although many people are running it happily, they're not
under heavy load. Between 1 hour and 1 day after Apache is restarted,
mmcache ends up with all pages crashing randomly (shared memory corruption).

APC works with an Apache 2 DSO, and the optimizer is stable ONLY with my
patches; check it out here:
http://pecl.php.net/bugs/search.php?cmd=display&status=Open&bug_type[]=APC
I've used APC from the time my last patch was posted until now, with zero
crashes (if I clear the cache after it has been running for a long time,
there's about a 1 in 10 chance of a crash).
FYI: my scripts never seem to exceed the cache size.

phpa is quite stable, though the author has stopped releasing new
versions. Even with a phpa build installed that can't work with my latest
PHP, my pages are still OK. This is because it has a "crash recover"
scheme: it marks the shared memory to be "reset" on a crash, and resets it
when it next gets the write lock on the shared memory. phpa falls back to
non-caching whenever it fails to operate on the shared memory, so there is
no hanging: phpa stops itself, but won't take PHP down or hang it if the
shared memory is deadlocked, messed up, or even unrecoverable.
(I know this from reading the log when phpa crashes; some of the above is
based on guessing.)
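
Going by Xuefer's description (which he says is partly guesswork), the phpa-style scheme amounts to a flag in the segment header that is checked whenever the write lock is taken. A minimal C sketch under that assumption; the struct and function names are hypothetical, not phpa's, the locking itself is not shown, and the sketch deliberately leaves open who calls cache_request_reset(), which is exactly the part Rasmus questions when it would be done from a SIGSEGV handler.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout of a shared-memory cache segment: a plain int
 * flag in the header, followed by the cache arena itself. In a real
 * cache this struct would live in memory from mmap() or shmget(). */
struct cache_header {
    int reset_pending;        /* 1 = a crash was noticed, wipe the cache */
    unsigned long entries;    /* number of cached scripts */
};

struct cache_segment {
    struct cache_header hdr;
    char arena[4096];         /* stand-in for the cached opcodes */
};

/* Ask for a reset. Writes only an int, without taking the lock. */
void cache_request_reset(struct cache_segment *seg)
{
    assert(seg != NULL);
    seg->hdr.reset_pending = 1;
}

/* Call immediately after acquiring the cache's write lock. If a reset
 * was requested, wipe the cache before anyone uses it.
 * Returns 1 if a reset was performed, 0 otherwise. */
int cache_on_write_lock(struct cache_segment *seg)
{
    assert(seg != NULL);
    if (!seg->hdr.reset_pending)
        return 0;
    memset(seg->arena, 0, sizeof seg->arena);
    seg->hdr.entries = 0;
    seg->hdr.reset_pending = 0;
    return 1;
}
```

The appeal of the design is that the expensive cleanup happens in a healthy process holding the lock; only the cheap flag write needs to happen near the failure.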

Neither mmcache nor APC has "crash recover".
Do Zend's products implement it?

----- Original Message -----
From: "Russ Garrett" russ@last.fm
To: internals@lists.php.net
Sent: Monday, September 06, 2004 3:35 AM
Subject: [PHP-DEV] Really odd PHP problem

[snip]

21 years ago by Rasmus Lerdorf

APC works with an Apache 2 DSO, and the optimizer is stable ONLY with my
patches; check it out here:
http://pecl.php.net/bugs/search.php?cmd=display&status=Open&bug_type[]=APC
I've used APC from the time my last patch was posted until now, with zero
crashes (if I clear the cache after it has been running for a long time,
there's about a 1 in 10 chance of a crash).
FYI: my scripts never seem to exceed the cache size.

I have fixed a number of problems related to running out of shared memory
in APC lately. If you grab the current CVS version I think you will find
that it is less likely to fill up shared memory, and when it does, it is
smarter about handling that scenario when it happens.

I really haven't done much to the optimizer. I tend to just leave it off.
I would be interested in seeing your patches.

Neither mmcache nor APC has "crash recover".

The concept of a crash recover is somewhat flawed in my opinion. The only
way to really do this is to catch SIGSEGV, SIGBUS and other such fatal
signals and twiddle a knob somewhere in shared memory that tells other
processes to flush the cache. The problem with doing this is that once
you get a SEGV, it really isn't safe to do anything like that. You run a
very serious risk of ending up in an infinite crash loop where you catch
the crash, try to set the crash-recover flag, crash trying to do that,
catch the crash, etc.

-Rasmus

21 years ago by Xuefer

Thanks for taking care of my bug reports.
My optimizer patch is at http://pecl.php.net/bugs/bug.php?id=1678
I guess you've seen it just now. The fix itself doesn't require as many
changes as my patch makes: I reorganized the blocks of code into macros
(I personally don't like too much boring repetition), which should make
it easier to update and less error-prone. I don't know if this breaks any
coding rules. Feel free to keep the original structure, but make the
changes carefully :)

The most unstable code is the constant folding (constant_fold).
IIRC, long ago the Zend Engine disallowed computed values as static
initializers:

php -r 'function a(){static $a=1+1;}'
Parse error: parse error, expecting `','' or `';'' in Command line code on line 1

to avoid instability (crashes?).

I wonder how mmcache manages to do it.
What about other optimizers?

----- Original Message -----
From: "Rasmus Lerdorf" rasmus@php.net
To: "Xuefer" Xuefer@hotmail.com
Cc: internals@lists.php.net; "Russ Garrett" russ@last.fm
Sent: Tuesday, September 07, 2004 12:05 PM
Subject: Re: [PHP-DEV] Really odd PHP problem

[snip]

21 years ago by Xuefer

Neither mmcache nor APC has "crash recover".

The concept of a crash recover is somewhat flawed in my opinion. The only
way to really do this is to catch SIGSEGV, SIGBUS and other such fatal
signals and twiddle a knob somewhere in shared memory that tells other
processes to flush the cache. The problem with doing this is that once
you get a SEGV, it really isn't safe to do anything like that. You run a
very serious risk of ending up in an infinite crash loop where you catch
the crash, try to set the crash-recover flag, crash trying to do that,
catch the crash, etc.

-Rasmus

Without crash recover, the corrupted shared memory will trigger crashes in
other processes too.

Sorry for my limited experience with C and shared memory, but IMHO it's
not that hard. It's easy to put a reset_flag at the top or bottom of the
shared memory. The flag is just an int, not a pointer, so setting it won't
crash unless the shared memory is unavailable or the pointer to it is
corrupted (maybe possible?). After all, we can reset the signal handler
when we're about to operate on the flag. Remember to log something when
the shared memory is going to be reset (whether or not the write lock
for the reset can be obtained).

21 years ago by Rasmus Lerdorf

Without crash recover, the corrupted shared memory will trigger crashes in
other processes too.

Sorry for my limited experience with C and shared memory, but IMHO it's
not that hard. It's easy to put a reset_flag at the top or bottom of the
shared memory. The flag is just an int, not a pointer, so setting it won't
crash unless the shared memory is unavailable or the pointer to it is
corrupted (maybe possible?). After all, we can reset the signal handler
when we're about to operate on the flag. Remember to log something when
the shared memory is going to be reset (whether or not the write lock
for the reset can be obtained).

There are different ways of doing it, but using a signal handler to catch
a SEGV is not a good idea. You can't count on any code of any sort
working after a SEGV. You could turn it around and have processes check
in and out as they handle requests and if a process doesn't check out
within some allotted time, assume a crash and reset. Or you could have an
external mechanism monitor for crashes and do the reset externally. But
having the process itself that crashed do anything is just asking for
trouble. It doesn't matter if the flag is an int or what it is. Any code
at all executed after a SEGV is unsafe.
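
Rasmus's check-in/check-out alternative can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not APC's or any other cache's actual code: the scoreboard struct, the function names, the slot count and the timeout are all hypothetical, and in a real deployment the scoreboard would live in the same shared memory segment as the cache. The point is that the decision to reset is made by code that did not crash.

```c
#include <assert.h>
#include <time.h>

#define MAX_SLOTS 256   /* assumption: one slot per Apache child */

/* One slot per worker process (shared memory in a real setup). */
struct worker_slot {
    int    busy;        /* 1 between check-in and check-out */
    time_t started;     /* when the current request checked in */
};

struct scoreboard {
    struct worker_slot slot[MAX_SLOTS];
};

/* Worker calls this when it starts handling a request. */
void check_in(struct scoreboard *sb, int id, time_t now)
{
    assert(id >= 0 && id < MAX_SLOTS);
    sb->slot[id].busy = 1;
    sb->slot[id].started = now;
}

/* Worker calls this when the request finished normally. */
void check_out(struct scoreboard *sb, int id)
{
    assert(id >= 0 && id < MAX_SLOTS);
    sb->slot[id].busy = 0;
}

/* Run by a monitor (another process, or the next request), never by
 * the worker that may have crashed. Counts slots that checked in but
 * not out within timeout seconds and clears them so each is reported
 * once; a nonzero result is treated as a probable crash or hang, and
 * the caller would then reset the cache. */
int stale_workers(struct scoreboard *sb, time_t now, double timeout)
{
    int stale = 0;
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (sb->slot[i].busy && difftime(now, sb->slot[i].started) > timeout) {
            sb->slot[i].busy = 0;
            stale++;
        }
    }
    return stale;
}
```

The timeout has to be comfortably longer than the slowest legitimate request; otherwise slow but healthy requests get reported as crashes and the cache is flushed needlessly.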

-Rasmus

21 years ago by Wez Furlong

I'd recommend the Microsoft "Debugging Tools for Windows" to be able
to do very similar things to gdb under windows; it's free, not as
bloated as VC++/VS.Net IDE debugger and comes in console and GUI
flavours.

Caveat emptor:

- you need to understand how to debug
- you want a debug build of PHP with symbols to do much with it

Also, it should be possible (although I've never found out exactly
how) to have the OS/Dr. Watson produce a minidump which you can then
post-mortem backtrace with these tools on a developers box (similar to
gdb'ing a core file on unix). The advantage of this is that your
client wouldn't need to install the debugger and operate it
themselves.

Hope that helps.... (and if you find out about the minidump thing,
please share the knowledge ;-)

--Wez.

A large telco client of ours is having problems that match Russ' problem
almost exactly... except on Win2k.

Does anyone have pointers to windows tools to achieve the kind of
debugging/tracing described below?

Rasmus Lerdorf wrote:

I can watch them for hours in the apache mod_status view, and they'll
show the same last request. They won't respond to a kill -15, I have to
kill -9 them. Strace reports they're doing absolutely nothing.

Could you use 'gcore' to drop a core from one of these spinning processes
and get a backtrace, or simply attach gdb to one of them and see if you
can get a backtrace. Chances are it's off in the middle of nowhere, but
by poking around a bit and looking at 'ap_request' and walking through
'ap_request->headers_in' you should be able to get an idea of the exact
request that caused it to go nuts.

-Rasmus

21 years ago by Daniel Convissor

Hope that helps.... (and if you find out about the minidump thing,
please share the knowledge ;-)

This may be of assistance...

Get the Windows Debugger:
http://www.microsoft.com/ddk/debugging/

Here's a tip sheet on the process:
http://www.jsiinc.com/SUBJ/tip4900/rh4981.htm

Something I did to make things easier was to add the commands to the
file extension handling. In Windows Explorer, select Folder
Options, File Types and add the DMP extension. Add an event called
Open and put this value in there:

"C:\Program Files\Debugging Tools for Windows\windbg.exe" -y
"SRV*C:\Program Files\Debugging Tools for
Windows\Symbols*http://msdl.microsoft.com/download/symbols"
-i C:\WINNT\system32 -z "%1"

Note, I manually put in those line breaks after spaces here for
visibility. Merge them onto one line and edit the paths as
appropriate for you.

--Dan

--
T H E A N A L Y S I S A N D S O L U T I O N S C O M P A N Y
data intensive web and database programming
http://www.AnalysisAndSolutions.com/
4015 7th Ave #4, Brooklyn NY 11232 v: 718-854-0335 f: 718-854-0409

21 years ago by Derick Rethans

Hope that helps.... (and if you find out about the minidump thing,
please share the knowledge ;-)

This may be of assistance...

Can you perhaps wrap up instructions à la the Unix "how to generate a
backtrace" page, but using those tools? That would be much appreciated.

Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

21 years ago by Wez Furlong

We need to:

- provide a debug build with symbols (could be generated daily to
  reduce load on Edin's snap box)
- find out exactly how to get Dr. Watson (or whatever) to drop dumps
  for applications
- otherwise, suggest that they install a debugger. Installing dev tools
  under Windows tends to destabilize the system, so it's something to
  avoid on production machines; Debugging Tools for Windows is much
  better for this than VC++/VS.Net, and is also free.

--Wez.

On Fri, 10 Sep 2004 09:28:04 +0200 (CEST), Derick Rethans
derick@php.net wrote:

[snip]

21 years ago by Nuno Lopes

We need to:

provide a debug build with symbols (could be generated daily to
reduce load on Edin's snap box)

This would be great! (I had already suggested that.) I don't have the MS
compiler, just Cygwin, and sometimes I get an error with the snap binary
that I then can't reproduce with the Cygwin build. So it would be a great
help.

find out exactly how to get Dr. Watson (or whatever) to drop dumps
for applications

I'll investigate this.

otherwise, suggest that they install a debugger. Installing dev tools
under Windows tends to destabilize the system, so it's something to
avoid on production machines; Debugging Tools for Windows is much
better for this than VC++/VS.Net, and is also free.

--Wez.

21 years ago by Dietrich Ayala

Looks like Dr Watson won't help our specific problem:

"Dr. Watson cannot create a snapshot if the program does not respond
(hangs)."

From "How to Troubleshoot Program Faults with Dr. Watson":
http://support.microsoft.com/default.aspx?scid=kb;en-us;q275481

However, we do experience php.exe crashes at times, and it looks like we
may be able to get a minidump in those instances using Dr. Watson.

Nuno Lopes wrote:

[snip]

21 years ago by Dietrich Ayala

Excellent tips, thx Dan and Wez.

Do I need to add any special options to the debug build configuration
to generate the debug symbols? Also, what is the release_tsdbg build
configuration? A release build w/ debug symbols?

I'll check out the minidump idea. That'd be ideal, as the client is
remote and won't allow remote access to their box.

thanks,

dietrich

Wez Furlong wrote:

[snip]

21 years ago by Wez Furlong

The .pdb files contain the debugging info; they should be generated as
part of the debug build.

Dan's tips on reading the minidump are handy, but they don't tell you how
to get hold of one for an application crash; they cover only a kernel
crash (BSOD). You'll need to do some googling to try and find out
how to make it work.

I'd stick to a regular Debug_TS if I were you, as there is a fair
chance that some of those more esoteric build configurations are not
completely up to date.

--Wez.
