Hello internals,
today I was pointed to bug #60982[1], which appears to be an
unpleasant limitation. I wonder why the GC is not triggered when the
memory limit is exhausted, which would prevent the script from ending
prematurely if there are uncollected cycles.
I've tried out a (maybe too simplistic) solution[2], and with that
modification the test script of bug #69639[3] would run, as well as Dan
Ackroyd's memTest.php[4] presented in the other bug report.
With my limited knowledge of C, systems programming, and the Zend Engine
in particular, I can't see any drawbacks with this modification, so I
would be happy if more experienced programmers could point out crucial
issues. Otherwise it might be worthwhile to investigate further
improvements (I have not tackled zend_mm_realloc_heap() or even "Out of
memory" conditions).
[1] https://bugs.php.net/bug.php?id=60982
[2] https://github.com/cmb69/php-src/tree/gc
[3] https://bugs.php.net/bug.php?id=69639
[4] https://gist.github.com/Danack/69323606f144a2bbb9db
--
Christoph M. Becker
Hi Christoph,

> Hello internals,
> today I have been pointed to bug #60982[1], which appears to be an
> unpleasant limitation.
It sure is!
> I wonder why the GC is not triggered when the memory limit is exhausted,
To hopefully point you in the right direction, I believe the problem I
ran into, and the issue that stops it being a trivial thing to fix is
this: it can be necessary for memory allocations to take place inside
gc_collect_cycles.
For example when an object has a userland __destruct method, and it is
destroyed when the GC runs, then calling that method should require
allocations... but we've already run out of memory, so that can't
happen.
As Rasmus suggested[1], in an issue linked through #60982, a simple
way to solve this would be to have both a soft and hard limit for
memory, and to allow the soft-limit to be a user callback, which could
call gc_collect_cycles, or not as the user desired.
However, it might be time to give the garbage collector a bit of love
and care, as it's not necessarily the case that the GC requirements
for PHP running as a CGI process are the same as for a long-running
CLI process.
If nothing else, it would be nice to get the GC_BENCH benchmark stuff
working for the CLI again; it currently only works in certain
circumstances[2].
cheers
Dan
[1] https://bugs.php.net/bug.php?id=41245
[2] https://bugs.php.net/bug.php?id=68343
Dan Ackroyd wrote:
>> I wonder why the GC is not triggered when the memory limit is exhausted,
> To hopefully point you in the right direction, I believe the problem I
> ran into, and the issue that stops it being a trivial thing to fix is
> this: it can be necessary for memory allocations to take place inside
> gc_collect_cycles.
> For example when an object has a userland __destruct method, and it is
> destroyed when the GC runs, then calling that method should require
> allocations... but we've already run out of memory, so that can't
> happen.
Ah, I see! Thanks, Dan. Having given that some thought, I can
understand that the issue has not been addressed yet.
> As Rasmus suggested[1], in an issue linked through #60982, a simple
> way to solve this would be to have both a soft and hard limit for
> memory, and to allow the soft-limit to be a user callback, which could
> call gc_collect_cycles, or not as the user desired.
What happens if the soft limit is exhausted, but the GC can free only a
little memory? That might trigger the GC shortly afterwards again and
again. A user would have to carefully adjust the soft limit
dynamically, to work around this problem. Then again, it might be
better than the current situation.
> However, it might be time to give the garbage collector a bit of love
> and care, as it's not necessarily the case that the GC requirements
> for PHP running as CGI process are the same as it running as a
> long-running CLI process.
I fully agree.
> If nothing else, it would be nice to get the GC_BENCH benchmark stuff
> working for the CLI again; it currently only works in certain
> circumstances[2].
That doesn't seem to work on Windows at all (at least for current
master). :(
[1] https://bugs.php.net/bug.php?id=41245
[2] https://bugs.php.net/bug.php?id=68343
--
Christoph M. Becker
>> As Rasmus suggested[1], in an issue linked through #60982, a simple
>> way to solve this would be to have both a soft and hard limit for
>> memory, and to allow the soft-limit to be a user callback, which could
>> call gc_collect_cycles, or not as the user desired.
> What happens if the soft limit is exhausted, but the GC can free only a
> little memory? That might trigger the GC shortly afterwards again and
> again. A user would have to carefully adjust the soft limit
> dynamically, to work around this problem. Then again, it might be
> better than the current situation.
Apart from chewing CPU cycles, would that actually be a problem? The
soft-limit callback would presumably be responsible for doing one of two
things: gracefully terminating the request (the "pretty error page" use
case) or reducing the used amount of memory (the "trigger GC" use case).
If the memory usage was still over the soft limit when the callback
ends, the engine could terminate as though the hard limit was reached.
Rather than adding a soft limit, you could say that we are adding an
additional reserve of memory only accessible to the memory-out callback.
If the callback freed at least some memory, the engine could execute at
least one instruction before the soft limit was reached a second time,
triggering the callback again. In the worst case, a loop could
repeatedly push usage over the limit, triggering the callback repeatedly
like a tick function; however, one of the following would then have to
happen:
- the oscillation could continue for a while, but the loop or the whole
program would eventually finish normally, just a bit slower than usual
- if the net amount of memory freed by the callback was slightly higher
than the net amount allocated after it returned, the memory usage would
slowly decline, breaking out of the oscillation once it dropped below
the soft limit
- if the amount of memory freed was lower than the amount allocated, the
memory usage would slowly grow, eventually reaching the hard limit
- if the amount of memory freed and allocated was consistently
identical, the oscillation could continue and cause the program to hit
the execution timeout
Dynamically adjusting the limit would be no help, because if you're that
hard up against the limit, your only hope is to gracefully end the
process anyway.
--
Rowan Collins
[IMSoP]
> What happens if the soft limit is exhausted, but the GC can free only a
> little memory? That might trigger the GC shortly afterwards again and
> again.
The usual way to stop this sort of flapping is to set the point at
which the alarm gets reset significantly lower than the point at which
the alarm is triggered in the first place. And as Rowan said, the
current situation is that applications just crash when they run out of
memory, so a response that is a little slow, though not 50x slower,
would still be better.
> Dynamically adjusting the limit would be no help, because if you're that
> hard up against the limit, your only hope is to gracefully end the
> process anyway.
To be clear - we don't really have hard memory limits in PHP. We have
a limit that users specify in an ini file, but this can be overridden
in a script by just calling ini_set('memory_limit', $moreMemory);. The
only real hard limit is when malloc starts failing, at which point all
hope is lost.
So, I think what would be needed for graceful handling of memory usage is:

- Allow users to specify programmatically (i.e. not in the ini file)
memory trigger limits and a callable for each of those limits.

- When one limit is reached, set the memory limit to the next higher
one, and call the callable associated with the memory level just
reached, at the next appropriate place in the engine. I don't think
checking after each allocation would be feasible, as it would be too
nasty to code as well as having too much of a performance impact.
Instead, this seems more likely to work if it were tied into the
'tick' functionality.

- When there are no more memory limits left, or a limit would be
higher than memory_limit, trigger the current behaviour for when the
application has run out of memory (i.e. abort the application).

- When gc_collect_cycles is called (and potentially other functions),
at the end of that function check whether the amount of memory being
used is less than the reset level for each trigger. If so, reduce the
current memory limit to the lowest trigger level.
The reason for having multiple limits is that people will want their
code to do different things at the different limits. For example,
imagine someone who has some PHP code serving a simple API that for
business reasons needs to respond quickly, and which typically uses
4MB to serve a request.
It would be reasonable for them to have the trigger levels of:
8MB - Log that this request used more memory than average, and someone
should investigate it at some point.
32MB - Log that this request is using 8 times as much memory as usual
and that someone needs to investigate it reasonably urgently, and
trigger gc_collect_cycles. gc_collect_cycles is a relatively slow
function, and so this will make the request slower than desired.
64MB - Throw an exception and cleanly shut down the application.
And a max memory_limit of 128MB where the application does the current
behaviour of falling over and dying.
I'll wait for some feedback, then write an RFC unless someone can say
something that would be significantly better, or why the above would
be either bad or technically unfeasible.
cheers
Dan
Dan Ackroyd wrote on 19/05/2015 14:08:
> - When gc_collect_cycles is called (and potentially other functions),
> at the end of that function check whether the amount of memory being
> used is less than reset level for each trigger. If so reduce the
> current memory limit to the lowest trigger level.
The only flaw I see in this approach is that the majority of memory
deallocation is done transparently by the ref-counting mechanism, not
by any particular function, and the user should be able to trigger this
manually.
For instance, you might have an EntityRepository which caches all the
entities it's loaded thus far in the script to save DB round trips, and
calling clearCache() on this repository might free a significant amount
of memory.
It would probably make sense for the memory level to be checked against
reset levels immediately after the low-memory callback returns, so that
users can deliberately take advantage of such deallocations. However,
that still leaves the case of that method being called during normal
flow, and requiring a call to gc_collect_cycles just for the side-effect
of resetting memory limits seems a bit odd.
I don't know enough about the memory manager to know, but presumably at
the moment memory_limit is checked on every allocation? Could a similar
check be made on deallocation?
Having multiple levels configured doesn't actually have any extra
performance cost, except in the case where they are all passed at once -
you just store the "current" level, and test whether you have reached
the next level up or down. If a single deallocation drops you from 20MB
to 2MB, you first register that the 16MB reset limit has been reached,
and only then perform an extra check if the 8MB reset limit has also
been reached.
Regards,
Rowan Collins
[IMSoP]
Dan Ackroyd wrote on 19/05/2015 14:08:
>> - When gc_collect_cycles is called (and potentially other functions),
>> at the end of that function check whether the amount of memory being
>> used is less than reset level for each trigger. If so reduce the
>> current memory limit to the lowest trigger level.
> The only flaw I see in this approach is that the majority of memory
> deallocation is done transparently be the ref-counting mechanism, not any
> particular function, and the user should be able to trigger this manually.
Good point, I'll think about that.
Danack wrote:
> I'll wait for some feedback, then write an RFC
Actually no I won't, I'll try to implement it as an extension first.
cheers
Dan