Background: PHP has a not-often-considered feature, the stat-cache. That is, the runtime caches the result of the OS stat() call for files, so that subsequent reads on the same file can be faster. However, it's even less widely realized that it's a single-file cache. It only applies when you perform two file-information operations on the same file in rapid succession, without any other file reads in between.
For more info: https://tideways.com/profiler/blog/the-php-stat-cache-explained
Because it's so rarely relevant, in the cases where it is relevant it can be quite a surprise, and one that causes weird, hard-to-explain caching bugs in applications.
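A minimal sketch of the usual defensive pattern against that kind of surprise. The helper name `fresh_filesize()` is hypothetical; it relies only on the built-ins `clearstatcache()` and `filesize()`. Whether an uncached read would actually have been stale depends on the PHP version and on what ran in between, so the sketch simply clears the cache before every read.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: read a file's size without trusting PHP's stat cache.
function fresh_filesize(string $path): int
{
    clearstatcache();               // drop any cached stat() result first
    $size = filesize($path);
    if ($size === false) {
        throw new RuntimeException("Cannot stat $path");
    }
    return $size;
}

$path = tempnam(sys_get_temp_dir(), 'statcache');
file_put_contents($path, 'abc');
$before = fresh_filesize($path);    // size after the first write

file_put_contents($path, 'defgh', FILE_APPEND);
$after = fresh_filesize($path);     // size after the append
unlink($path);
```

The cost is one extra function call per read, which is the trade-off discussed throughout this thread.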
The cache also dates from 20 years ago, when Rasmus added it (and the realpath cache) in Yahoo's forked PHP 4, and then it got integrated into PHP 5. However, hard drives are vastly faster than they were then, and operating systems are vastly more efficient than they were then.
There's been some discussion about making the cache disable-able, though the consensus now seems to be leaning toward getting rid of it outright:
https://github.com/php/php-src/pull/17178
Arnaud ran some quick benchmarks and found that disabling it has a less than 1% impact on Symfony and WordPress.
https://github.com/php/php-src/pull/17178#issuecomment-2554323572
Before we go any further, is there appetite among the voting population to remove it? clearstatcache() and similar functions would get stubbed out as no-ops, but otherwise we'd just hand the responsibility back to the OS where it belongs, which seems so far like it would be almost an unmeasurable performance difference but remove some surprise complexity.
Would you support such a removal?
What additional data would you need to make the case for such removal?
--
Larry Garfield
larry@garfieldtech.com
At least on the platform I'm supporting (IBM i), filesystem calls can be
quite slow. I know it's similar on Windows too. That said, I think
getting rid of the stat cache is probably the right call. It's better to
do this at the OS or application levels, where they know more about the
workload (either because they have a system view, or the app knows what
it needs to keep). I haven't measured this yet though.
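As a rough illustration of caching at the application level, here is a minimal sketch of a userland memo in which the application decides what to keep and when to drop it. The class name `StatMemo` and its methods are hypothetical and not part of PHP; only `stat()` and `clearstatcache()` are real built-ins.

```php
<?php
declare(strict_types=1);

// Hypothetical userland cache: the application controls retention and invalidation.
final class StatMemo
{
    /** @var array<string, array> stat() results keyed by path */
    private array $entries = [];

    /** Return stat() data for $path, touching the filesystem only on first use. */
    public function get(string $path): array
    {
        if (!isset($this->entries[$path])) {
            clearstatcache();           // do not depend on PHP's built-in cache
            $stat = @stat($path);
            if ($stat === false) {
                throw new RuntimeException("Cannot stat $path");
            }
            $this->entries[$path] = $stat;
        }
        return $this->entries[$path];
    }

    /** Drop the remembered entry, e.g. after the application wrote to the file. */
    public function forget(string $path): void
    {
        unset($this->entries[$path]);
    }
}

$path = tempnam(sys_get_temp_dir(), 'memo');
file_put_contents($path, 'abc');

$memo   = new StatMemo();
$first  = $memo->get($path)['size'];    // real stat() call
file_put_contents($path, 'abcdef');
$cached = $memo->get($path)['size'];    // served from the memo, deliberately stale
$memo->forget($path);
$fresh  = $memo->get($path)['size'];    // real stat() call again
unlink($path);
```

Unlike the built-in cache, staleness here is explicit: the entry stays until the application calls `forget()`.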
Would you support such a removal?
I still think the stat cache should be deprecated first. That gives
users a chance to reconsider calling multiple stat related functions
instead of doing a single stat() call. See my previous comment[1] for
some further details.
[1] https://github.com/php/php-src/pull/5894#issuecomment-2546473892
Christoph
Hi,
On Fri, Dec 20, 2024 at 10:37 PM Christoph M. Becker cmbecker69@gmx.de
wrote:
I still think the stat cache should be deprecated first. That gives
users a chance to reconsider calling multiple stat related functions
instead of doing a single stat() call. See my previous comment[1] for
some further details.
I don't think we should force users to update their code because of a
negligible perf impact. Most of the time this won't play any role in perf
anyway, since applications that actually do something usually spend most of
their time waiting for IO. So I really don't see a reason for deprecation in
this case.
Regards
Jakub
I still think the stat cache should be deprecated first. That gives
users a chance to reconsider calling multiple stat related functions
instead of doing a single stat() call. See my previous comment[1] for
some further details.
[1] https://github.com/php/php-src/pull/5894#issuecomment-2546473892
Christoph
What exactly would deprecation look like here? My plan was to just rip the cache out, and update clearstatcache() to be a no-op, but issue a deprecation message "Hey, this doesn't do anything anymore." And then we can remove the function itself in like PHP 10 or something, because it doesn't hurt anything to leave it be.
I don't see there being much value to a period of "hey, this is going to do nothing in the future", when users couldn't do anything about it. That just gives them a deprecation notice they cannot fix, if they're in one of the very few situations where manually clearing the cache is useful. That doesn't seem great.
--Larry Garfield
I believe the whole point of the stat cache is to optimize multiple
consecutive calls to stat related functions on the same file name.
E.g. code like
$mtime = filemtime($filename);
$fsize = filesize($filename);
would be a relevant example. Such code could be changed in userland to
$stat = stat($filename);
$mtime = $stat["mtime"];
$fsize = $stat["size"];
where the stat cache would be irrelevant. Of course, users who are not
aware that there may be a difference in performance won't even think
about that. As such a deprecation message could be triggered whenever
the stat cache is hit, possibly pointing also to the file:line where the
cache had been populated. The usefulness of this is based on the
assumption that it's pretty unlikely that the stat cache is hit from
unrelated code paths.
If a general deprecation is not desired (and that seems to be the case),
I'm also fine with a PR/patch that users could apply themselves, similar
to what Nikita did back then when string to number comparisons changed[1].
Note that clearstatcache() should not be no-opped altogether; clearing
(parts of) the realpath cache seems still useful.
[1] https://github.com/php/php-src/pull/3917
Christoph
On Fri, Dec 20, 2024 at 8:29 PM Larry Garfield larry@garfieldtech.com
wrote:
Would you support such a removal?
This gets a +1 from me. I've had bugs that I suspected were caused by this
cache, but I was never able to confirm it until putting clearstatcache() in
production. That's not a workflow I'd like to follow, and it has wasted
enough of my time.
On 20.12.2024 at 20:26, Larry Garfield wrote:
Would you support such a removal?
+1 from me.
Here is an example of how the stat-cache can lead to interesting
situations in testing:
https://github.com/sebastianbergmann/phpunit/issues/5996#issuecomment-2422018481
There's been some discussion about making the cache disable-able, though the consensus now seems to be leaning toward getting rid of it outright:
Just to fill in more context, which wasn't originally obvious to me: that PR thread replaces one from 2021 https://github.com/php/php-src/pull/5894 which was discussed on the list before without consensus: https://externals.io/message/115912.
That in turn links to a feature request from all the way back in 2004: https://bugs.php.net/bug.php?id=28790
I have no doubt there are various other duplicates and discussions; clearly this has always been a contentious topic.
Regards,
Rowan Tommins
[IMSoP]
On Fri, Dec 20, 2024 at 8:29 PM Larry Garfield larry@garfieldtech.com
wrote:
Would you support such a removal?
What additional data would you need to make the case for such removal?
I would prefer to disable it by default but keep some option (INI) to
re-enable it. I think that for most users the perf impact will be
negligible. However, it is quite likely that there are some user workflows
and platforms where the benefit of the stat cache can still be significant
in terms of performance. Those users should have the option to re-enable
it if they see a significant regression, rather than being forced to update
their code to make it faster or to implement their own cache, which would
just make their migration to the next version much harder or potentially
impossible. The maintenance burden is not so large that we would really
need to get rid of it completely. I would much prefer having such an option
and telling users to re-enable it over being unable to deal with
potentially reported future perf regressions.
I think the main issue with the cache is that it is inconvenient in the
cases where it doesn't get flushed, because the file was changed through
some access method that doesn't trigger a flush. We could probably improve
the stream situation a bit, but that still leaves the external (e.g. shell)
access problem in place, which we just cannot fix. On the other hand, it is
possible to use it in a way that users can profit from, but they really
need to know how it works. That's why it should be an optional feature IMO.
We should also improve the documentation in that regard.
In terms of voting, if there was no option to re-enable it, I would
probably vote against this proposal as I'm a bit worried about those
possible regression reports.
Regards
Jakub
I would prefer to disable it by default but keep some option (INI) to
re-enable it.
Rather than a global setting, which would make behaviour even more unpredictable in libraries and out-the-box applications, I wonder if we could make the cache explicit on the functions that use it?
I'm thinking for instance of an extra argument, like:
$perms = fileperms($name, statcache: true);
$size = filesize($name, statcache: true);
I'm not sure if this should default to false straight away, or be introduced gradually somehow, but it would make the behaviour much more explicit.
Regards,
Rowan Tommins
[IMSoP]
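A minimal userland sketch of what such an opt-in could feel like today. The wrapper `file_size()` and its `$statcache` flag are hypothetical and only emulate the idea; this is not the proposed core signature, and the built-in `filesize()` accepts no such argument.

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper emulating an opt-in "statcache" flag in userland.
function file_size(string $name, bool $statcache = false): int
{
    if (!$statcache) {
        clearstatcache();   // caller did not opt in, so force a fresh stat()
    }
    $size = filesize($name);
    if ($size === false) {
        throw new RuntimeException("Cannot stat $name");
    }
    return $size;
}

$name = tempnam(sys_get_temp_dir(), 'optin');
file_put_contents($name, 'hello');

$strict  = file_size($name);                    // always a fresh read
$relaxed = file_size($name, statcache: true);   // may reuse PHP's cached stat data
unlink($name);
```

The default here is the uncached behaviour, so only call sites that explicitly opt in can ever observe cached data.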
Rather than a global setting, which would make behaviour even more unpredictable in libraries and out-the-box applications, I wonder if we could make the cache explicit on the functions that use it?
I'm thinking for instance of an extra argument, like:
$perms = fileperms($name, statcache: true);
$size = filesize($name, statcache: true);
In my opinion, this will become very messy.
I'm not sure if this should default to false straight away, or be introduced gradually somehow, but it would make the behaviour much more explicit.
Changing a default would be another BC break.
Kind regards
Niels
$perms = fileperms($name, statcache: true);
$size = filesize($name, statcache: true);
In my opinion, this will become very messy.
Could you elaborate?
Changing a default would be another BC break.
"Another" after what? Adding either an INI setting or an optional parameter is not a BC break, unless and until the default is changed, at which point there is exactly one BC break.
Regards,
Rowan Tommins
[IMSoP]
$perms = fileperms($name, statcache: true);
$size = filesize($name, statcache: true);
In my opinion, this will become very messy.
Could you elaborate?
Adding a parameter for a cache, which should've been transparent in the first place, to every file operation is messy.
A cache should normally be transparent, and the reason we're having this discussion in the first place is because the cache isn't transparent and causes problems. Adding an extra parameter is going further away from transparency. It's also inconvenient for programmers to add this to different places in their codebase.
Changing a default would be another BC break.
"Another" after what? Adding either an INI setting or an optional parameter is not a BC break, unless and until the default is changed, at which point there is exactly one BC break.
Adding an INI: no BC break indeed.
But if you want to add extra parameters to functions that can potentially touch the stat cache, then you need to take SPL into account as well. Adding extra parameters to the methods in those classes is a BC break, because the signature of potential userland overrides would no longer be compatible at compile time.
Kind regards
Niels
Adding a parameter for a cache, which should've been transparent in the first place, to every file operation is messy.
I would say it's less messy than having to work out when to turn a
global setting on or off. In particular, it would be horrible for shared
libraries, the equivalent of the above would be something like this:
$old_cache_setting = ini_set('enable_stat_cache', 1);
$perms = fileperms($name); $size = filesize($name);
ini_set('enable_stat_cache', $old_cache_setting);
Similarly, for the false case, library code would either have to assume
the cache might be enabled, and call clearstatcache() just in case; or
it would have to carefully wrap code in similar ini_set blocks.
As far as I can see, both code that benefits from the cache, and code
that suffers from it, is very rare; but if you know you're writing one
or the other, having an explicit way to mark that code seems more
appropriate than toggling a global setting.
But if you want to add extra parameters to functions that can potentially touch the stat cache, then you need to take SPL into account as well. Adding extra parameters to the methods in those classes is a BC break, because the signature of potential userland overrides would no longer be compatible at compile time.
Ah yes, I hadn't thought of objects being affected. On the other hand,
objects have an obvious place to store both the state of the setting and
the cache itself: on the instance.
For example, a local rather than global cache would allow this to make
two stat calls, rather than four:
$file1 = new SplFileInfo($name1, usecache: true);
$file2 = new SplFileInfo($name2, usecache: true);
if (
    $file1->getSize() !== $file2->getSize()
    || $file1->getMTime() !== $file2->getMTime()
) { ... }
In fact, it would probably be useful to pre-fetch a snapshot in the
constructor, rather than just caching on the first method call, so that
this worked:
$before = new SplFileInfo($name, snapshot: true);
do_something();
$after = new SplFileInfo($name, snapshot: true);
if ( $before->getSize() !== $after->getSize() ) { ... }
Inheritance of constructors isn't restricted, so that would not be a BC
break, and seems both more powerful and easier to understand than the
current feature.
Regards,
--
Rowan Tommins
[IMSoP]
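The snapshot idea can be approximated in userland today. The following is a minimal sketch under that assumption: `FileSnapshot` is a hypothetical class, not an existing or proposed SPL API, and it simply takes one `stat()` reading in its constructor.

```php
<?php
declare(strict_types=1);

// Hypothetical value object: takes one stat() snapshot at construction time.
final class FileSnapshot
{
    private array $stat;

    public function __construct(string $path)
    {
        clearstatcache();               // make sure the snapshot is fresh
        $stat = @stat($path);
        if ($stat === false) {
            throw new RuntimeException("Cannot stat $path");
        }
        $this->stat = $stat;
    }

    public function getSize(): int
    {
        return $this->stat['size'];
    }

    public function getMTime(): int
    {
        return $this->stat['mtime'];
    }
}

$path = tempnam(sys_get_temp_dir(), 'snap');
file_put_contents($path, 'abc');

$before = new FileSnapshot($path);
file_put_contents($path, 'abcdef');     // stands in for do_something()
$after = new FileSnapshot($path);
unlink($path);

$changed = $before->getSize() !== $after->getSize();
```

Because the cached data lives on the instance, its lifetime is visible in the code rather than hidden in global state.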
Adding a parameter for a cache, which should've been transparent in the first place, to every file operation is messy.
I would say it's less messy than having to work out when to turn a global setting on or off. In particular, it would be horrible for shared libraries, the equivalent of the above would be something like this:
$old_cache_setting = ini_set('enable_stat_cache', 1);
$perms = fileperms($name); $size = filesize($name);
ini_set('enable_stat_cache', $old_cache_setting);
Similarly, for the false case, library code would either have to assume the cache might be enabled, and call clearstatcache() just in case; or it would have to carefully wrap code in similar ini_set blocks.
As far as I can see, both code that benefits from the cache, and code that suffers from it, is very rare; but if you know you're writing one or the other, having an explicit way to mark that code seems more appropriate than toggling a global setting.
I see your argument (hah!) for shared libraries.
But the programmer effort of adding the extra argument to calls starts to outweigh the global ini setting approach quickly, as the global ini setting has a "fixed cost" of two lines of code while adding a parameter scales in the number of calls the programmer has to change.
But if you want to add extra parameters to functions that can potentially touch the stat cache, then you need to take SPL into account as well. Adding extra parameters to the methods in those classes is a BC break, because the signature of potential userland overrides would no longer be compatible at compile time.
Ah yes, I hadn't thought of objects being affected. On the other hand, objects have an obvious place to store both the state of the setting and the cache itself: on the instance.
I do agree with this yes.
All in all though, I'm not convinced by the parameter approach.
I'd like a proper solution rather than some plaster.
There are some options:
- Try to fix the stat cache.
- Put stat cache behind an ini knob.
- Get rid of the stat cache.
All of these simplify the developer experience. Adding more configuration knobs or extra parameters adds complexity.
I'd like to see more simplification.
Kind regards
Niels
I disagree that number 2 simplifies anything. Users who get tripped up by the subtle behaviour of the cache won't know to turn it off; users who know they can get advantage from it will have to make sure it is turned on, and not conflicting with code that wants it turned off; and libraries will have to account for both modes, or add boilerplate.
I would probably vote for abolishing it completely, or replacing it with some form of local setting (maybe only on SplFileInfo). I would probably vote against adding an ini setting.
Regards,
Rowan Tommins
[IMSoP]
I would prefer to disable it by default but keep some option (INI) to
re-enable it.
I really don't like the idea of another ini toggle. That actually creates more work, as people writing code that works with the file system now have one more invisible context they have to think about. Which means they probably won't, until it bites them. (They'll either never bother clearing the cache, so their code may malfunction on the rare system where it's enabled, or always clear it, which 99.9% of the time will actually be slower as we have to invoke the function for it to do nothing. Both are bad.)
I suppose a possible alternative would be to modify all file system mutation functions (file_put_contents(), touch(), etc.) to flush the cache, which for whatever reason doesn't happen now. That would be above my skill level, though, so someone else would need to do it. Also, I don't know if there's a good reason those functions don't clear the cache currently or if it was just an oversight.
--Larry Garfield
On Sun, Dec 22, 2024 at 5:12 AM Larry Garfield larry@garfieldtech.com
wrote:
I really don't like the idea of another ini toggle. That actually creates
more work, as people writing code that works with the file system now have
one more invisible context they have to think about. Which means they
probably won't, until it bites them. (They'll either never bother clearing
the cache, so their code may malfunction on the rare system where it's
enabled, or always clear it, which 99.9% of the time will actually be
slower as we have to invoke the function for it to do nothing. Both are
bad.)
Well, it's much less likely to bite anyone than if it's always on. I think
if we document it well and there is a good note about the switch, it should
be clear enough for users, and only users who understand what it does
should enable it.
I can see that if anyone enables it just on prod, then they will have a hard
time recreating the issues on a local setup, but that's already the case
with some other options. You just need to get the right settings from prod
to be able to recreate things locally.
I don't really have a better idea for how to minimize the impact on users if
they see a significant regression from this change. Changing the function
signatures is just not viable IMO.
I suppose a possible alternative would be to modify all file system
mutation functions (file_put_contents(), touch(), etc.) to flush the cache,
which for whatever reason doesn't happen now. That would be above my skill
level, though, so someone else would need to do it. Also, I don't know if
there's a good reason those functions don't clear the cache currently or if
it was just an oversight.
As I said we could probably handle some stream cases more aggressively but
it won't resolve the problem completely. We still have things like
system("touch /file/path") which we cannot flush the stat cache for. And
it's not just shell access - there might be some 3rd party extensions that
operate on files or there might be other programs accessing files at the
same time. So there are many places which we just cannot control.
Regards
Jakub
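To make the external-modification problem above concrete, here is a minimal sketch, assuming a POSIX shell with printf is available and that exec() is not disabled. The temp file name is generated for the demo and is not from the thread. Whether the middle read is actually stale depends on the PHP version and platform, so the sketch only relies on the value read after clearstatcache().

```php
<?php
// Hypothetical demo file; the path is generated, not taken from the thread.
$path = tempnam(sys_get_temp_dir(), 'statcache_');
file_put_contents($path, 'abc');

clearstatcache();             // start from an empty stat cache
$before = filesize($path);    // 3 bytes; this call also primes the stat cache

// Change the file behind PHP's back, similar to system("touch /file/path").
// Assumption: a POSIX shell with printf is available.
exec('printf defgh >> ' . escapeshellarg($path));

$maybeStale = filesize($path);   // may still report the cached size
clearstatcache(true, $path);     // drop the cached entry for this path
$fresh = filesize($path);        // re-stats the file: 3 + 5 = 8 bytes

unlink($path);
```

The only read that is guaranteed to see the shell's change is the one after the explicit clearstatcache() call, which is exactly the step user code tends to forget.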
Thinking about it, there might be a possibility to address it (at least on
Linux) using fanotify. Not sure about other platforms but maybe there are
some solutions to address it. Also it might get a bit complex and I'm not sure
how viable the solution is.
I guess we should first research and maybe PoC to what extent this can
actually be fixed. I will try to prioritize it and look into it in the coming
weeks.
Regards
Jakub
For FrankenPHP, we successfully use https://github.com/e-dant/watcher for
this kind of usage. It supports Linux (fanotify, inotify, epoll...), macOS,
and Windows, and is very efficient.
It's a C++ library but with pure C bindings.
Best,
I think watching files makes sense for FrankenPHP's use case of a
development server, but I can't imagine the overhead and
non-portability make sense for the stat cache. I feel any gains you'd
have from the stat cache would be offset by that. And unfortunately it'd be
the only way to actually solve the "external thing changes thing behind
the stat cache's back" problem.
Yeah I did some investigation and was thinking more about this. To make it
effective, we would need a new thread that would poll the selected API and
then clear the stat cache for the file (there would need to be a write lock for
this ofc). We actually have another use case for this worker thread on ZTS
macOS in relation to timers, which we were discussing some time ago - it
could open a path to removing the need for signals (e.g. possibly getting
a step closer to using goroutines for FrankenPHP). This was just for ZTS and
we realised that for that we need a better event loop. We have got some
plans for that but it might take some time to get there. But there is some
possibility that we could eventually have it. It could also mean that we
could extend the stat cache to more than the last file, which could have
some positive impact on perf.
What we could do in the meantime is to do more aggressive flushing as
mentioned in https://bugs.php.net/bug.php?id=72666 (comments). That could
be potentially also applied as a bug fix (or at least some part of it and
the more aggressive parts in master). It won't fix it completely but it
might help with the most problematic issues.
Regards
Jakub
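For readers wondering what a stat cache covering "more than the last file" might look like, here is a rough userland sketch (PHP 8.0+). It is not how the engine implements or would implement this; the class name MultiFileStatCache, its methods and the eviction policy are all made up for illustration. It has the same weakness discussed in this thread: anything that changes a file without calling invalidate() leaves a stale entry behind.

```php
<?php
// Hypothetical userland illustration only; not the engine's stat cache.
final class MultiFileStatCache
{
    /** @var array<string, array> stat() results keyed by path */
    private array $entries = [];

    public function __construct(private int $maxEntries = 128) {}

    public function stat(string $path): array|false
    {
        if (isset($this->entries[$path])) {
            return $this->entries[$path];   // served from the userland cache
        }

        clearstatcache(true, $path);        // bypass PHP's built-in single-entry cache
        $stat = @stat($path);
        if ($stat === false) {
            return false;                   // missing files are not cached
        }

        if (count($this->entries) >= $this->maxEntries) {
            // Evict the oldest inserted entry (simple FIFO policy).
            unset($this->entries[array_key_first($this->entries)]);
        }

        return $this->entries[$path] = $stat;
    }

    /** Drop one entry, or everything when no path is given. */
    public function invalidate(?string $path = null): void
    {
        if ($path === null) {
            $this->entries = [];
        } else {
            unset($this->entries[$path]);
        }
    }
}
```

A caller would use `$cache->stat($path)['size']` in place of filesize($path) and call `$cache->invalidate($path)` after every write it knows about.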
While it is nice that Symfony and WordPress wouldn't suffer a lot from
dropping this cache, what's the impact on scripts that are processing
hundreds of files?
Would doing $stat = stat($filename); instead of separate calls to
filemtime and filesize actually be important? Or would it still amount
to a 1% performance difference on an SSD?
I mean, are there cases when this cache is still useful in 2025?
The limited data so far suggests it isn't that important, unless you're doing filemtime(), filesize() together in order over hundreds or thousands of files. In that case, calling stat() would be better, though by how much is unclear. Or using SplFileInfo. (I have no idea if it uses the stat cache or loads the stat data once and just exposes it through methods.)
I mean, are there cases when this cache is still useful in 2025?
That is indeed the question. :-) I think so far we can say "not most of the time," but haven't yet figured out all the possible edge cases.
--Larry Garfield
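As a small illustration of the stat() suggestion above, the two approaches can be compared side by side. This is only a sketch: the function names fileInfoSeparate and fileInfoSingleStat are invented for the example, and it makes no claim about how large the performance difference is.

```php
<?php
// Two lookups by path; today the second one can be served from the stat cache.
function fileInfoSeparate(string $path): array
{
    return [
        'size'  => filesize($path),
        'mtime' => filemtime($path),
    ];
}

// One explicit stat() call, so the result does not depend on the stat cache.
function fileInfoSingleStat(string $path): array
{
    $stat = stat($path);
    if ($stat === false) {
        throw new RuntimeException("Cannot stat $path");
    }

    return [
        'size'  => $stat['size'],
        'mtime' => $stat['mtime'],
    ];
}
```

Both functions return the same data for an unchanged file; the second makes the single system call explicit instead of relying on the runtime to deduplicate it.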