Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:126163 X-Original-To: internals@lists.php.net Delivered-To: internals@lists.php.net Received: from php-smtp4.php.net (php-smtp4.php.net [45.112.84.5]) by qa.php.net (Postfix) with ESMTPS id EBED91A00BD for ; Sun, 22 Dec 2024 07:48:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=php.net; s=mail; t=1734853514; bh=calVywo2An9Zo7HGHLiSo+5OVJxY4h953k1HjK4xJ1I=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=gX+LX+LCZA8dfRTMgGoU1eGwWe/v6Z44BLQ7W/1lIAPPAET1JugliH7JILdU2P9zj 4VvWgBNYr59TssOKwybmDCrHxJ75jJGqgfY7FHD+OY1OLDNF8dlWpZTRz6QuQleS2G 638kkChwc3Rzs/pZf6l50EvaX3NJVu+7K5xp3sMiFMyMkxTr5Mly/NJK89AslCfu1K znfx8NRrPhps1IqC/v7/gGQBl846EGdXKv80JTPwA9aimYjOwU73L8yu8/oayDXkSi ifGOuxgUR5a18GYlRFDgBHx85GJe9i++evpjaOzaT5mRkIm+qXE7WJlB3XGalYKG9p wBnrGzkf09grw== Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id BA1E618003E for ; Sun, 22 Dec 2024 07:45:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on php-smtp4.php.net X-Spam-Level: * X-Spam-Status: No, score=1.9 required=5.0 tests=BAYES_50,DMARC_NONE, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=4.0.0 X-Spam-Virus: No X-Envelope-From: Received: from mail-oa1-f44.google.com (mail-oa1-f44.google.com [209.85.160.44]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Sun, 22 Dec 2024 07:45:13 +0000 (UTC) Received: by mail-oa1-f44.google.com with SMTP id 586e51a60fabf-29f88004a92so1882470fac.1 for ; Sat, 21 Dec 2024 23:48:14 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734853693; x=1735458493; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=lRVXWkAPnJCXfKRtfK76OvU7RvX/jfOF1VsDMsVL24A=; b=WKrYsSRBM2gJwsLOIQIFvmCtVeZAoOOL1a1d/XCQpurr+bNUANEUVOndARvXx9hA+q nIw+prghrl+Z6TaOi7t8zqhT9Ktw+hwFbM/raIwcTXH443flBW2DO0M7/Q/fK32cGTMF twp5l/k2YWa0yGqzLUiA2EY8g/eAAyWESxRFf32/gwmlewklf1IrcCv8ZS5au7BUERCk 8F6bvDVFbCRiQlqyZz/Lc8sVoDoVhoe6LZIVFVymQGW0THBADBYFe16yR2CS/jOOnXcZ OFFOYMlMPmIj88eKBl7DHLlvhs32nBOwCvfVtdGMuC7LtNuQrKFZgN9fyL3JP0cDs7Hb MooQ== X-Gm-Message-State: AOJu0Yzh5wEhGYtPu5yvUhfsFq2nlrBllfW6EDJHSNP5dukclIcjw8Ss wQfShnaxrAVLlH7ZnqUpb12e+vv8YbHbKCDTpmN45CC5dm5ffhE5Mf1XZVmbRF32MsiysnNWSaC 3Mhd2ZvVzzYwC8vq+fqy9e/l+Wuu3oA== X-Gm-Gg: ASbGncubzORkxQHVmffcOYRIW5xMKw8zvuJXvGdhYCPA6AcvxnDdthvUhPOZIPWuS4I w/nxdxuWquFTctraQRcyFrU+Kt2OjeH48qEM9 X-Google-Smtp-Source: AGHT+IFjnciObId7+JT+PzAofeApvEff/9PHbyraPXbOb2P8sTvF63Zl2dADyYeKzShdjJUyxGzFK0xrvnW9o8TEJFI= X-Received: by 2002:a05:6870:9c83:b0:29e:5e83:150e with SMTP id 586e51a60fabf-2a7fb431d10mr5244362fac.27.1734853693426; Sat, 21 Dec 2024 23:48:13 -0800 (PST) Precedence: bulk list-help: list-post: List-Id: internals.lists.php.net x-ms-reactions: disallow MIME-Version: 1.0 References: <27531d9d-9bfe-4acc-b9ab-80b1017e3038@app.fastmail.com> In-Reply-To: Date: Sun, 22 Dec 2024 08:48:02 +0100 Message-ID: Subject: Re: [PHP-DEV] Discussion: Remove file statcache? To: Larry Garfield Cc: php internals Content-Type: multipart/alternative; boundary="0000000000002254480629d71a6d" From: bukka@php.net (Jakub Zelenka) --0000000000002254480629d71a6d Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sun, Dec 22, 2024 at 5:12=E2=80=AFAM Larry Garfield wrote: > On Sat, Dec 21, 2024, at 10:43 AM, Jakub Zelenka wrote: > > On Fri, Dec 20, 2024 at 8:29=E2=80=AFPM Larry Garfield > wrote: > >> Background: PHP has a not-often-considered feature, the stat-cache. > That is, the runtime caches the OS stat() call for files, so that > subsequent reads on the same file can be faster. However, it's even less > realized that it's a single-file cache. It literally only applies when y= ou > try to do two file-infomation operations on the same file in rapid > succession, without any other file reads in between. > > > I would prefer to disable it by default but keep some option (INI) to > > re-enable it. I think that for most users the perf impact will be > > negligible. However, it is quite likely that there are some user > > workflows and platforms where benefiting from the stat cache can be > > still significant in terms of performance. So those users should have > > the option to re-enable it if they see some significant regression > > rather then force them to update their code to make it faster or > > implement their own cache which would just make their migration to the > > next version much harder / potentially impossible. There is not such a > > huge maintenance that we would really need to get rid of it completely. > > I would really prefer having such option and tell to users to re-enable > > it rather than not be able to deal with potentially reported future > > perf regressions. > > > > I think the main issue with the cache is that is just not convenient > > for use cases where it doesn't get flushed during some different access > > methods that don't trigger flush. We could probably improve the stream > > situation a bit but it still leaves external (e.g. shell) access > > problem in place which we just cannot fix. On the other hand it is > > possible to use it in a way that users can profit from it but they > > really need to know how it works. That's way it should be an optional > > feature IMO. We should also improve documentation in that regards. > > > > In terms of voting, if there was no option to re-enable it, I would > > probably vote against this proposal as I'm a bit worried about those > > possible regression reports. > > I really don't like the idea of another ini toggle. That actually create= s > more work, as people writing code that works with the file system now hav= e > one more invisible context they have to think about. Which means they > probably won't, until it bites them. (They'll either never bother cleari= ng > the cache, so their code may malfunction on the rare system where it's > enabled, or always clear it, which 99.9% of the time will actually be > slower as we have to invoke the function for it to do nothing. Both are > bad.) > > Well it's much less likely to bite anyone than if it's always on. I think if we document it well and there is a good switch note, it should be clear enough for users and only users that understand what it does should enable it. I can see that if anyone enables it just on prod, then they will have hard time to recreate the issues on local setup but that's already the case with some other option. You just need to get the right settings from prod to be able to recreate things on local setup. I don't really have a better idea how to minimize impact on the users if they see significant regression from this change. Changing the functions signature is just not viable IMO. > I suppose a possible alternative would be to modify all file system > mutation functions (file_put_contents(), touch(), etc.) to flush the cach= e, > which for whatever reason doesn't happen now. That would be above my ski= ll > level, though, so someone else would need to do it. Also, I don't know i= f > there's a good reason those functions don't clear the cache currently or = if > it was just an oversight. > > As I said we could probably handle some stream cases more aggressively but it won't resolve the problem completely. We still have things like system("touch /file/path") which we cannot flush the stat cache for. And it's not just shell access - there might be some 3rd party extensions that operate on files or there might be other programs accessing files at the same time. So there are many places which we just cannot control. Regards Jakub --0000000000002254480629d71a6d Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Sun, Dec 22, 2024 at 5:12=E2=80=AFAM Larry Garfiel= d <larry@garfieldtech.com&= gt; wrote:
On Sat, Dec 21, 2024, at 10:43 AM, = Jakub Zelenka wrote:
> On Fri, Dec 20, 2024 at 8:29=E2=80=AFPM Larry Garfield <larry@garfieldtech.com= > wrote:
>> Background: PHP has a not-often-considered feature, the stat-cache= .=C2=A0 That is, the runtime caches the OS stat() call for files, so that s= ubsequent reads on the same file can be faster.=C2=A0 However, it's eve= n less realized that it's a single-file cache.=C2=A0 It literally only = applies when you try to do two file-infomation operations on the same file = in rapid succession, without any other file reads in between.

> I would prefer to disable it by default but keep some option (INI) to =
> re-enable it. I think that for most users the perf impact will be
> negligible. However, it is quite likely that there are some user
> workflows and platforms where benefiting from the stat cache can be > still significant in terms of performance. So those users should have =
> the option to re-enable it if they see some significant regression > rather then force them to update their code to make it faster or
> implement their own cache which would just make their migration to the=
> next version much harder / potentially impossible. There is not such a=
> huge maintenance that we would really need to get rid of it completely= .
> I would really prefer having such option and tell to users to re-enabl= e
> it rather than not be able to deal with potentially reported future > perf regressions.
>
> I think the main issue with the cache is that is just not convenient <= br> > for use cases where it doesn't get flushed during some different a= ccess
> methods that don't trigger flush. We could probably improve the st= ream
> situation a bit but it still leaves external (e.g. shell) access
> problem in place which we just cannot fix. On the other hand it is > possible to use it in a way that users can profit from it but they > really need to know how it works. That's way it should be an optio= nal
> feature IMO. We should also improve documentation in that regards.
>
> In terms of voting, if there was no option to re-enable it, I would > probably vote against this proposal as I'm a bit worried about tho= se
> possible regression reports.

I really don't like the idea of another ini toggle.=C2=A0 That actually= creates more work, as people writing code that works with the file system = now have one more invisible context they have to think about.=C2=A0 Which m= eans they probably won't, until it bites them.=C2=A0 (They'll eithe= r never bother clearing the cache, so their code may malfunction on the rar= e system where it's enabled, or always clear it, which 99.9% of the tim= e will actually be slower as we have to invoke the function for it to do no= thing.=C2=A0 Both are bad.)


Well it's much less likely to bite= anyone than if it's always on. I think if we document it well and ther= e is a good switch note, it should be clear enough for users and only users= that understand what it does should enable it.

I = can see that if anyone enables it just on prod, then they will have hard ti= me to recreate the issues on local setup but that's already the case wi= th some other option. You just need to get the right settings from prod to = be able to recreate things on local setup.=C2=A0

I= don't really have a better idea how to minimize impact on the users if= they see significant regression from this change. Changing the functions s= ignature is just not viable IMO.
=C2=A0
I suppose a possible alternative would be to modify all file system mutatio= n functions (file_put_contents(), touch(), etc.) to flush the cache, which = for whatever reason doesn't happen now.=C2=A0 That would be above my sk= ill level, though, so someone else would need to do it.=C2=A0 Also, I don&#= 39;t know if there's a good reason those functions don't clear the = cache currently or if it was just an oversight.

As I said we could probably handle some stream cases more aggr= essively but it won't resolve the problem completely. We still have thi= ngs like system("touch /file/path") which we cannot flush the sta= t cache for. And it's not just shell access - there might be some 3rd p= arty extensions that operate on files or there might be other programs acce= ssing files at the same time. So there are many places which we just cannot= control.

Regards=C2=A0 =C2=A0

Jakub
--0000000000002254480629d71a6d--