Newsgroups: php.internals
Date: Sat, 21 Dec 2024 17:43:39 +0100
From: bukka@php.net (Jakub Zelenka)
To: Larry Garfield
Cc: php internals
Subject: Re: [PHP-DEV] Discussion: Remove file statcache?

On Fri, Dec 20, 2024 at 8:29 PM Larry Garfield <larry@garfieldtech.com> wrote:
> Background: PHP has a not-often-considered feature, the stat-cache.
> That is, the runtime caches the OS stat() call for files, so that
> subsequent reads on the same file can be faster. However, it's even less
> realized that it's a single-file cache. It literally only applies when you
> try to do two file-information operations on the same file in rapid
> succession, without any other file reads in between.
>
> For more info:
> https://tideways.com/profiler/blog/the-php-stat-cache-explained
>
> Because it's so rarely relevant, in the cases where it is relevant it can
> be quite a surprise, and a surprise that causes weird, hard-to-explain
> caching bugs in applications.
>
> The cache also dates from 20 years ago, when Rasmus added it (and the
> realpath cache) in Yahoo's forked PHP 4, and then it got integrated into
> PHP 5. However, hard drives are vastly faster than they were then, and
> operating systems are vastly more efficient than they were then.
>
> There's been some discussion about making the cache disable-able, though
> the consensus now seems to be leaning toward getting rid of it outright:
>
> https://github.com/php/php-src/pull/17178
>
> Arnaud ran some quick benchmarks and found that disabling it has a less
> than 1% impact on Symfony and WordPress.
>
> https://github.com/php/php-src/pull/17178#issuecomment-2554323572
>
> Before we go any further, is there appetite among the voting population to
> remove it? clearstatcache() and similar functions would get stubbed out as
> no-ops, but otherwise we'd just hand the responsibility back to the OS
> where it belongs, which so far seems like it would be an almost
> unmeasurable performance difference but would remove some surprising
> complexity.
>
> Would you support such a removal?
> What additional data would you need to make the case for such removal?
>

I would prefer to disable it by default but keep an option (an INI
setting) to re-enable it. I think that for most users the performance
impact will be negligible. However, it is quite likely that there are some
user workflows and platforms where the stat cache can still bring a
significant performance benefit. Those users should have the option to
re-enable it if they see a significant regression, rather than being forced
to update their code to make it faster or to implement their own cache,
which would make their migration to the next version much harder or
potentially impossible. The maintenance burden is not so large that we
really need to get rid of it completely. I would much rather have such an
option and tell users to re-enable it than be unable to deal with
performance regressions that may be reported in the future.

I think the main issue with the cache is that it is just not convenient for
use cases where the file is accessed through other methods that don't
trigger a flush, so the cached result goes stale. We could probably improve
the situation for streams a bit, but that still leaves the problem of
external (e.g. shell) access, which we simply cannot fix. On the other hand,
it is possible to use the cache in a way that lets users profit from it, but
they really need to know how it works. That's why it should be an optional
feature, IMO. We should also improve the documentation in that regard.

In terms of voting, if there were no option to re-enable it, I would
probably vote against this proposal, as I'm a bit worried about those
possible regression reports.

Regards

Jakub
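The flushing problem described in the reply above can be illustrated with a minimal PHP sketch. It is not from the thread: it assumes a POSIX shell with `printf` is available to `exec()` and that the temp directory is writable, and the file prefix and shell command are purely illustrative. The file is grown by an external process, which PHP has no way to notice, so a second `filesize()` on the same path can be answered from the stat cache until `clearstatcache()` is called.

```php
<?php
// Hypothetical demo (not from the thread): shows how PHP's stat cache can
// hide a change made by an external process. Assumes a POSIX shell with
// printf is available to exec() and the temp directory is writable.

$path = tempnam(sys_get_temp_dir(), 'statcache_demo_');
file_put_contents($path, 'abc');   // 3 bytes written by PHP itself
clearstatcache();                  // start from an empty stat cache

$first = filesize($path);          // real stat(); result is cached for $path

// Append 3 bytes from outside PHP; the engine cannot know the file changed.
exec('printf %s def >> ' . escapeshellarg($path));

$stale = filesize($path);          // same path again: may be served from the cache

clearstatcache();                  // drop the cached stat() result
$fresh = filesize($path);          // real stat() again, sees the appended bytes

printf("first=%d stale=%d fresh=%d\n", $first, $stale, $fresh);
unlink($path);
```

With the stat cache active, as it was when this thread was written, `$stale` is expected to still report the old size. If the cache were disabled by default or removed as proposed, every `filesize()` call would go to the OS and `$stale` would already reflect the appended bytes.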