Newsgroups: php.internals
Xref: news.php.net php.internals:116958
Subject: Re: [PHP-DEV] Best way to monitor php-fpm container liveness on Kubernetes
From: adam.hamsik@lablabs.io (Adam Hamsik)
To: Alexandru Pătrănescu
Cc: PHP internals <internals@lists.php.net>
Date: Mon, 31 Jan 2022 15:32:42 +0100

Hi Alexandru,

Please see my answers below.

   Best Regards,
   Adam.

Adam Hamšík
Co-founder & CEO
Mobile: +421-904-937-495
www.lablabs.io

On 23 Jan 2022, 09:09 +0100, Alexandru Pătrănescu wrote:
>
> On Sat, Jan 22, 2022 at 10:00 PM Adam Hamsik wrote:
> > Hello,
> >
> > We are using PHP for our application backends. This works very well, as we have developed a simple way to clone the backends with minimal effort (they can be very similar). For our orchestration we use Kubernetes (>= 1.21). Our application pod generally contains nginx + php-fpm, plus fluent-bit for log shipping. We generally want to have a livenessProbe [1] (in simple terms, a check that is run against the pod to verify it is alive; if it fails, the affected container is restarted).
> >
> > This works very well (we are also using Swoole, which is roughly 70-80% better), but in certain unstable situations with higher application latency (a DB problem or a bug in our application) we often experience problems, because pods are falsely marked as dead (failed liveness probe, restart by the kubelet). This happens when all processes in our static pool are allocated to application requests. For our livenessProbe we tried to use both the fpm.ping and fpm.status endpoints, but both of them behave the same way, as they are served by the worker processes.
> >
> > I had a look at the php-src repo to see whether we could, e.g., use signals to verify that the application server is running, as a way around our issue. While looking at this I saw fpm-systemd.c, a systemd-specific check that reports fpm status to systemd every couple of seconds (configurable). Would you be willing to integrate a similar feature for Kubernetes? It would be based on a pull model, probably with a REST interface.
> >
> > My idea is the following:
> >
> > 1) During startup, if this is enabled, the php-fpm master opens a secondary port, pm.health_port (9001), and listens on pm.health_path (/healthz) [2].
> > 2) On a GET request, the fpm master process responds with HTTP code 200 and the string "ok" (we can later add checks/metrics to make sure fpm is in a good state). If we do not respond, or fpm is not ok, our livenessProbe fails and, based on configuration, this triggers a container restart.
> >
> > Would you be interested in integrating a feature like this? Or is there another way we can achieve similar results?
> >
> > [1] https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-liveness-probe
> > [2] https://kubernetes.io/docs/reference/using-api/health-checks/
> >
> >    Best Regards,
> >    Adam.
> >
> > Adam Hamšík
> > Co-founder & CEO
> > Mobile: +421-904-937-495
> > www.lablabs.io
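To make the idea above more concrete, this is roughly what I have in mind. Note that pm.health_port and pm.health_path are only the proposed directive names and do not exist in php-fpm today; the Kubernetes side is a standard httpGet liveness probe:

    ; php-fpm.conf (proposed, hypothetical global directives)
    [global]
    ; the master process itself would answer, independent of any worker pool
    pm.health_port = 9001
    pm.health_path = /healthz

    # pod spec excerpt: the kubelet would call the fpm master directly
    livenessProbe:
      httpGet:
        path: /healthz
        port: 9001
      periodSeconds: 10
      failureThreshold: 3

With something like this, the probe would keep succeeding even while every worker is busy with application requests, which is exactly the failure mode we are hitting.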
> Hi Adam,
>
> While I believe that improvements for health checking and other metrics can be added to php-fpm to expose internal status and statistics, I want to say that I don't know too much about that, and I want to first discuss the problem you mentioned and the approach.
>
> Based on my experience, it is best to have the health check always go through the application.
> You mentioned "certain unstable situations when we see higher application latency (db problem or a bug in our application)".
> Taking these two examples:
>
> - "db problems". I'm guessing you mean higher latency from the database.
> The health check should of course not connect to the database, so the actual execution of the health check should not be impacted.
> But you probably mean that requests pile up because php-fpm cannot handle them as fast as they arrive, due to the limited number of child processes.
> One solution here would be to configure a second listening pool for the health endpoint on php-fpm, with 1 or 2 child processes, and configure nginx to use it for that specific path.
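As I understand that suggestion, it would look something like the following (untested sketch; the pool name, socket path and probe path are made up):

    ; health pool: dedicated workers, so probes never queue behind app traffic
    [health]
    listen = /run/php-fpm-health.sock
    pm = static
    pm.max_children = 2
    ping.path = /healthz
    ping.response = ok

    # nginx: route only the probe path to the dedicated pool
    location = /healthz {
        include fastcgi_params;
        fastcgi_param SCRIPT_NAME /healthz;
        fastcgi_pass unix:/run/php-fpm-health.sock;
    }

The drawback I see is that the probe then only proves that some fpm workers can answer, not that the main pool has free children, but at least busy application workers can no longer fail the liveness probe.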
> - "a bug in our application". I'm guessing you mean a bug that causes high CPU usage.
> If the issue is visible immediately once the pod starts, it's good to have the health check fail, so the deployment rollout fails and we avoid bringing bugs into production.
> If the issue becomes visible later, some time after the pod starts, I'm thinking this could happen due to a memory leak. A pod restart due to a failed health check would also make sure production stays healthy.

Neither of these problems is usually big enough by itself to cause an outage. They just make the application behave slightly worse; however, this can sometimes lead to failed liveness probes -> pod restarts.

> Having the health check pass through the application makes sure it's actually working.

Sure, but in our case we go to either fpm.ping or fpm.status, as initializing the whole Symfony application is quite expensive. I'm not sure whether that counts as going through the application.

> Based on my experience, it's good to include in the health check all the application bootstrapping that is local, and to avoid any I/O such as database, memcache and others.
> A missed new production configuration dependency that would make the application not start up properly would then block the deployment rollout and keep uptime high.
> A health check that does not use the actual application would report it healthy while it cannot actually handle requests.

I agree with this. We initially tried to do a lot in our health checks and gradually reduced their footprint/scope to the required minimum, because they were too fragile.

> If I understand things differently, or you encountered other cases where you think a health check that does not go through the app helps, please share so we can learn about it.
>
> Regards,
> Alex