Newsgroups: php.internals
Date: Sun, 23 Jan 2022 10:09:22 +0200
To: Adam Hamsik
Cc: PHP internals
Subject: Re: [PHP-DEV] Best way to monitor php-fpm container liveness on Kubernetes
From: drealecs@gmail.com (Alexandru Pătrănescu)

On Sat, Jan 22, 2022 at 10:00 PM Adam Hamsik wrote:

> Hello,
>
> We are using PHP for our application backends. This works very well, as
> we have developed a simple way to clone them with minimal effort (they
> can be very similar). For orchestration we are using Kubernetes
> (>= 1.21). Our application pod generally contains NGINX + php-fpm, plus
> fluentbit for log shipping. We generally want to have a LivenessProbe
> (put simply, this is a check that is run against our pod to verify it is
> alive; if it fails, the particular container is restarted) [1].
>
> This works very well (we are also using Swoole, which is roughly 70-80%
> better), but in certain unstable situations we see higher application
> latency (a db problem or a bug in our application). We then often
> experience problems because pods are falsely marked as dead (failed
> liveness probe, restarted by kubelet). This happens when all processes
> in our static pool are allocated to application requests. For our
> livenessProbe we tried to use both the fpm ping and fpm status
> endpoints, but both of them behave the same way, as they are served by
> the worker processes.
>
> I had a look at the php-src repo to see if we can, e.g., use signals to
> verify the application server is running, as a way to work around our
> issue. While looking at this I saw fpm-systemd.c, which is a
> systemd-specific check. It reports fpm status every couple of seconds
> (configurable, to systemd). Would you be willing to integrate a similar
> feature for Kubernetes? This would probably be based on a pull model
> with a REST interface.
>
> My idea is the following:
>
> 1) During startup, if this is enabled, the php-fpm master will open a
> secondary port, pm.health_port (9001), and listen for a pm.health_path
> (/healthz) [2].
> 2) If we receive a GET request, the fpm master process will respond with
> HTTP code 200 and the string "ok" (we can later add checks/metrics to
> make sure fpm is in a good state). If we do not respond, or fpm is not
> ok, our LivenessProbe will fail; based on configuration this will
> trigger a container restart.
>
> Would you be interested in integrating a feature like this? Or is there
> any other way we can achieve similar results?
>
> [1] https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-liveness-probe
> [2] https://kubernetes.io/docs/reference/using-api/health-checks/
>
> Best Regards,
>
> Adam.
>
> Adam Hamšík
> Co-founder & CEO
> Mobile: +421-904-937-495
> www.lablabs.io

Hi Adam,

While I believe improvements for health checking and other metrics could
be added to php-fpm to expose internal status and statistics, I don't
know too much about that area, so I first want to discuss the problem you
mentioned and the approach.

Based on my experience, it is best to have the health check always go
through the application. You mentioned "certain unstable situations when
we see higher application latency (db problem or a bug in our
application)". Taking these two examples:

- "db problems". I'm guessing you mean higher latency from the database.
The health check should of course not connect to the database, so the
actual execution of the health check should not be impacted. But you
probably mean that requests are piling up because php-fpm cannot handle
them as fast as they arrive, due to the limited number of child
processes. One solution here is to configure a second listening pool on
php-fpm with 1 or 2 child processes for the health endpoint, and
configure nginx to use it for that specific path.

- "a bug in our application". I'm guessing you mean a bug that causes
high CPU usage. If the issue is visible immediately once the pod starts,
it's good to have the health check fail so the deployment rollout fails
and avoids bringing bugs into production. If the issue is visible later,
some time after the pod starts, it could be due to a memory leak; a pod
restart triggered by a failed health check would also keep production
healthy.

Having the health check pass through the application makes sure the
application is actually working. Based on my experience, it's good to
include in the health check all the application bootstrapping steps that
are local, and to avoid any I/O such as database, memcache and others.
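The second-pool idea can be sketched roughly as follows (the pool name, socket path, URI and script path here are illustrative assumptions, not names prescribed by php-fpm or nginx):

```ini
; health.conf -- a tiny dedicated php-fpm pool reserved for probes,
; so probe requests never queue behind busy application workers
[health]
listen = /run/php-fpm-health.sock
pm = static
pm.max_children = 2
```

```nginx
# nginx: route only the probe path to the dedicated pool
location = /healthz {
    access_log off;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME /var/www/health.php;
    fastcgi_pass unix:/run/php-fpm-health.sock;
}
```

With something like this, even when the main pool's workers are all allocated to slow application requests, the probe still exercises nginx, FastCGI and PHP itself rather than bypassing them.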
A missing production configuration dependency that prevents the
application from starting up properly would then block the deployment
rollout, keeping uptime high. A health check that does not use the actual
application would report it as healthy even though it cannot handle
requests.

If I'm misunderstanding something, or you have run into other cases where
you think a health check that does not go through the app helps, please
share so we can learn from it.

Regards,
Alex
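For reference, the liveness probe wiring discussed in this thread would look roughly like this in a pod spec (a sketch; /healthz on the nginx port is an assumption based on the setup above, since the proposed pm.health_port feature does not exist in php-fpm today):

```yaml
# container spec fragment: probe the app-facing health path via nginx
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```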