Newsgroups: php.internals
From: dennis.snell@automattic.com (Dennis Snell)
To: Alexandre Daubois
Cc: PHP internals list
Subject: Re: [PHP-DEV] [DISCUSSION] Validating regex pattern
Date: Wed, 1 Oct 2025 14:56:04 -0500

> On Oct 1, 2025, at 4:01 AM, Alexandre Daubois wrote:
>
> Hi everyone,
>
> I stumbled across the following issue, proposing to add a way to
> validate regex. [1]
>
> There is currently no way of knowing if a regex pattern is valid,
> apart from writing clunky code. [2]

Could you expand on how you define clunky here? I am curious, because I
validate PCRE patterns like this using the existing mechanisms.

    function is_valid_preg_pattern( $pattern ) {
        $is_valid = true;

        set_error_handler( function () use ( &$is_valid ) { $is_valid = false; return true; }, E_WARNING );
        preg_match( $pattern, '' );
        restore_error_handler();

        return $is_valid;
    }

And this is based on the note in the man page for `preg_match()`…

    If the regex pattern passed does not compile to a valid regex, an
    E_WARNING is emitted.

Given the possible error conditions, I thought this was comprehensive
and the only way for the `E_WARNING` to trigger when provided with the
empty string is if the pattern fails to compile. Given the concerns that
Juliette raised with regards to multiple regex engines, this one seems
like it should be universal too.

If there’s something clunky about this, it would aid my curiosity to
learn. The `$errstr` parameter can also be inspected to ensure that it
starts with `preg_match()` if there’s a chance any other warning could
conflate with an unrecognized pattern.

Caveat: I’m not aware of any performance implications for calling
`set_error_handler()` like this.

> Two propositions emerged from the issue: either create a dedicated
> "preg_validate()" function, or add a new flag to "filter_var()",
> namely FILTER_VALIDATE_REGEX_PATTERN.
>
> I would be in favor of the latter. The approach and implementation
> would surely be simpler. I don't feel like we should do advanced error
> management. Knowing if a pattern is valid or not would suffice for the
> vast majority of cases.
>
> I don't think the second approach would require an RFC. Christoph
> thinks that this should at least be announced on the mailing list, so
> here we are.
>
> Looking forward to your feedback.
>
> -- Alexandre Daubois
>
> [1] https://github.com/php/php-src/issues/9289
> [2] https://stackoverflow.com/questions/4440626/how-can-i-validate-regex

Warmly,
Dennis Snell
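
The `$errstr` check mentioned in the message could look roughly like the sketch below. It is an illustration under assumptions, not the code from the message: the name `is_valid_preg_pattern_strict` is hypothetical, `str_starts_with()` assumes PHP 8.0 or later, and the `try`/`finally` is an addition so the previous handler is restored even if `preg_match()` throws. Warnings whose text does not start with `preg_match()` are passed on to PHP's standard handler instead of being counted as an invalid pattern.

```php
<?php
declare(strict_types=1);

/**
 * Sketch: report whether $pattern compiles, counting only warnings
 * whose message starts with "preg_match()".
 * The function name is hypothetical; it does not appear in the thread.
 */
function is_valid_preg_pattern_strict(string $pattern): bool
{
    $is_valid = true;

    set_error_handler(
        static function (int $errno, string $errstr) use (&$is_valid): bool {
            if (str_starts_with($errstr, 'preg_match()')) {
                $is_valid = false;
                return true;  // handled: suppress the warning
            }
            return false;     // unrelated warning: defer to PHP's standard handler
        },
        E_WARNING
    );

    try {
        // Matching against the empty string keeps the call cheap;
        // only the compilation of the pattern matters here.
        preg_match($pattern, '');
    } finally {
        // Always put the previous handler back.
        restore_error_handler();
    }

    return $is_valid;
}

var_dump(is_valid_preg_pattern_strict('/^[a-z]+$/')); // bool(true)
var_dump(is_valid_preg_pattern_strict('/[a-z/'));     // bool(false): unterminated character class
```

Compared with the shorter version in the message, the only behavioral difference is that a warning from some other source no longer flips the result to invalid.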