Subject: Re: [PHP-DEV] [DISCUSSION] Validating regex pattern
From: php-internals_nospam@adviesenzo.nl (Juliette Reinders Folmer)
To: internals@lists.php.net
Date: Wed, 1 Oct 2025 19:28:30 +0200

On 1-10-2025 11:01, Alexandre Daubois wrote:
> Two propositions emerged from the issue: either create a dedicated "preg_validate()" function, or add a new flag to "filter_var()", namely FILTER_VALIDATE_REGEX_PATTERN.
>
> I would be in favor of the latter. The approach and implementation would surely be simpler. I don't feel like we should do advanced error management. Knowing if a pattern is valid or not would suffice for the vast majority of cases.
>
> I don't think the second approach would require an RFC.


I'd love to see more robust ways to validate regexes, but I do not like this proposal, as any solution involving the filter extension feels wrong.

Some background:
Historically, PHP supported three regex engines (POSIX, PCRE, Oniguruma). The POSIX engine was dropped in PHP 7.0 and there is a draft RFC to drop support for Oniguruma [1]. However, that still means that at this time PHP supports two different regex engines, each with its own criteria for when a regex is a valid pattern, and Oniguruma additionally supports a multitude of regex dialects [2].

Involving an unrelated extension (filter), which may be unavailable (can be disabled [3]), in the validation just complicates things.
It also makes the `FILTER_VALIDATE_REGEX_PATTERN` flag highly ambiguous as it is unclear against which engine/dialect the regex would be validated.
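
To illustrate the ambiguity: the same string can be rejected by one engine and accepted by the other. Below is a minimal sketch, assuming ext/mbstring is available with its regex support enabled. The helper names `pcre_accepts()` and `mbregex_matches()` are made up for this example and are not part of PHP.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: does ext/pcre accept the pattern?
function pcre_accepts(string $pattern): bool
{
    // Silence the E_WARNING that ext/pcre raises for a rejected pattern.
    set_error_handler(static fn (): bool => true);
    try {
        // preg_match() returns false on failure, 0 or 1 otherwise.
        return preg_match($pattern, '') !== false;
    } finally {
        restore_error_handler();
    }
}

// Hypothetical helper: does mbregex (Oniguruma) match the subject with this pattern?
// Returns null when mb_ereg() is not available in this PHP build.
function mbregex_matches(string $pattern, string $subject): ?bool
{
    if (!function_exists('mb_ereg')) {
        return null;
    }
    set_error_handler(static fn (): bool => true);
    try {
        return mb_ereg($pattern, $subject) !== false;
    } finally {
        restore_error_handler();
    }
}

$pattern = 'a+'; // no delimiters

var_dump(pcre_accepts($pattern));           // bool(false): PCRE functions require delimiters
var_dump(mbregex_matches($pattern, 'aaa')); // bool(true) with mbregex available, NULL otherwise
```

A single "is this a valid regex pattern" flag would have to pick one of these answers.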

It is my opinion that any regex pattern validation should be done within the same extension that will use the regex.

Maybe the error code flags returned via `preg_last_error()` [4] should be made more specific to allow for detecting when a regex function failed due to an error in the regex.
Maybe the extensions should get a "throw on invalid regex" option, either via an ini setting, a new function parameter, or an existing function like `mb_regex_set_options()`.
Maybe there should be a `preg_validate()` function (and a `mb_ereg_validate()` function for that matter).
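
For context, a rough userland approximation of what is possible today on the PCRE side, as far as I know. This is only a sketch: `pcre_pattern_error()` is a hypothetical name, not an existing or proposed API. It attempts a match against an empty subject and captures the warning text, because `preg_last_error()` has no dedicated constant for "the pattern itself is invalid".

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland stand-in for the preg_validate() idea.
 * Returns null when the pattern compiled, or a description of the failure otherwise.
 */
function pcre_pattern_error(string $pattern): ?string
{
    $message = null;
    set_error_handler(static function (int $errno, string $errstr) use (&$message): bool {
        $message = $errstr; // keep the warning text, it carries the actual reason
        return true;        // handled, do not fall through to PHP's own handler
    });
    try {
        $result = preg_match($pattern, '');
    } finally {
        restore_error_handler();
    }

    if ($result !== false) {
        return null;
    }

    // Fall back to the numeric code if no warning was captured.
    return $message ?? ('preg error code ' . preg_last_error());
}

var_dump(pcre_pattern_error('/^[a-z]+$/')); // NULL
echo pcre_pattern_error('/[a-z/'), "\n";    // prints the "Compilation failed: ..." warning text

// By contrast, some runtime failures do have a dedicated error code:
var_dump(preg_match('/./u', "\xff") === false
    && preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

Matching against an empty subject is a workaround, not real validation, and having to scrape the reason out of a warning is exactly the kind of gap a more specific error code or a dedicated function could close.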

I'm not sure what the best solution is, but going with an illogical solution just to try and avoid the RFC process is not the way to go IMO.

Smile,
Juliette


1: https://wiki.php.net/rfc/eol-oniguruma
2: https://www.php.net/manual/en/function.mb-regex-set-options.php
3: https://www.php.net/manual/en/filter.installation.php
4: https://www.php.net/manual/en/function.preg-last-error.php