Newsgroups: php.internals
From: mails@thomasbley.de (Thomas Bley)
To: Juliette Reinders Folmer, internals@lists.php.net
Date: Wed, 1 Oct 2025 20:35:47 +0200 (CEST)
Subject: Re: [PHP-DEV] [DISCUSSION] Validating regex pattern
Message-ID: <930895821.141913.1759343747763@email.ionos.de>
In-Reply-To: <68DD64BE.5080308@adviesenzo.nl>
 
Juliette Reinders Folmer <php-internals_nospam@adviesenzo.nl> wrote on 01.10.2025 19:28 CEST:

> On 1-10-2025 11:01, Alexandre Daubois wrote:
>
> > Two propositions emerged from the issue: either create a dedicated "preg_validate()" function, or add a new flag to "filter_var()", namely FILTER_VALIDATE_REGEX_PATTERN.
> >
> > I would be in favor of the latter. The approach and implementation would surely be simpler. I don't feel like we should do advanced error management. Knowing if a pattern is valid or not would suffice for the vast majority of cases.
> >
> > I don't think the second approach would require an RFC.
>
> I'd love to see more robust ways to validate regexes, but I do not like this proposal, as any solution involving the filter extension feels wrong.
>
> Some background:
> Historically, PHP supported three regex engines (POSIX, PCRE, Oniguruma). The POSIX engine was dropped in PHP 7.0 and there is a draft RFC to drop support for Oniguruma [1], however, that still means that at this time PHP supports two different regex engines, which each have their own criteria for when a regex is a valid pattern, and for Oniguruma supports a multitude of regex dialects [2].
>
> Involving an unrelated extension (filter), which may be unavailable (can be disabled [3]), in the validation just complicates things.
> It also makes the `FILTER_VALIDATE_REGEX_PATTERN` flag highly ambiguous as it is unclear against which engine/dialect the regex would be validated.
>
> It is my opinion that any regex pattern validation should be done in the same extension realm as the extension which will use the regex.
>
> Maybe the error code flags returned via `preg_last_error()` [4] should be made more specific to allow for detecting when a regex function failed due to an error in the regex.
> Maybe the extensions should get a "throw on invalid regex" option, either via an ini flag, a new function parameter or via an existing function like `mb_regex_set_options()`.
> Maybe there should be a `preg_validate()` function (and a `mb_ereg_validate()` function for that matter).
>
> I'm not sure what the best solution is, but going with an illogical solution just to try and avoid the RFC process is not the way to go IMO.
>
> Smile,
> Juliette
>
> 1: https://wiki.php.net/rfc/eol-oniguruma
> 2: https://www.php.net/manual/en/function.mb-regex-set-options.php
> 3: https://www.php.net/manual/en/filter.installation.php
> 4: https://www.php.net/manual/en/function.preg-last-error.php

Currently we have:
 
@preg_match('/a[/', '');
echo preg_last_error_msg(); // gives: Internal error
 
The real error would be "Compilation failed: missing terminating ] for character class at offset 2". So having a better error message would help.
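
Until something better exists, the compile error can be recovered in userland by catching the warning instead of suppressing it with "@". The following is only a rough sketch, assuming PHP 8.0+ (for preg_last_error_msg()); pattern_error() is a made-up helper name, not an existing PHP function:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland helper (not part of PHP): returns null when the
 * PCRE pattern compiles, or the warning text emitted by preg_match().
 */
function pattern_error(string $pattern): ?string
{
    $message = null;

    // Temporarily capture warnings instead of silencing them with "@".
    set_error_handler(
        static function (int $severity, string $text) use (&$message): bool {
            $message = $text;
            return true; // handled; skip PHP's default error handler
        },
        E_WARNING
    );

    try {
        // Matching against an empty subject forces the pattern to compile.
        $result = preg_match($pattern, '');
    } finally {
        restore_error_handler();
    }

    if ($result !== false) {
        return null;
    }

    // Fall back to the generic message if no warning was captured.
    return $message ?? preg_last_error_msg();
}

var_dump(pattern_error('/ab+c/i'));   // NULL
echo pattern_error('/a[/'), PHP_EOL;  // the "Compilation failed: ..." warning text
```

This works, but swapping the error handler for every check is clumsy, which is the point: the detailed message exists inside the PCRE extension and just is not exposed through preg_last_error_msg().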
 
JS has a RegExp class that can be combined with try-catch:
// the constructor throws a SyntaxError on an invalid pattern
const re1 = new RegExp("ab+c", "i");
const re2 = new RegExp(/ab+c/, "i");
 
Go has:
r, err := regexp.Compile("p([a-z]+)ch")
 
Rust has:
let re = Regex::new(r"unclosed(");
 
So having a RegExp class in PHP would make sense to me.
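
To make the idea a bit more concrete, here is a rough userland sketch of what such a class could look like. Everything in it is an assumption for illustration: PHP has no built-in RegExp class, and the names RegExp, InvalidPatternException and test() are invented here (test() is borrowed from the JS API). It only wraps preg_match() and turns the compile warning into an exception:

```php
<?php
declare(strict_types=1);

// Hypothetical sketch only: neither class exists in PHP today.
final class InvalidPatternException extends InvalidArgumentException {}

final class RegExp
{
    private string $pattern;

    public function __construct(string $pattern)
    {
        $error = null;

        // Capture the compile warning so it can become the exception message.
        set_error_handler(
            static function (int $severity, string $text) use (&$error): bool {
                $error = $text;
                return true;
            },
            E_WARNING
        );

        try {
            $ok = preg_match($pattern, '') !== false;
        } finally {
            restore_error_handler();
        }

        if (!$ok) {
            throw new InvalidPatternException($error ?? preg_last_error_msg());
        }

        $this->pattern = $pattern;
    }

    // Returns true when the subject matches the (already validated) pattern.
    public function test(string $subject): bool
    {
        return preg_match($this->pattern, $subject) === 1;
    }
}

try {
    $re = new RegExp('/ab+c/i');
    var_dump($re->test('xABBCx')); // bool(true)
    new RegExp('/unclosed(/');     // throws InvalidPatternException
} catch (InvalidPatternException $e) {
    echo 'Invalid pattern: ', $e->getMessage(), PHP_EOL;
}
```

A real implementation in the PCRE extension would not need the error-handler detour and could expose the error offset as well; the sketch only shows the try-catch usage that JS, Go and Rust already allow.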
 
Regards
Thomas