Hi everyone,
I stumbled across the following issue, proposing to add a way to validate
regex. [1]
There is currently no way of knowing if a regex pattern is valid, apart
from writing clunky code. [2]
Two propositions emerged from the issue: either create a dedicated
"preg_validate()" function, or add a new flag to "filter_var()", namely
FILTER_VALIDATE_REGEX_PATTERN.
I would be in favor of the latter. The approach and implementation would
surely be simpler. I don't feel like we should do advanced error
management. Knowing if a pattern is valid or not would suffice for the vast
majority of cases.
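For comparison, a check close to the proposed flag can already be routed through filter_var() with the existing FILTER_CALLBACK filter. This is a minimal sketch under stated assumptions, not the implementation proposed in the issue. regex_compiles() is a hypothetical helper. It assumes that when preg_match() returns false for an empty subject, the pattern failed to compile. The @ operator silences the resulting E_WARNING.

```php
<?php
// Hypothetical helper (not a PHP built-in): assumes that a false return
// from preg_match() on an empty subject means the pattern did not compile.
function regex_compiles(string $pattern): bool
{
    // @ suppresses the E_WARNING an invalid pattern would emit.
    return @preg_match($pattern, '') !== false;
}

// FILTER_CALLBACK passes the value to the callback and returns its result.
$isValid = filter_var('/a[b]/', FILTER_CALLBACK, ['options' => 'regex_compiles']);
var_dump($isValid);
```

Like any filter_var()-based approach, this only works when the filter extension is enabled.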
I don't think the second approach would require an RFC. Christoph thinks
that this should at least be announced on the mailing list, so here we are.
Looking forward to your feedback.
— Alexandre Daubois
[1] https://github.com/php/php-src/issues/9289
[2] https://stackoverflow.com/questions/4440626/how-can-i-validate-regex
On 01.10.2025 at 11:01, Alexandre Daubois <alex.daubois+php@gmail.com> wrote:
There is currently no way of knowing if a regex pattern is valid, apart from writing clunky code. [2]
Two propositions emerged from the issue: either create a dedicated "preg_validate()" function, or add a new flag to "filter_var()", namely FILTER_VALIDATE_REGEX_PATTERN.
My concern would be that dynamically creating regex patterns has quite a lot of possible different foot guns and using something like preg_validate/filter_var to prevent warnings seems to not really solve the problem but give a false sense of security.
You can end up with a mostly working version which will only trigger the fail path later on depending on user input.
It boils down to: If you are not confident that you construct the pattern in a safe way then what would you do if a validation function returns false? You can notify the developer but that is already accomplished with the preg_* warning when an invalid pattern is given. Creating an error page for the user on a warning is also already possible. That's why I'm on the fence whether a validation function does more good or harm.
Regards,
- Chris
Hi,
My concern would be that dynamically creating regex patterns has quite a lot of possible different foot guns and using something like preg_validate/filter_var to prevent warnings seems to not really solve the problem but give a false sense of security.
The purpose I see is not to give a sense of security, but to give
quick feedback whether the pattern is valid or not.
You can end up with a mostly working version which will only trigger the fail path later on depending on user input.
I'm not sure I understand the connection here: validating the regex
pattern itself and matching the pattern against a subject are two
different things.
It boils down to: If you are not confident that you construct the pattern in a safe way then what would you do if a validation function returns false? You can notify the developer but that is already accomplished with the preg_* warning when an invalid pattern is given. Creating an error page for the user on a warning is also already possible. That's why I'm on the fence whether a validation function does more good or harm.
I don't understand how it could be harmful. Early validation is useful
when it comes to avoiding unnecessary operations if we can already be
sure that it will fail later for obvious reasons. For me, it falls
into the same category as email or URL validation in filter_var.
That's also why I think it would be more appropriate as a flag for
this function rather than a dedicated function.
— Alexandre Daubois
On Wed, Oct 1, 2025, 15:22 Alexandre Daubois <alex.daubois+php@gmail.com> wrote:
It boils down to: If you are not confident that you construct the
pattern in a safe way then what would you do if a validation function
returns false? You can notify the developer but that is already
accomplished with the preg_* warning when an invalid pattern is given.
Creating an error page for the user on a warning is also already possible.
That's why I'm on the fence whether a validation function does more good or
harm.
I don't understand how it could be harmful. Early validation is useful
when it comes to avoiding unnecessary operations if we can already be
sure that it will fail later for obvious reasons. For me, it falls
into the same category as email or URL validation in filter_var.
That's also why I think it would be more appropriate as a flag for
this function rather than a dedicated function.
Emails and URLs are commonly expected end user inputs. Regular expressions
are not, and accepting them from end users is almost always a bad idea.
A bad idea which would be encouraged by making it easy to implement.
I am generally in favor of adding niche functionality, but this one does
worry me.
Cheers,
Andrey.
Two propositions emerged from the issue: either create a dedicated
"preg_validate()" function, or add a new flag to "filter_var()",
namely FILTER_VALIDATE_REGEX_PATTERN.
I would be in favor of the latter. The approach and implementation
would surely be simpler. I don't feel like we should do advanced error
management. Knowing if a pattern is valid or not would suffice for the
vast majority of cases.
I don't think the second approach would require an RFC.
I'd love to see more robust ways to validate regexes, but I do not like
this proposal, as any solution involving the filter extension feels wrong.
Some background:
Historically, PHP supported three regex engines (POSIX, PCRE,
Oniguruma). The POSIX engine was dropped in PHP 7.0 and there is a draft
RFC to drop support for Oniguruma [1]. However, that still means that at
this time PHP supports two different regex engines, each with its own
criteria for when a regex is a valid pattern, and Oniguruma additionally
supports a multitude of regex dialects [2].
Involving an unrelated extension (filter), which may be unavailable (can
be disabled [3]), in the validation just complicates things.
It also makes the FILTER_VALIDATE_REGEX_PATTERN flag highly ambiguous,
as it is unclear against which engine/dialect the regex would be validated.
It is my opinion that any regex pattern validation should be done in the
same extension realm as the extension which will use the regex.
Maybe the error code flags returned via preg_last_error() [4] should
be made more specific to allow for detecting when a regex function
failed due to an error in the regex.
Maybe the extensions should get a "throw on invalid regex" option,
either via an ini flag, a new function parameter or via an existing
function like mb_regex_set_options().
Maybe there should be a preg_validate() function (and an
mb_ereg_validate() function for that matter).
I'm not sure what the best solution is, but going with an illogical
solution just to try and avoid the RFC process is not the way to go IMO.
Smile,
Juliette
1: https://wiki.php.net/rfc/eol-oniguruma
2: https://www.php.net/manual/en/function.mb-regex-set-options.php
3: https://www.php.net/manual/en/filter.installation.php
4: https://www.php.net/manual/en/function.preg-last-error.php
On 01.10.2025 at 19:28 CEST, Juliette Reinders Folmer <php-internals_nospam@adviesenzo.nl> wrote:
Maybe the error code flags returned via preg_last_error() [4] should be made more specific to allow for detecting when a regex function failed due to an error in the regex.
Currently we have:
@preg_match('/a[/', '');
echo preg_last_error_msg(); // gives: Internal error
The real error would be "Compilation failed: missing terminating ] for character class at offset 2". So having a better error message would help.
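Until preg_last_error_msg() reports something more specific, the compiler's message can already be recovered from the suppressed warning, because error_get_last() still records errors silenced with @. This is a minimal sketch. preg_compile_error() is a hypothetical helper name, not a PHP function.

```php
<?php
// Hypothetical helper, not a PHP built-in: returns the compiler's warning
// text for an invalid pattern, or null if the pattern compiles.
function preg_compile_error(string $pattern): ?string
{
    error_clear_last();                         // drop any earlier error
    if (@preg_match($pattern, '') !== false) {  // @ silences the E_WARNING
        return null;
    }
    // The suppressed warning is still recorded (message prefixed with
    // "preg_match(): "); fall back to the generic code message otherwise.
    return error_get_last()['message'] ?? preg_last_error_msg();
}

echo preg_compile_error('/a[/'), PHP_EOL;
```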
JS has a RegExp class that can be combined with try-catch:
const re = new RegExp("ab+c", "i"); // or: new RegExp(/ab+c/, "i")
Go has:
r, err := regexp.Compile("p([a-z]+)ch")
Rust has:
let re = Regex::new(r"unclosed(");
So having a RegExp class in PHP would make sense to me.
Regards
Thomas
Hi,
Historically, PHP supported three regex engines (POSIX, PCRE, Oniguruma). The POSIX engine was dropped in PHP 7.0 and there is a draft RFC to drop support for Oniguruma [1]. However, that still means that at this time PHP supports two different regex engines, each with its own criteria for when a regex is a valid pattern, and Oniguruma additionally supports a multitude of regex dialects [2].
I wasn't aware of that. Thanks for the info!
It also makes the FILTER_VALIDATE_REGEX_PATTERN flag highly ambiguous as it is unclear against which engine/dialect the regex would be validated.
Indeed, the multi-engine "issue" makes it clear that it would be ambiguous.
Maybe the error code flags returned via preg_last_error() [4] should be made more specific to allow for detecting when a regex function failed due to an error in the regex.
Maybe the extensions should get a "throw on invalid regex" option, either via an ini flag, a new function parameter or via an existing function like mb_regex_set_options().
I like the first solution, and I like the second one even more. It would
be aligned with JSON_THROW_ON_ERROR.
I'm not sure what the best solution is, but going with an illogical solution just to try and avoid the RFC process is not the way to go IMO.
The purpose isn't to avoid the RFC process; I'll write one if
necessary. I just want to gather opinions and determine whether one is
necessary :)
— Alexandre Daubois
It also makes the FILTER_VALIDATE_REGEX_PATTERN flag highly ambiguous as it is unclear against which engine/dialect the regex would be validated.
Indeed, the multi-engine "issue" makes it clear that it would be ambiguous.
Maybe the error code flags returned via preg_last_error() [4] should be made more specific to allow for detecting when a regex function failed due to an error in the regex.
Maybe the extensions should get a "throw on invalid regex" option, either via an ini flag, a new function parameter or via an existing function like mb_regex_set_options().
I like the first solution, and I like the second one even more. It would
be aligned with JSON_THROW_ON_ERROR.
I completely agree with this and the previous reply from Juliette.
I'm familiar with the upstream PCRE2 library, and they have made
several changes to the regex rules/syntax within the past couple of
years. Unless the validation is done within the same realm as the
engine, i.e. by the engine itself, it cannot be accurate to the rules
the engine actually supports.
A PREG_THROW_ON_ERROR flag would be the best of both worlds:
- Provides a nice safeguard against potentially invalid regexps and surfaces them as an exception.
- Provides validation functionality with engine-provided error messages.
- We do not need to keep up with the regex engines because validation is done by the engine.
- Works similarly to the JSON_THROW_ON_ERROR flag.
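To make the comparison concrete, the first part below uses the real JSON_THROW_ON_ERROR flag (PHP 7.3+). PREG_THROW_ON_ERROR does not exist, so the second part is only a hypothetical userland approximation. preg_match_or_throw() is an illustrative name. Converting the compile warning through a temporary error handler is an assumption about how such a flag might behave, not a description of any planned implementation.

```php
<?php
// Real precedent: with JSON_THROW_ON_ERROR, a decode failure becomes a
// JsonException instead of null plus json_last_error().
try {
    json_decode('{"a":', false, 512, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    echo 'json_decode failed: ', $e->getMessage(), PHP_EOL;
}

// Hypothetical: PREG_THROW_ON_ERROR does not exist; this helper only
// approximates what such a flag could do for preg_match().
function preg_match_or_throw(string $pattern, string $subject, ?array &$matches = null): int
{
    // Compile problems surface as E_WARNING; rethrow them with the
    // engine's own message.
    set_error_handler(static function (int $errno, string $errstr) {
        throw new InvalidArgumentException($errstr);
    }, E_WARNING);
    try {
        $result = preg_match($pattern, $subject, $matches);
    } finally {
        restore_error_handler();
    }
    // Runtime failures (e.g. backtrack limit) emit no warning; they are
    // only visible through preg_last_error().
    if ($result === false) {
        throw new RuntimeException(preg_last_error_msg());
    }
    return $result;
}
```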
Hi
On 2025-10-02 13:03, Ayesh Karunaratne wrote:
I completely agree with this and the previous reply from Juliette.
I also fully agree with Juliette: This is something that needs to be
done in ext/pcre.
A PREG_THROW_ON_ERROR flag would be the best of both worlds:
- Provides a nice safeguard against potentially invalid regexps and surfaces them as an exception.
- Provides validation functionality with engine-provided error messages.
- We do not need to keep up with the regex engines because validation is done by the engine.
- Works similarly to the JSON_THROW_ON_ERROR flag.
However, I don't think that PREG_THROW_ON_ERROR is the best solution
here. A flag that effectively always needs to be passed is not good
API design, and functions like preg_replace() don't currently support
flags. While it would definitely increase the scope of an RFC, I second
Thomas' notion of providing a new (object-oriented) PCRE API that would
also allow passing modifiers independently of the expression itself,
avoiding the need for delimiters (and for escaping them when dynamically
constructing a regex).
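A rough userland sketch of what such an API could look like, loosely following the JS RegExp example earlier in the thread. Everything here is hypothetical: the Regex class, its constructor signature and matches() are illustrative names. Picking the first delimiter character that does not occur in the expression is just one simple way to spare callers from choosing and escaping delimiters.

```php
<?php
// Hypothetical sketch: Regex is an illustrative name; PHP has no such
// built-in class. Requires PHP 8.0+ (str_contains).
final class Regex
{
    // The first candidate that does not occur in the expression is used
    // as the delimiter, so callers never add or escape delimiters.
    private const DELIMITERS = ['/', '#', '~', '%', '!', '@', ';', '`'];

    private string $pattern;

    public function __construct(string $expression, string $modifiers = '')
    {
        $delimiter = null;
        foreach (self::DELIMITERS as $candidate) {
            if (!str_contains($expression, $candidate)) {
                $delimiter = $candidate;
                break;
            }
        }
        if ($delimiter === null) {
            throw new InvalidArgumentException('No unused delimiter available');
        }
        $this->pattern = $delimiter . $expression . $delimiter . $modifiers;

        // Compile once up front: the E_WARNING for an invalid expression or
        // unknown modifier becomes an exception with PCRE's own message.
        set_error_handler(static function (int $errno, string $errstr) {
            throw new InvalidArgumentException($errstr);
        }, E_WARNING);
        try {
            preg_match($this->pattern, '');
        } finally {
            restore_error_handler();
        }
    }

    public function matches(string $subject): bool
    {
        return preg_match($this->pattern, $subject) === 1;
    }
}
```

For example, new Regex('ab+c', 'i') mirrors the JS new RegExp("ab+c", "i"). By contrast, new Regex('a[') throws an InvalidArgumentException carrying PCRE's compilation message instead of emitting a warning.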
Best regards
Tim Düsterhus
Hi everyone,
I stumbled across the following issue, proposing to add a way to validate regex. [1]
There is currently no way of knowing if a regex pattern is valid, apart from writing clunky code. [2]
Could you expand on how you define clunky here? I am curious, because I validate PCRE patterns like this using the existing mechanisms.
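function is_valid_preg_pattern( $pattern ) {
    $is_valid = true;
    // Any E_WARNING raised while the pattern is compiled marks it invalid;
    // returning true keeps PHP from also reporting the warning.
    set_error_handler( function () use ( &$is_valid ) { $is_valid = false; return true; }, E_WARNING );
    // With an empty subject, a compile failure should be the only source of a warning.
    preg_match( $pattern, '' );
    restore_error_handler();
    return $is_valid;
}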
And this is based on the note in the man page for preg_match()
…
If the regex pattern passed does not compile to a valid regex, an E_WARNING is emitted.
Given the possible error conditions, I thought this was comprehensive: with the empty string as the subject, the only way for the E_WARNING to trigger is if the pattern fails to compile. Given the concerns that Juliette raised with regard to multiple regex engines, this approach seems like it should be universal too.
If there’s something clunky about this, I’m curious to learn what it is. The $errstr parameter can also be inspected to ensure that it starts with preg_match(), if there’s a chance that some other warning could be mistaken for an unrecognized pattern.
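A sketch of that refinement follows. It assumes a hypothetical function name, is_valid_preg_pattern_strict(), and PHP 8.0+ for str_starts_with(). Only warnings whose message starts with preg_match() mark the pattern as invalid. Any other warning is left to PHP's standard error handler.

```php
<?php
// Hypothetical variant of is_valid_preg_pattern() (assumed name).
function is_valid_preg_pattern_strict(string $pattern): bool
{
    $is_valid = true;
    set_error_handler(
        static function (int $errno, string $errstr) use (&$is_valid): bool {
            if (str_starts_with($errstr, 'preg_match()')) {
                $is_valid = false;
                return true;   // handled: suppress this warning
            }
            return false;      // not ours: PHP's standard handler takes over
        },
        E_WARNING
    );
    try {
        preg_match($pattern, '');
    } finally {
        restore_error_handler();  // always restore, even if something throws
    }
    return $is_valid;
}
```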
Caveat: I’m not aware of any performance implications of calling set_error_handler() like this.
Warmly,
Dennis Snell
Hi,
function is_valid_preg_pattern( $pattern ) {
    $is_valid = true;
    set_error_handler( function () use ( &$is_valid ) { $is_valid = false; return true; }, E_WARNING );
    preg_match( $pattern, '' );
    restore_error_handler();
    return $is_valid;
}
Setting and restoring the global error handler to validate a pattern
is exactly what I would describe as clunky/hacky. It also requires
knowing how this works internally, which feels very wrong for
something like this.
— Alexandre Daubois