Hi Internals,
For quite some time now, PHP's sanitize filters have "Rustled My Jimmies".
These filters bother me because I can't really justify their existence. I
can understand that a few of them are sensible and may come in handy, but I
would like to talk about some of these in particular.
In PHP 8.1, we have deprecated FILTER_SANITIZE_STRING, which I deemed to be
a priority due to its confusing name and behaviour. The rest is slightly
less dangerous, but as was pointed out to me in a recent conversation with
a PHP developer, these filters are all very confusing.
I would like to have some opinions on the following filters. What do you
think we should do with them? Deprecate? Fix? Provide better documentation?
*FILTER_SANITIZE_ENCODED* - "URL-encode string, optionally strip or encode
special characters."
Now, what does that mean? PHP has two functions for URL encoding: urlencode
used for encoding query-string parts, and rawurlencode used for encoding
any other URL part (two different RFCs are followed by these functions).
Which of these RFCs is applied in this filter? Furthermore, the description
says that "special characters" can be stripped or encoded. Is one of these
actions the default and the other can be selected by a flag or are both
optional? What are these special characters? Are they special in the
context of URL? If so, why did we encode them first? If these are HTML
special characters (there's no single definition of special HTML chars),
then why does this filter encode them if the filter is for URL
sanitization? What does backtick have to do with any of this
(FILTER_FLAG_STRIP_BACKTICK)?
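A minimal sketch of how one might check which URL-encoding style the filter follows, assuming PHP 8.x with ext/filter enabled. The input string is a made-up example; the filter's result is printed rather than assumed, since the manual does not say which RFC applies.

```php
<?php
// Hypothetical input containing a space and a tilde, two characters on
// which urlencode() and rawurlencode() disagree.
$input = 'a b~c';

$query    = urlencode($input);     // query-string style: "a+b%7Ec"
$path     = rawurlencode($input);  // RFC 3986 style:     "a%20b~c"
$filtered = filter_var($input, FILTER_SANITIZE_ENCODED);

var_dump($query, $path, $filtered);

// Report whether the filter agrees with either function for this input.
var_dump($filtered === $query, $filtered === $path);
```

If both comparisons print false, the filter follows neither function exactly for this input.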
*FILTER_SANITIZE_ADD_SLASHES* - "Apply addslashes(). (Available as of PHP
7.3.0)"
This filter was added as a replacement for the magic_quotes filter. According
to the PHP documentation, addslashes is supposed to be used when injecting PHP
variables into an eval'd string. Real-life usage has shown that this function is used
in a lot of places that have nothing to do with PHP's eval. I am not sure
if the sanitize filter is misused in a similar fashion, but judging from
the fact that it was meant as a replacement for magic_quotes, my guess is
that it's very likely still abused.
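For illustration, a short sketch (assuming PHP 7.3 or later) showing that the filter gives the same result as calling addslashes() directly. The sample name is made up.

```php
<?php
// Hypothetical user-supplied value containing a single quote.
$name = "O'Reilly";

$viaFilter   = filter_var($name, FILTER_SANITIZE_ADD_SLASHES);
$viaFunction = addslashes($name);

echo $viaFilter, PHP_EOL;               // O\'Reilly
var_dump($viaFilter === $viaFunction);  // bool(true)
```

The backslash escaping here knows nothing about where the value will end up, which is why using it outside of PHP string literals is questionable.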
*FILTER_SANITIZE_EMAIL* - "Remove all characters except letters, digits and
!#$%&'*+-=?^_`{|}~@.[]."
Which RFC does this adhere to? It strips slashes and quoted parts, doesn't
allow IPv6 addresses and doesn't accept RFC 6530 email addresses. This
filter is ok for simple usage, but it isn't true to any known specification
AFAIK.
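A small sketch of the mangling described above, assuming PHP 8.x. The addresses are invented samples; each contains characters outside the filter's allowed set.

```php
<?php
// Hypothetical addresses using a slash, a quoted local part, and an
// IPv6 domain literal.
$samples = [
    'user/tag@example.com',
    '"john doe"@example.com',
    'user@[IPv6:2001:db8::1]',
];

$clean = [];
foreach ($samples as $email) {
    // Every character outside the documented allow-list is silently dropped.
    $clean[$email] = filter_var($email, FILTER_SANITIZE_EMAIL);
    printf("%-26s => %s\n", $email, $clean[$email]);
}
```

In each case the output is a different address from the one supplied, not a corrected version of it.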
*FILTER_SANITIZE_SPECIAL_CHARS* - "HTML-encode '"<>& and characters with
ASCII value less than 32, optionally strip or encode other special
characters."
What's the intended purpose of this filter? "Special characters" are still
not clearly defined, but at least it's clearer than the
FILTER_SANITIZE_ENCODED description. Same question about backticks
though: why? Why encode ASCII <32 chars?
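To make the behaviour concrete, a minimal sketch (assuming PHP 8.x) comparing this filter with htmlspecialchars() on a made-up input that contains a tab.

```php
<?php
// Hypothetical input: a tag followed by a tab (ASCII 9).
$input = "<b>\t";

$special = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS);
$html    = htmlspecialchars($input, ENT_QUOTES);

// The filter emits numeric character references and also encodes the tab;
// htmlspecialchars() uses named entities and leaves the tab alone.
var_dump($special); // "&#60;b&#62;&#9;"
var_dump($html);    // "&lt;b&gt;" followed by a literal tab
```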
*FILTER_SANITIZE_FULL_SPECIAL_CHARS* - "Equivalent to calling
htmlspecialchars() with ENT_QUOTES set. Encoding quotes can be disabled by
setting FILTER_FLAG_NO_ENCODE_QUOTES. Like htmlspecialchars(), this filter
is aware of the default_charset and if a sequence of bytes is detected that
makes up an invalid character in the current character set then the entire
string is rejected resulting in a 0-length string. When using this filter
as a default filter, see the warning below about setting the default flags
to 0."
Not to be mistaken with FILTER_SANITIZE_SPECIAL_CHARS. As long as it's not
used with filter_input(), it's the least problematic. We have
htmlspecialchars() though, so how useful is this filter?
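A quick sketch of the overlap, assuming PHP 8.x and a plain ASCII sample string. It only shows that the two agree for this one input; it is not a claim that they agree for every input.

```php
<?php
// Hypothetical ASCII-only input with tags and single quotes.
$input = "<b>'q'</b>";

$viaFilter   = filter_var($input, FILTER_SANITIZE_FULL_SPECIAL_CHARS);
$viaFunction = htmlspecialchars($input, ENT_QUOTES);

var_dump($viaFunction);                 // "&lt;b&gt;&#039;q&#039;&lt;/b&gt;"
var_dump($viaFilter === $viaFunction);  // bool(true) for this input
```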
*FILTER_UNSAFE_RAW* - What makes it unsafe? Why isn't this just
called FILTER_RAW_STRING? If the value being filtered is something other
than a string, what will this filter return? Integers, floats, booleans and
nulls are converted to a string; arrays and objects make the filter fail.
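The type behaviour described above can be checked with a short sketch like this one, assuming PHP 8.x and no extra flags. The labels are just for printing.

```php
<?php
// Sample non-string values passed through FILTER_UNSAFE_RAW.
$values = [
    'int'    => 42,
    'float'  => 1.5,
    'true'   => true,
    'false'  => false,
    'null'   => null,
    'array'  => [1, 2],
    'object' => new stdClass(),
];

$results = [];
foreach ($values as $label => $value) {
    // Scalars and null come back as strings; arrays and objects without
    // __toString() come back as false (the filter's failure value).
    $results[$label] = filter_var($value, FILTER_UNSAFE_RAW);
    printf("%-6s => %s\n", $label, var_export($results[$label], true));
}
```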
Let's quickly mention the filter flags.
The FILTER_FLAG_STRIP_LOW flag will also remove tabs, carriage returns and
newlines as these are all less than 32 ASCII codes. When is this useful and
expected?
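As a minimal sketch (assuming PHP 8.x), here is the flag applied to a made-up multi-line value:

```php
<?php
// Hypothetical input with a tab and a Windows-style line break.
$input = "line one\tcol\r\nline two";

$stripped = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);

// Tab, CR and LF are all below ASCII 32, so they are removed and the
// words on either side are fused together.
var_dump($stripped); // "line onecolline two"
```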
The FILTER_FLAG_ENCODE_LOW flag "encodes" ASCII <32 codes presumably into
HTML entities, although that's not specified anywhere in the PHP manual.
The word HTML does not appear on the
https://www.php.net/manual/en/filter.filters.flags.php page. What do these
characters look like when presented by HTML? When is it ever useful to use
this flag?
FILTER_FLAG_ENCODE_AMP & FILTER_FLAG_STRIP_BACKTICK - why is this even a
thing?
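A short sketch of what these flags do when combined with FILTER_UNSAFE_RAW, assuming PHP 8.x. The inputs are invented, and the results reflect observed behaviour rather than anything the flags page spells out.

```php
<?php
// ENCODE_LOW: control characters become numeric HTML character references.
$low = filter_var("a\tb", FILTER_UNSAFE_RAW, FILTER_FLAG_ENCODE_LOW);

// ENCODE_AMP: the ampersand becomes a numeric reference, not "&amp;".
$amp = filter_var('a&b', FILTER_UNSAFE_RAW, FILTER_FLAG_ENCODE_AMP);

// STRIP_BACKTICK: backticks are removed outright.
$tick = filter_var('a`b', FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_BACKTICK);

var_dump($low);  // "a&#9;b"
var_dump($amp);  // "a&#38;b"
var_dump($tick); // "ab"
```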
Due to flags, FILTER_VALIDATE_EMAIL will happily validate email addresses
that would be otherwise mangled by FILTER_SANITIZE_EMAIL.
These are just the things I found confusing and strange about the sanitize
filters. Let's try to put ourselves in the shoes of an average PHP
developer trying to comprehend these filters. It's quite easy to shoot
yourself in the foot if you try to use them. The PHP manual doesn't do a
good job of explaining them, but that's probably because they are not easy
to explain. I can't come up with good examples of when they should be used.
Regards,
Kamil
Hello Kamil,
I believe that PHP should not try to act as a “framework” that provides you
with ready solutions for such cases.
Being able to actually modify the default behaviour of some functions
through the ini ... is even scarier.
In 25 years of writing PHP, I have never relied on this “magic” for security :)
Regards,
Dimitar
All right, if you have been writing PHP for 25 years, you will have noticed that PHP
has always been about high-level, web-focused functionality out of the box. This is one
of the basic benefits of PHP over other general-purpose languages, where you can
write everything you want but you also have to write it, since the language
itself is very basic. I'm for PHP keeping built-in solutions for the most
common problems in the context of the web. This comes from someone who has passed
the ZCE exam and has been writing PHP for just 15 years.
Hello Vasilii,
It’s okay to have a different opinion, I hope.
You are missing an important point here: besides my comments, the way this
is currently implemented causes confusion.
It would be great if you shared your experience on this matter.
Regards,
Dimitar
On Sun, 2 Oct 2022 at 9:31, Vasilii Shpilchin vasilii.b.shpilchin@gmail.com
wrote:
All right if you are writing on PHP for 25 years, you noticed the PHP was
always about high-order web-focused functionality out-of-box. This is one
of basic benefits of PHP to other general-purpose languages where you can
write everything you want and you also have to write it since the language
itself is very basic. I'm for PHP to keep built-in solutions for most
common problems in the context of the web. Having passed ZCE exam and
writing just 15 years on php.
The filter extension has always been a stillborn mess. Its API is an absolute disaster and, as you note, its functionality is unclear at best, misleading at worst. Frankly it's worse than SPL.
I'd be entirely on board with jettisoning the entire thing, but barring that, ripping out large swaths of it that are misleading suits me fine.
--Larry Garfield
FILTER_SANITIZE_EMAIL should burn. If you have a bad email address, I can't
imagine the correct solution is to remove characters until it becomes
valid, short of a trim().
Hi Internals,
For quite some time now, PHP's sanitize filters have "Rustled My
Jimmies".
These filters bother me because I can't really justify their existence. I
can understand that a few of them are sensible and may come in handy,
but I
would like to talk about some of these in particular.In PHP 8.1, we have deprecated
FILTER_SANITIZE_STRING
which I deemed to
be
a priority due to its confusing name and behaviour. The rest is slightly
less dangerous, but as was pointed out to me in a recent conversation
with
a PHP developer, these filters are all very confusing.I would like to have some opinions on the following filters. What do you
think we should do with them? Deprecate? Fix? Provide better
documentation?
*FILTER_SANITIZE_ENCODED *- "URL-encode string, optionally strip or
encode
special characters."
Now, what does that mean? PHP has two functions for URL encoding:
urlencode
used for encoding query-string parts, and rawurlencode used for encoding
any other URL part (two different RFCs are followed by these functions).
Which of these RFCs is applied in this filter? Furthermore, the
description
says that "special characters" can be stripped or encoded. Is one of
these
actions the default and the other can be selected by a flag or are both
optional? What are these special characters? Are they special in the
context of URL? If so, why did we encode them first? If these are HTML
special characters (there's no single definition of special HTML chars),
then why does this filter encode them if the filter is for URL
sanitization? What does backtick have to do with any of this
(FILTER_FLAG_STRIP_BACKTICK)?*FILTER_SANITIZE_ADD_SLASHES - "*Apply
addslashes()
. (Available as of PHP
7.3.0)"
This filter was added as a replacement for magic_quotes filter. According
to PHP documentation, addslashes is supposed to be used when injecting
PHP
variables into eval'd string. Real-life showed that this function is used
in a lot of places that have nothing to do with PHP's eval. I am not sure
if the sanitize filter is misused in a similar fashion, but judging from
the fact that it was meant as a replacement for magic_quotes, my guess is
that it's very likely still abused.*FILTER_SANITIZE_EMAIL - "Remove all characters except letters, digits
and
!#$%&'+-=?^_`{|}~@.[]."
Which RFC does this adhere to? It strips slashes and quoted parts,
doesn't
allow IPv6 addresses and doesn't accept RFC 6530 email addresses. This
filter is ok for simple usage, but it isn't true to any known
specification
AFAIK.*FILTER_SANITIZE_SPECIAL_CHARS *- "HTML-encode '"<>& and characters with
ASCII value less than 32, optionally strip or encode other special
characters."
What's the intended purpose of this filter? "Special characters" are
still
not clearly defined, but at least it's more clear than
theFILTER_SANITIZE_ENCODED
description. Same question about backticks
though: why? Why encode ASCII <32 chars?*FILTER_SANITIZE_FULL_SPECIAL_CHARS *- "Equivalent to calling
htmlspecialchars()
withENT_QUOTES
set. Encoding quotes can be disabled
by
setting FILTER_FLAG_NO_ENCODE_QUOTES. Likehtmlspecialchars()
, this
filter
is aware of the default_charset and if a sequence of bytes is detected
that
makes up an invalid character in the current character set then the
entire
string is rejected resulting in a 0-length string. When using this filter
as a default filter, see the warning below about setting the default
flags
to 0."
Not to be confused with FILTER_SANITIZE_SPECIAL_CHARS. As long as it's not
used with filter_input(), it's the least problematic. We have
htmlspecialchars() though, so how useful is this filter?
*FILTER_UNSAFE_RAW* - What makes it unsafe? Why isn't this just
called FILTER_RAW_STRING? If the value being filtered is something other
than a string, what will this filter return? Integers, floats, booleans and
nulls are converted to a string; arrays and objects make the filter fail.
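
To make the type handling described above concrete, here is a minimal sketch. It assumes PHP 8.x with ext/filter loaded and no flags passed; the values in the comments are what that setup should print, not something the manual spells out.

```php
<?php
declare(strict_types=1);

// Scalars and null come back as strings.
var_dump(filter_var(123, FILTER_UNSAFE_RAW));   // string(3) "123"
var_dump(filter_var(1.5, FILTER_UNSAFE_RAW));   // string(3) "1.5"
var_dump(filter_var(true, FILTER_UNSAFE_RAW));  // string(1) "1"
var_dump(filter_var(null, FILTER_UNSAFE_RAW));  // string(0) ""

// Arrays (without FILTER_REQUIRE_ARRAY or FILTER_FORCE_ARRAY) and objects
// without __toString() make the filter fail, i.e. return false.
var_dump(filter_var(['a'], FILTER_UNSAFE_RAW));          // bool(false)
var_dump(filter_var(new stdClass(), FILTER_UNSAFE_RAW)); // bool(false)
```

Note that a failed filter and a filtered boolean false are hard to tell apart here, since false becomes an empty string while failure is a real false.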
Let's quickly mention the filter flags.
The FILTER_FLAG_STRIP_LOW flag will also remove tabs, carriage returns and
newlines, as these all have ASCII codes below 32. When is this useful and
expected?
The FILTER_FLAG_ENCODE_LOW flag "encodes" ASCII <32 codes presumably into
HTML entities, although that's not specified anywhere in the PHP manual.
The word HTML does not appear on the
https://www.php.net/manual/en/filter.filters.flags.php page. What do these
characters look like when presented by HTML? When is it ever useful to use
this flag?
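
A short sketch of what the two flags do to control characters, assuming PHP 8.x and FILTER_UNSAFE_RAW as the carrier filter. The sample string is made up for illustration.

```php
<?php
declare(strict_types=1);

$input = "name:\tAda\r\nrole:\tadmin";

// FILTER_FLAG_STRIP_LOW drops every byte below 32, so the tab, CR and LF
// that separated the fields are gone and the values run together.
$stripped = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);
echo $stripped, PHP_EOL; // name:Adarole:admin

// FILTER_FLAG_ENCODE_LOW turns the same bytes into numeric character
// references of the form "&#NN;".
$encoded = filter_var("a\tb", FILTER_UNSAFE_RAW, FILTER_FLAG_ENCODE_LOW);
echo $encoded, PHP_EOL; // a&#9;b
```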
FILTER_FLAG_ENCODE_AMP & FILTER_FLAG_STRIP_BACKTICK - why is this even a
thing?
Due to flags, FILTER_VALIDATE_EMAIL will happily validate email addresses
that would be otherwise mangled by FILTER_SANITIZE_EMAIL.
These are just the things I found confusing and strange about the sanitize
filters. Let's try to put ourselves in the shoes of an average PHP
developer trying to comprehend these filters. It's quite easy to shoot
yourself in the foot if you try to use them. The PHP manual doesn't do a
good job of explaining them, but that's probably because they are not easy
to explain. I can't come up with good examples of when they should be used.
Regards,
Kamil
The filter extension has always been a stillborn mess. Its API is an
absolute disaster and, as you note, its functionality is unclear at best,
misleading at worst. Frankly it's worse than SPL.
I'd be entirely on board with jettisoning the entire thing, but barring
that, ripping out large swaths of it that are misleading suits me fine.
--Larry Garfield
On Sun, Oct 2, 2022 at 4:10 PM Larry Garfield larry@garfieldtech.com wrote:
> The filter extension has always been a stillborn mess. Its API is an
> absolute disaster and, as you note, its functionality is unclear at best,
> misleading at worst. Frankly it's worse than SPL.
> I'd be entirely on board with jettisoning the entire thing, but barring
> that, ripping out large swaths of it that are misleading suits me fine.
The whole thing is seriously grim. Take the documentation for filter_var,
for example, and look at what it says for the third parameter, $options:
"Associative array of options or bitwise disjunction of flags. If filter
accepts options, flags can be provided in "flags" field of array. For the
"callback" filter, callable type should be passed."
At a glance, I think all the examples mentioned in this thread have better
existing alternatives already in core and could just be deprecated then
removed. But it's worth asking, is that what we're talking about here, or
is there a suggestion of replacing the filter API with a more modern,
object API?
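
For readers who have not run into it, the three shapes the $options parameter of filter_var() can take look roughly like this. This is a minimal sketch assuming PHP 8.x; the input values are arbitrary.

```php
<?php
declare(strict_types=1);

// 1) A bare bitmask of flags.
var_dump(filter_var('0x1A', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX)); // int(26)

// 2) An array with an "options" sub-array, plus flags under "flags".
$range = [
    'options' => ['min_range' => 1, 'max_range' => 100],
    'flags'   => FILTER_NULL_ON_FAILURE,
];
var_dump(filter_var('42', FILTER_VALIDATE_INT, $range));  // int(42)
var_dump(filter_var('150', FILTER_VALIDATE_INT, $range)); // NULL

// 3) For FILTER_CALLBACK, "options" holds the callable itself.
var_dump(filter_var('abc', FILTER_CALLBACK, ['options' => 'strtoupper'])); // string(3) "ABC"
```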
Mon, Oct 3, 2022, 03:18 David Gebler davidgebler@gmail.com:
> At a glance, I think all the examples mentioned in this thread have better
> existing alternatives already in core and could just be deprecated then
> removed. But it's worth asking, is that what we're talking about here, or
> is there a suggestion of replacing the filter API with a more modern,
> object API?
Is there a compelling need to have this in the core, as opposed to Composer
packages? The ecosystem has changed a lot since the original function was
introduced.
Quite the opposite, in my opinion - there are compelling reasons not to have this in core.
It turns out that making a universal validation and sanitisation library is really hard, and breaking changes and diverging needs are pretty much guaranteed. That's pretty much the worst case for something distributed with the language, and exactly what Composer excels at.
The only things that do belong in core are narrowly targeted low-level functions that someone might use to build such a library. Certainly not some huge OO monster reimplementing the whole of ext/filter and making a whole bunch of new mistakes.
Regards,
--
Rowan Tommins
[IMSoP]
My 2 cents on this.
We should keep what is web related IMO. It does not make any sense to
take things out that everyone will later write on their own, or end up
getting from a 3rd party package.
PHP should have what is web related ready to use.
The naming, the implementation code, etc. are a different matter.
An RFC documenting each case would be very helpful, to centralize the
ideas on each case, instead of scrolling the mailing list.
Cheers.
> Is there a compelling need to have this in the core, as opposed to
> Composer packages? The ecosystem has changed a lot since the original
> function was introduced.
I don't know that there is, I suspect the answer is probably not and
sanitization and validation is probably better left to userland code. The
only argument I can offer as devil's advocate is that certain validations
or transformations will be faster in core than in library scripts. I would
wager the most common implementation of such userland libraries today are
heavily reliant on preg_* functions so having some fast, low level baseline
in core for common tasks in this category might still make sense.
While we're on the topic, can I bring up FILTER_SANITIZE_NUMBER_FLOAT? Why
is the default behaviour of FILTER_SANITIZE_NUMBER_FLOAT the same as
FILTER_SANITIZE_NUMBER_INT unless you add extra flags to permit fractions?
Why is the constant name FILTER_SANITIZE_NUMBER_FLOAT but its counterpart
for validation is FILTER_VALIDATE_FLOAT (no NUMBER_)? Why does validating a
float return a float but sanitizing a float return a string?
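
The inconsistencies asked about above are easy to reproduce. A minimal sketch, assuming PHP 8.x:

```php
<?php
declare(strict_types=1);

$raw = '1.5';

// Without extra flags the decimal point is stripped, exactly as with
// the integer sanitizer.
var_dump(filter_var($raw, FILTER_SANITIZE_NUMBER_INT));   // string(2) "15"
var_dump(filter_var($raw, FILTER_SANITIZE_NUMBER_FLOAT)); // string(2) "15"

// The fraction only survives with an explicit flag, and the result is
// still a string.
var_dump(filter_var($raw, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION)); // string(3) "1.5"

// The validating counterpart returns an actual float.
var_dump(filter_var($raw, FILTER_VALIDATE_FLOAT)); // float(1.5)
```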
What about FILTER_VALIDATE_EMAIL, which is notorious for being next to
useless?
Seems to me like there could at the very least be a plausible case for some
better to_float(), to_int(), is_valid_email() etc. type functions in core
to replace some of the filter API.
I believe we are still discussing the sanitizing filters only. No
doubt the filter API in general should be kept in the core as it provides
functional access to input variables with the filter_input() function. The
filter_input() function is the only alternative to accessing superglobal
arrays directly. I prefer to use it rather than userland helpers and facades,
which may work differently from each other. I can't believe anyone would want
to go back to superglobal arrays when coding without a framework in PHP 8.3.
The set of sanitizing filters is not perfect; however, some filters are great
and useful:
FILTER_SANITIZE_EMAIL - helps to clean up the typical mess caused by
copy-pasting an email.
FILTER_SANITIZE_URL - a similar thing, but for URLs.
FILTER_SANITIZE_NUMBER_FLOAT - nice since it provides a flag to control
scientific notation (did you know that is_float("1e1") is false but
is_float(1e1) is true? However, you always get a string from input variables,
and there is no other way to handle this case without weird manipulations
on a string).
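
A small sketch of the scientific-notation case, assuming PHP 8.x. The pasted value is a made-up example.

```php
<?php
declare(strict_types=1);

var_dump(is_float('1e1'));   // bool(false): the argument is a string
var_dump(is_float(1e1));     // bool(true)
var_dump(is_numeric('1e1')); // bool(true)

// Hypothetical form input with surrounding noise.
$pasted = ' 1e1 kg';

// With FILTER_FLAG_ALLOW_SCIENTIFIC the "e" is kept; without it, it is stripped.
$clean = filter_var($pasted, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_SCIENTIFIC);
var_dump($clean);                                            // string(3) "1e1"
var_dump(filter_var($pasted, FILTER_SANITIZE_NUMBER_FLOAT)); // string(2) "11"

// Turning the sanitized string into a float is a separate step.
var_dump(filter_var($clean, FILTER_VALIDATE_FLOAT)); // float(10)
```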
The purpose of some filters like FILTER_SANITIZE_STRING is difficult to
grasp, I agree, but the idea of solving common edge cases with built-in
high-quality functionality is great. PHP is a language for the Web and should
consider the web context.
> What about FILTER_VALIDATE_EMAIL which is notorious for being next to
> useless?
> [...]
> Seems to me like there could at the very least be a plausible case for some
> better [...] is_valid_email() etc. type functions in core
> to replace some of the filter API.
The "notorious" thing I know is that validating e-mail addresses is next
to impossible because of multiple overlapping standards, and a huge
number of esoteric variations that might or might not actually be
deliverable in practice. If you think the implementation can be
improved, that doesn't need a new is_valid_email() function, just a
tested and documented patch to the existing one; if it can't be
improved, then any new function will be just as useless.
In practice, the most common typos don't result in invalid e-mail
addresses anyway, just incorrect ones - "gamil.com" instead of
"gmail.com", and so on. For those, you don't need to Validate or
Sanitize; you need to Escape and Verify: escape what you're given
(context-dependent, so necessarily part of an SMTP or API client
library), attempt to send an e-mail, and wait for the user to verify
they've received it.
> filter_input() is the only alternative to accessing superglobal arrays
> directly.
> [...]
> FILTER_SANITIZE_EMAIL - helps to clean up typical mess caused by
> copy-pasting an email.
> FILTER_SANITIZE_URL - similar thing but to URIs.
> FILTER_SANITIZE_NUMBER_FLOAT - nice since it provides a flag to control
> scientific notation
None of these sounds very useful to me, but I think that just confirms
the biggest problem with the extension: it's trying to be everything to
everyone, and ends up with a bewildering set of options as a result. I
don't think any rewrite or replacement can ever avoid that problem,
because it's inherent in the problem space.
I have a draft proposal I might share soon for some "strict cast"
functions, but even simple cases like "string to integer" could have a
dozen different implementations which would all be equally "valid"
according to some use case or opinion, so it's a bit of a quagmire.
Regards,
--
Rowan Tommins
[IMSoP]
On Tue, Oct 4, 2022 at 11:34 AM Rowan Tommins rowan.collins@gmail.com wrote:
> The "notorious" thing I know is that validating e-mail addresses is next
> to impossible because of multiple overlapping standards, and a huge
> number of esoteric variations that might or might not actually be
> deliverable in practice. If you think the implementation can be
> improved, that doesn't need a new is_valid_email() function, just a
> tested and documented patch to the existing one; if it can't be
> improved, then any new function will be just as useless
There are multiple RFC standards for email address format but AFAIK PHP's
FILTER_SANITIZE_EMAIL doesn't conform to any of them.
The idea behind my suggestion for something like is_valid_email (whatever
it might be named) is as a step towards deprecating and removing the entire
existing filter API, which I think many of us agree is a mess. As you said
below "it's trying to be everything to everyone, and ends up with a
bewildering set of options" - a rewrite or replacement which also tries to
be everything to everyone won't solve that problem, but getting rid of it
entirely will.
That said, the nature of PHP as a web-first language means it's reasonable
to include some individual, smaller, better APIs for certain validations or
sanitizations on types of data which are very commonly encountered in HTTP
requests. Examples include strings we expect or want to be valid integers,
decimals, email addresses and URLs. I think these features should remain,
but I'd happily see them even as a set of new, individual core functions if
it meant binning off filter_var and filter_input in PHP 9.
Regardless, look - I don't want to derail here - if most people are happy
with just deprecating some of the crappier and more confusing sanitize
filters and leave it at that, I say great, go for it, it's still an
improvement. I'm just saying if someone's going to take the time to look at
that problem space, why not go more than half the distance and reconsider
the fundamental approach of something we all know is pretty sucky anyway?
Just food for thought.
> There are multiple RFC standards for email address format but AFAIK
> PHP's FILTER_SANITIZE_EMAIL doesn't conform to any of them.
FILTER_SANITIZE_EMAIL is a very short list of characters which claims to
be based on RFC 822 section 6:
https://heap.space/xref/php-src/ext/filter/sanitizing_filters.c?r=4df3dd76#295
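
To show what that character list does in practice, here is a minimal sketch assuming PHP 8.x. The addresses are invented for illustration, and every byte outside the allowed list is simply dropped.

```php
<?php
declare(strict_types=1);

$samples = [
    '"john doe"@example.com',   // quoted local part containing a space
    'user/tag@example.com',     // slash in the local part
    'user@[IPv6:2001:db8::1]',  // IPv6 address literal
];

foreach ($samples as $sample) {
    // Quotes, spaces, slashes and colons are removed without any error.
    echo filter_var($sample, FILTER_SANITIZE_EMAIL), PHP_EOL;
}
// johndoe@example.com
// usertag@example.com
// user@[IPv62001db81]
```

In each case the output is a different address from the one supplied, which is the "mangling" complained about earlier in the thread.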
FILTER_VALIDATE_EMAIL doesn't say exactly which standard it's attempting
to adhere to; it's one of many long unreadable regexes I've seen online
claiming to cover all possible addresses. (Actually, there are now two
regexes there, because there's a different version to support
FILTER_FLAG_EMAIL_UNICODE).
https://heap.space/xref/php-src/ext/filter/logical_filters.c?r=d8fc05c0#651
> The idea behind my suggestion for something like is_valid_email
> (whatever it might be named) is as a step towards deprecating and
> removing the entire existing filter API, which I think many of us
> agree is a mess.
You described FILTER_VALIDATE_EMAIL as "notorious for being next to
useless"; that gives us two possibilities:
a) A new function will be just as useless, because it will be based on
the same implementation
b) There is a better implementation out there, which we should start
using in ext/filter right now
My gut feel is that (a) is true, and there is no point considering what
a new function would be called, because we don't know how to implement it.
Regards,
--
Rowan Tommins
[IMSoP]
On Oct 6, 2022, at 10:19, Rowan Tommins rowan.collins@gmail.com wrote:
> You described FILTER_VALIDATE_EMAIL as "notorious for being next to
> useless"; that gives us two possibilities:
> a) A new function will be just as useless, because it will be based on
> the same implementation
> b) There is a better implementation out there, which we should start
> using in ext/filter right now
> My gut feel is that (a) is true, and there is no point considering what
> a new function would be called, because we don't know how to implement it.
Hi,
While it may be difficult to validate an email according to some IETF’s RFC, the HTML standard has pragmatically adopted a pattern (used to validate <input type=email> fields) that is both readable and suitable for most practical purposes. See:
https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address
—Claude
> While it may be difficult to validate an email according to some
> IETF’s RFC, the HTML standard has pragmatically adopted a pattern
> (used to validate <input type=email> fields) that is both readable
> and suitable for most practical purposes. See:
> https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address
Well, it would be a more clearly documented source than the current
implementation, although the spec admits it's "wilfully" not following
e-mail standards. I'd be happy to see it committed, maybe for PHP 8.3.
I note that it doesn't support internationalized addresses in their
Unicode form, though, so it won't do for FILTER_FLAG_EMAIL_UNICODE.
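
For reference, this is roughly what the HTML standard's pattern looks like when used from PHP. It is a sketch only: the constant and function names are hypothetical, not an existing or proposed core API, and the PCRE "D" modifier is my addition so that "$" cannot match before a trailing newline.

```php
<?php
declare(strict_types=1);

// Pattern from the "valid e-mail address" section of the HTML standard.
const WHATWG_EMAIL_PATTERN = <<<'REGEX'
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/D
REGEX;

// Hypothetical helper name, used here for illustration only.
function matches_whatwg_email(string $candidate): bool
{
    return preg_match(WHATWG_EMAIL_PATTERN, $candidate) === 1;
}

var_dump(matches_whatwg_email('user@example.com'));       // bool(true)
var_dump(matches_whatwg_email('user@localhost'));         // bool(true): no dot required
var_dump(matches_whatwg_email('"john doe"@example.com')); // bool(false): no quoted local parts
var_dump(matches_whatwg_email('jösé@example.com'));       // bool(false): ASCII only
```

The last case is the internationalized-address limitation noted above.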
Regards,
--
Rowan Tommins
[IMSoP]
> > There are multiple RFC standards for email address format but AFAIK
> > PHP's FILTER_SANITIZE_EMAIL doesn't conform to any of them.
> FILTER_SANITIZE_EMAIL is a very short list of characters which claims to
> be based on RFC 822 section 6:
> https://heap.space/xref/php-src/ext/filter/sanitizing_filters.c?r=4df3dd76#295
> FILTER_VALIDATE_EMAIL doesn't say exactly which standard it's attempting
> to adhere to; it's one of many long unreadable regexes I've seen online
> claiming to cover all possible addresses. (Actually, there are now two
> regexes there, because there's a different version to support
> FILTER_FLAG_EMAIL_UNICODE).
> https://heap.space/xref/php-src/ext/filter/logical_filters.c?r=d8fc05c0#651
> > The idea behind my suggestion for something like is_valid_email
> > (whatever it might be named) is as a step towards deprecating and
> > removing the entire existing filter API, which I think many of us
> > agree is a mess.
> You described FILTER_VALIDATE_EMAIL as "notorious for being next to
> useless"; that gives us two possibilities:
> a) A new function will be just as useless, because it will be based on
> the same implementation
> b) There is a better implementation out there, which we should start
> using in ext/filter right now
For (b), well, there is always the option of handling email addresses
the way the IETF intended instead of using regexes.
For example, SMTP::MakeValidEmailAddress() from:
https://github.com/cubiclesoft/ultimate-email
Does three things quite differently from ext/filter:
- It uses a custom state engine to implement half of the relevant IETF
EBNF grammars and then cheats for the other half. The very complex
specifications that the IETF (and W3C) produces should generally be
implemented as custom state engines (finite state machines or FSMs) in
software. A custom state engine can correctly identify certain common
input errors and both transparently and correctly fix those errors in
very specific instances as it processes the input (e.g. gmail,com ->
gmail.com happens often). State engines can also accurately and
correctly do things such as remove CFWS (comments and folding
whitespace) from email addresses, which are not necessary components of
an email address and CFWS causes all kinds of issues. State engines,
when done right, can even outperform all other functional
implementations. State engines can also read partial input and maintain
their internal state while using few resources to process very large
inputs (not particularly relevant in this case). The current
regex-based approach in ext/filter is obviously causing some problems
that can probably be fixed by using a custom state engine.
Important caveat: Custom state engines do run the risk of winding up in
an infinite loop when forgetting to properly transition between states
or forgetting to move pointers through the input, resulting in DoS
issues. Been there, done that - they are both very easy things to do.
- It parses email addresses in reverse: Domain part first, local part
second. The EBNF grammars for the domain part are simpler and less
contentious than the grammars for the local part. Also, IIRC, the
domain portion can't contain '@' while the local portion can - it's been
a while since I looked at the specs though.
- It considers sanitization and validation as being the same function.
There is no separate SMTP::IsValidEmailAddress() in the library
because there is no need for one. If MakeValidEmailAddress() can't turn
an input into a valid email address string, it returns an error. If the
returned email address is not the same as the one that was input, the
original address can be viewed as technically "invalid." One shared
internal function for both FILTER_SANITIZE_EMAIL and
FILTER_VALIDATE_EMAIL would produce consistent output/results.
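
The last two ideas can be sketched in a few lines. This is not the ultimate-email implementation and not a state engine: the function names are hypothetical, the domain check is a deliberately crude regex over ASCII host names, and the local part is not inspected at all. It only illustrates splitting on the last '@' and deriving validation from sanitization.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns a cleaned-up address, or null on failure.
function make_valid_address(string $input): ?string
{
    $candidate = trim($input);

    // Work from the right: the last '@' separates the domain, because the
    // local part may itself contain '@' inside a quoted string.
    $at = strrpos($candidate, '@');
    if ($at === false || $at === 0 || $at === strlen($candidate) - 1) {
        return null;
    }
    $domain = strtolower(substr($candidate, $at + 1));
    $local  = substr($candidate, 0, $at);

    // Crude stand-in for a real domain grammar (assumption, ASCII only).
    $label = '[a-z0-9](?:[a-z0-9-]*[a-z0-9])?';
    if (preg_match('/^' . $label . '(?:\.' . $label . ')+$/D', $domain) !== 1) {
        return null;
    }

    return $local . '@' . $domain;
}

// Validation is "sanitizing changed nothing".
function is_valid_address(string $input): bool
{
    return make_valid_address($input) === $input;
}

var_dump(make_valid_address(' User@Example.COM ')); // string(16) "User@example.com"
var_dump(is_valid_address('User@example.com'));     // bool(true)
var_dump(is_valid_address(' User@Example.COM '));   // bool(false)
var_dump(make_valid_address('no-at-sign'));         // NULL
```

Because both entry points share one code path, they cannot disagree about what counts as a valid address.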
Other thoughts: I'm aware that a regex is effectively defining a state
engine as a compact string. However, as evidenced by the two Perl CPAN
regexes for email addresses currently in use, regexes are limited in
utility/function and are somewhat inflexible, get more difficult to read
and comprehend once they get longer than a few dozen bytes, and can't
readily correct errors or other problems in complex input strings. The
~250 lines of userland code referenced above is also not perfect (e.g.
extracting characters using substr() is rather inefficient) but it works
well enough. The userland code also performs a DNS MX record check by
default, but that is its own complex can of worms and was probably not
the best idea I've ever had. However, the three main concepts are the
important takeaways here, not the referenced userland code.
> My gut feel is that (a) is true, and there is no point considering what
> a new function would be called, because we don't know how to implement it.
Perhaps the above will help to at least provide some new ideas to think
about/ponder.
--
Thomas Hruska
CubicleSoft President
CubicleSoft has over 80 original open source projects and counting.
Plus a couple of commercial/retail products.
What software are you looking to build?
Hi,
FILTER_SANITIZE_ENCODED
FILTER_SANITIZE_SPECIAL_CHARS
See https://www.php.net/manual/en/function.filter-input.php Example #1 for an example of use. Apparently, “escaping” is considered as part of “sanitizing”?
If you want to educate your users, you can consider deprecating them in favor of FILTER_DEFAULT followed by urlencode() and htmlspecialchars(), respectively. Ditto for various other FILTER_SANITIZE_* filters.
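
A minimal sketch of that pattern, assuming PHP 8.x and the default filter configuration. filter_var() stands in for filter_input() so the snippet runs on the command line, and the sample value is made up.

```php
<?php
declare(strict_types=1);

// In a request this would be filter_input(INPUT_GET, 'q', FILTER_DEFAULT).
$raw = filter_var('Tom & "Jerry" <3', FILTER_DEFAULT);

// Escape at output time, for the context the value is going into.
$forHtml  = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$forQuery = urlencode($raw);

echo $forHtml, PHP_EOL;  // Tom &amp; &quot;Jerry&quot; &lt;3
echo $forQuery, PHP_EOL; // Tom+%26+%22Jerry%22+%3C3
```

The raw value stays intact, so the same input can be escaped differently for each output context.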
FILTER_UNSAFE_RAW
My wild guess is that “unsafe” means that “it is dangerous to use the result in random contexts (i.e., without properly escaping it, because we assume that you don’t even know what “escape” means). Use FILTER_SANITIZE_ENCODED, FILTER_SANITIZE_SPECIAL_CHARS and/or FILTER_SANITIZE_MAGIC_QUOTES if you want to be safe” (for some nonstandard definition of “safe”). Of course, it should be renamed, because “safety” may be achieved by alternative means.
—Claude
Kamil Tekiela tekiela246@gmail.com:
> These are just the things I found confusing and strange about the sanitize
> filters. Let's try to put ourselves in the shoes of an average PHP
> developer trying to comprehend these filters. It's quite easy to shoot
> yourself in the foot if you try to use them. The PHP manual doesn't do a
> good job of explaining them, but that's probably because they are not easy
> to explain. I can't come up with good examples of when they should be used.
I agree there are many confusing names/features/behaviors.
IMO, input validation and output sanitization should be 2 different
features.
https://wiki.sei.cmu.edu/confluence/display/seccode/Top+10+Secure+Coding+Practices
Input validation is the 1st secure coding principle for input data
handling. Output sanitization is the 7th secure coding principle for output
data handling. The filter module is mixing these up.
(And input validation should not sanitize input, but validate. Otherwise,
the web app is not OWASP Top 10 compliant, i.e. OWASP Top 10 A09:2021
requires detecting DAST attacks.)
I wrote the input validation part years ago, if anyone is interested.
https://github.com/yohgaki/validate-php (Obsolete C module. Do not use)
https://github.com/yohgaki/validate-php-scr (PHP library)
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi all,
> For quite some time now, PHP's sanitize filters have "Rustled My
> Jimmies". These filters bother me because I can't really justify their
> existence. I can understand that a few of them are sensible and may
> come in handy, but I would like to talk about some of these in
> particular.
I want to provide some context to why we have ext/filter, and why the
filters that we currently have exist. At the time when we introduced
ext/filter (which I mostly wrote), we were beholden to the scourge of
"magic quotes".
In order for PHP to allow for a safer acceptance of input variables into
a script, we added the ext/filter API to do so. The filters and
sanitisers that we added were at that moment reasonable to add, and also
likely to be used. We did punt on a few, and I am sure we made some
'interesting' decisions.
For example the e-mail validator was not designed to allow for what the
full spec allowed, but instead what we thought would be input by
reasonable people.
The sanitising filters were added to get a rough, but reasonable filter
to make data safe for specific contexts.
Some of them were added so that people could easily upgrade, by, for
example, setting the default filter to "magic_quotes" (or "add_slashes").
They're probably less useful now, but that doesn't detract from the fact
that they might still be in use.
I do believe we need to be better at promoting ext/filter's good use,
of which there are plenty of cases. And evaluating how to improve
(and not *remove*) filters and sanitisers would be useful too.
Do you have specific suggestions towards that?
cheers,
Derick