Hi all,
This RFC is to add functions that are suitable for input validations
for secure coding. IMHO, these additions are mandatory for PHP.
https://wiki.php.net/rfc/add_validate_functions_to_filter
Vote ends 2016/08/22 23:59:59 UTC
I don't mind suspend vote and continue discussion if there is issue.
It's rather long RFC. Thank you for reading and voting!
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
This RFC is to add functions that are suitable for input validations
for secure coding. IMHO, these additions are mandatory for PHP.https://wiki.php.net/rfc/add_validate_functions_to_filter
Vote ends 2016/08/22 23:59:59 UTCI don't mind suspend vote and continue discussion if there is issue.
It's rather long RFC. Thank you for reading and voting!
Note for voting.
There are 2 votes for RFC acceptance and target PHP version.
https://wiki.php.net/rfc/add_validate_functions_to_filter#proposed_voting_choices
You have to vote twice. i.e. It cannot store 2 votes results at once.
I do make this mistake. Please be careful!
Thank you for voting!
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi all,
This RFC is to add functions that are suitable for input validations
for secure coding. IMHO, these additions are mandatory for PHP.https://wiki.php.net/rfc/add_validate_functions_to_filter
Vote ends 2016/08/22 23:59:59 UTCI don't mind suspend vote and continue discussion if there is issue.
It's rather long RFC. Thank you for reading and voting!Note for voting.
There are 2 votes for RFC acceptance and target PHP version.https://wiki.php.net/rfc/add_validate_functions_to_filter#proposed_voting_choices
You have to vote twice. i.e. It cannot store 2 votes results at once.
I do make this mistake. Please be careful!Thank you for voting!
One more usual request.
Please describe reason(s) why you object proposal.
I would like to improve proposal even when it is declined. In addition,
I see votes based on misconception/misunderstanding on occasions.
Thank you!
P.S. I've missed last PR update in the RFC. RFC is fixed.
Exception can be disabled by a option now.
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Yasuo,
One more usual request.
Please describe reason(s) why you object proposal.
I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.
But for posterity:
i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.
ii) Validation error message need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.
iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enought to be a concern.
iv) The RFC makes an assumption that programs should exit when validation fails.
"Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."
and the code example:
catch (FilterValidateException $e) {
var_dump($e->getMessage());
die('Invalid input detected!'); // Should terminate execution when input validation fails
}
This assumption is bogus.
Any program that accepts data from users should provide useful error
messages when the data is wrong with someting as simple as a string
being too long.
v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.
cheers
Dan
For the record - these are what my input validation functions look
like. They are bespoke to the application, and provide useful error
messages to the end user when an exception handler catches that
specific exception to a 4xx HTTP response.
function validateOrderAmount($value) : int {
$count = preg_match("/[^0-9]*/", $value);
if ($count) {
throw new InvalidOrderAmount("Der Wert muss nur Ziffern enthalten.");
}
$value = intval($value);
if ($value < 1) {
throw new InvalidOrderAmount("Der Wert muss eine oder mehrere sein .");
}
if ($value >= MAX_ORDER_AMOUNT) {
throw new InvalidOrderAmount("Sie können nur
".MAX_ORDER_AMOUNT." auf einmal bestellen ");
}
return $value;
}
Hi Dan,
Thank you for sharing idea!
One more usual request.
Please describe reason(s) why you object proposal.I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.
Without feedback, there is no clue which way should go or improve.
I didn't realize some of your idea and I think your feedback is great.
But for posterity:
i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.
When exception is enabled, offensive key name is written in exception message.
When exception is disabled, your statement is true. This could be improved.
Good feedback.
ii) Validation error message need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.
It seems there is misunderstanding.
These new functions are intended for "secure coding input validation" that
should never fail. It means something unexpected in input data that
cannot/shouldn't keep program running. Why do you need to parse
message?
All needed info, filter name, key and value, is in exception message and
exception object, BTW.
This one is good feedback, too.
I appreciate better error message suggestions.
iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enought to be a concern.
I thought I don't have to have example of userland implementation, so
it's good feedback also.
Typical OO implementation uses number of setters to define validation rules.
In addition, it validates validation rule is OK for it. e.g. It will
check input data type at least. Setters and validation rule validation
makes execution slower obviously.
One may optimize validation rules to plain array (like I do).
In this case, performance is could be better than previously mentioned
validators do all in the production environment.
I also thought the performance issue is not much important because
there is no PHP feature to compare. All of us knew PHP function call
overheads are relatively large and proposed almost all in C implementation
would be faster than userland.
iv) The RFC makes an assumption that programs should exit when validation fails.
"Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."and the code example:
catch (FilterValidateException $e) {
var_dump($e->getMessage());
die('Invalid input detected!'); // Should terminate execution when input validation fails
}This assumption is bogus.
Any program that accepts data from users should provide useful error
messages when the data is wrong with someting as simple as a string
being too long.
There is misunderstanding on this.
As I wrote explicitly in the RFC, input validation and user input
mistakes must be handled differently.
"The input validation (or think it as assertion or requirement) error"
that this RFC is dealing, is should never happen conditions (or think
it as contract should never fail).
The point of having the input validation is accept only inputs that
program expects and can work correctly. Accepting unexpected
data that program cannot work correctly is pointless.
Don't misunderstood me. I'm not saying "You should reject user input mistakes".
"User input mistakes" and "input validation error" is totally different error.
v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.
I didn't recommend it either because it could not be used for input
validation easily, escaping or sanitization could be done for
dedicated API.
Having new module is one of my idea also. However, I realized many of
filter module codes could be reused after investigation. That's the
reason why I added to filter module. I also named new functions to
have "validate_" prefix, rather than "filter_" to emphasis it's
for validations. I renamed them to "filter_*" to comply CODING_STANDARDS.
This feedback is great because I'm worrying about the same thing.
Please feedback this kind of things during discussion so that I can
do something on issues.
Thank you for comments.
I think it's very helpful for improvements!
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Dan,
Thank you for sharing idea!
One more usual request.
Please describe reason(s) why you object proposal.I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.Without feedback, there is no clue which way should go or improve.
I didn't realize some of your idea and I think your feedback is great.But for posterity:
i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.When exception is enabled, offensive key name is written in exception message.
When exception is disabled, your statement is true. This could be improved.
Good feedback.ii) Validation error message need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.It seems there is misunderstanding.
These new functions are intended for "secure coding input validation" that
should never fail. It means something unexpected in input data that
cannot/shouldn't keep program running. Why do you need to parse
message?All needed info, filter name, key and value, is in exception message and
exception object, BTW.This one is good feedback, too.
I appreciate better error message suggestions.iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enought to be a concern.I thought I don't have to have example of userland implementation, so
it's good feedback also.Typical OO implementation uses number of setters to define validation rules.
In addition, it validates validation rule is OK for it. e.g. It will
check input data type at least. Setters and validation rule validation
makes execution slower obviously.One may optimize validation rules to plain array (like I do).
In this case, performance is could be better than previously mentioned
validators do all in the production environment.I also thought the performance issue is not much important because
there is no PHP feature to compare. All of us knew PHP function call
overheads are relatively large and proposed almost all in C implementation
would be faster than userland.iv) The RFC makes an assumption that programs should exit when validation fails.
"Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."and the code example:
catch (FilterValidateException $e) {
var_dump($e->getMessage());
die('Invalid input detected!'); // Should terminate execution when input validation fails
}This assumption is bogus.
Any program that accepts data from users should provide useful error
messages when the data is wrong with someting as simple as a string
being too long.There is misunderstanding on this.
As I wrote explicitly in the RFC, input validation and user input
mistakes must be handled differently."The input validation (or think it as assertion or requirement) error"
that this RFC is dealing, is should never happen conditions (or think
it as contract should never fail).The point of having the input validation is accept only inputs that
program expects and can work correctly. Accepting unexpected
data that program cannot work correctly is pointless.Don't misunderstood me. I'm not saying "You should reject user input mistakes".
"User input mistakes" and "input validation error" is totally different error.v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.I didn't recommend it either because it could not be used for input
validation easily, escaping or sanitization could be done for
dedicated API.Having new module is one of my idea also. However, I realized many of
filter module codes could be reused after investigation. That's the
reason why I added to filter module. I also named new functions to
have "validate_" prefix, rather than "filter_" to emphasis it's
for validations. I renamed them to "filter_*" to comply CODING_STANDARDS.This feedback is great because I'm worrying about the same thing.
Please feedback this kind of things during discussion so that I can
do something on issues.Thank you for comments.
I think it's very helpful for improvements!
Hi Dan,
I don't mind suspend vote for a while to resolve issues if there
should be changes in the RFC. I also don't mind adding missing
features, e.g. helpful error messages when exception is disabled, to
my todo list. BTW, I'll document basic idea of secure coding and
emphasize how it should be done, so misuse would be few.
It seems you aren't objecting the idea itself.
Could you give more feedback what's missing to change your vote?
Thank you!
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Dan,
I don't mind suspend vote for a while ...
It seems you aren't objecting the idea itself.
I do object to the concept of the RFC. In my first point I said it belongs in userland.
And I strongly object to the idea of stopping and starting voting on RFCS. Please leave the vote open and if it fails take some time to think about the feedback.
If then, in at least 3 months time, you think you can improve the RFC significantly, then reintroduce the idea.
cheers
Dan
I don't mind suspend vote for a while to resolve issues if there
should be changes in the RFC. I also don't mind adding missing
features, e.g. helpful error messages when exception is disabled, to
my todo list. BTW, I'll document basic idea of secure coding and
emphasize how it should be done, so misuse would be few.
There is no such thing as suspended vote. The vote has to restart if there
are changes.
Hi Pierre,
I don't mind suspend vote for a while to resolve issues if there
should be changes in the RFC. I also don't mind adding missing
features, e.g. helpful error messages when exception is disabled, to
my todo list. BTW, I'll document basic idea of secure coding and
emphasize how it should be done, so misuse would be few.There is no such thing as suspended vote. The vote has to restart if there
are changes.
Thank you.
I'll restart vote if I have to make changes, other than more items in
discussion section.
Anyone would change votes if I rename functions to avoid possible
confusions? There is one opinion for better names so far.
It seems either "People misunderstand secure coding" and/or "People consider
validation codes should be in userland fully". (Or "No more additions
for filter module"?)
Although I do think repetitive validations for browser's request
headers/inputs in userland is awfully inefficient, efficiency is not
1st priority of security anyway.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
I'll restart vote if I have to make changes,
Yasuo,
No one is asking you to make changes to the RFC. Just leave the voting
open, and then close it on the agreed date.
You would be violating the agreed RFC process* by closing a vote
early, just so that you an resubmit a proposal straight away.
https://wiki.php.net/rfc/voting
Additionally, you seem to completely have ignored this:
Dan Ackroyd wrote:
And I strongly object to the idea of stopping and starting voting on RFCS. Please leave the vote open and if it fails take some time to think about the feedback.
It would benefit everyone if you stopped responding immediately and
instead took time to actually think about what people have been
saying. This RFC isn't going to be in PHP 7.1, so it is fine to wait 3
months to present a new version of the RFC.
cheers
Dan
- "Resurrecting Rejected Proposals
In order to save valuable time, it will not be allowed to bring up a
rejected proposal up for another vote, unless one of the following
happens:
6 months pass from the time of the previous vote, OR The author(s)
make substantial changes to the proposal."
Hi Dan,
I understood about RFC process.
Additionally, you seem to completely have ignored this:
Dan Ackroyd wrote:
And I strongly object to the idea of stopping and starting voting on RFCS. Please leave the vote open and if it fails take some time to think about the feedback.
It would benefit everyone if you stopped responding immediately and
instead took time to actually think about what people have been
saying. This RFC isn't going to be in PHP 7.1, so it is fine to wait 3
months to present a new version of the RFC.
It seems I've marked "already read" by mistake.
Thank you for reminding.
I got that you prefer userland implementation.
I'm planning to propose "Filter module deprecation" when this RFC
is declined, because current validation filter is not good enough to
do the job and makes situation worse than better... If deprecation
RFC is declined also, then I might try to improve this RFC again.
BTW, I cannot guess the reason behind "no" votes. I can guess
reasons for people participating discussions, though. Even when
RFC author could guess the reason, it would be nicer for voters
and author if one explains the reason why vote "no" in vote thread.
Explicit description is better than guess, IMHO. Besides, unlike
you, there are many people do not left any clue.
For example, I completely fail to understand the reason why
"Enable session.use_strict_mode by default" and "Precise Session
Management" RFC is declined. These are mandatory for session
security and not a matter of preference, but do it and/or how to do it.
If one fails to see why it is mandatory, should ask why. If one
think "it must be more efficient", then should insist patch
improvement. If one think proposal is wrong, then should point
out what's wrong. IMO. If opinion is the same, should mention
"Same here"/"Agree" at least.
It's okay to say "let's ignore such security issues" or "let it users
responsibility to secure session", but his/her opinion should be
expressed. It's not a political vote, but technical vote after all.
I guess most people voted "no" for
"Enable session.use_strict_mode by default" and "Precise Session
Management" is based on wrong assumption.
For this vote, I'm guessing preferences are strongly affected,
filter module nature and patch quality. The code is messy because
I didn't refactor code to minimize changes. It's still a guess,
though.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
"Dan Ackroyd" wrote in message
news:CA+kxMuRiOBQpmTeKqNyV8rX0GKCLrYixi--y5TcYUkdqpT746w@mail.gmail.com...
Hi Yasuo,
One more usual request.
Please describe reason(s) why you object proposal.I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.But for posterity:
i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.
I agree 100%
ii) Validation error message need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.
I agree 100%
iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enought to be a concern.iv) The RFC makes an assumption that programs should exit when validation
fails."Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."
I DISagree 100%. Validation errors should NEVER terminate the program, they
should continue by displaying all the error messages to the user so that
he/she can correct his/her mistake and try the operation again.
and the code example:
catch (FilterValidateException $e) {
var_dump($e->getMessage());
die('Invalid input detected!'); // Should terminate execution when
input validation fails
}This assumption is bogus.
Any program that accepts data from users should provide useful error
messages when the data is wrong with someting as simple as a string
being too long.
I agree 100%
v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.
I agree 100%
--
Tony Marston
"Dan Ackroyd" wrote in message
news:CA+kxMuRiOBQpmTeKqNyV8rX0GKCLrYixi--y5TcYUkdqpT746w@mail.gmail.com..."Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."I DISagree 100%. Validation errors should NEVER terminate the program,
they should continue by displaying all the error messages to the user so
that he/she can correct his/her mistake and try the operation again.
Yasuo (who Dan quoted here) refers to completely invalid input, such as
invalid UTF-8 byte sequences. I think, that in this case the app should
bail out without even given detailed information, as such grossly
invalid input most likely is an attempt to attack (or a severe browser bug).
--
Christoph M. Becker
Hi!
Yasuo (who Dan quoted here) refers to completely invalid input, such as
invalid UTF-8 byte sequences. I think, that in this case the app should
bail out without even given detailed information, as such grossly
invalid input most likely is an attempt to attack (or a severe browser bug).
I personally am not a big fan of "bail out without giving information",
unless that information somehow crosses security boundary (e.g.
displaying PHP error messages in production) or reveals unnecessary info
(this part is super-tricky in crypto, but ouside of crypto common sense
is usually not a bad guide).
Assume indeed you have a buggy release of Firefox that produces invalid
UTF-8 when your language is set to Hindi (this is almost true story btw,
I've seen bug not exactly that but somewhat similar). Now assume you get
a message from the user "all our office can not use your application
since new version was deployed!" and you walk the user through and it
indeed bails out, no additional info. How you debug that? You don't know
Hindi is the culprit. You may not have access to that office's
environment. Your users can't help much but scream "get our app working
again, we're losing money here!". And of course it works for you when
you try it and best time to talk to them is 4am on your side.
Now, how much easier your life would be if you app would just report
"invalid UTF-8 sequence encountered in parameter FirstName" before
bailing out? How many hours, pulled out hairs and 4am sessions would it
save? I think it's worth considering.
--
Stas Malyshev
smalyshev@gmail.com
Hi!
Yasuo (who Dan quoted here) refers to completely invalid input, such as
invalid UTF-8 byte sequences. I think, that in this case the app should
bail out without even given detailed information, as such grossly
invalid input most likely is an attempt to attack (or a severe browser bug).I personally am not a big fan of "bail out without giving information",
<snip>
unless that information somehow crosses security boundary (e.g.
displaying PHP error messages in production) or reveals unnecessary info
(this part is super-tricky in crypto, but ouside of crypto common sense
is usually not a bad guide).Now, how much easier your life would be if you app would just report
"invalid UTF-8 sequence encountered in parameter FirstName" before
bailing out? How many hours, pulled out hairs and 4am sessions would it
save? I think it's worth considering.
I once introduced a check erroring with "Malformed UTF-8 detected" to a
CMS. Until that was changed to "Bad request. Please <a href=".">try
again</a>.", we got a lot of support requests from confused users who
had bookmarked URLs with ISO-8859-* query strings. Even pointing out
which parameter was the culprit, wouldn't have changed that, I presume.
Of course, it makes sense to log very detailed information in this
case (amongst others, the byte sequence that was malformed), but
presenting them to visitors doesn't seem to be helpful – most of these
wouldn't even know what UTF-8 is.
--
Christoph M. Becker
Hi Christoph,
Yasuo (who Dan quoted here) refers to completely invalid input, such as
invalid UTF-8 byte sequences. I think, that in this case the app should
bail out without even given detailed information, as such grossly
invalid input most likely is an attempt to attack (or a severe browser bug).I personally am not a big fan of "bail out without giving information",
<snip>
unless that information somehow crosses security boundary (e.g.
displaying PHP error messages in production) or reveals unnecessary info
(this part is super-tricky in crypto, but ouside of crypto common sense
is usually not a bad guide).Now, how much easier your life would be if you app would just report
"invalid UTF-8 sequence encountered in parameter FirstName" before
bailing out? How many hours, pulled out hairs and 4am sessions would it
save? I think it's worth considering.I once introduced a check erroring with "Malformed UTF-8 detected" to a
CMS. Until that was changed to "Bad request. Please <a href=".">try
again</a>.", we got a lot of support requests from confused users who
had bookmarked URLs with ISO-8859-* query strings. Even pointing out
which parameter was the culprit, wouldn't have changed that, I presume.Of course, it makes sense to log very detailed information in this
case (amongst others, the byte sequence that was malformed), but
presenting them to visitors doesn't seem to be helpful – most of these
wouldn't even know what UTF-8 is.
Excellent example of input validation exception!
Software has history, therefore certain validation cannot be done automatically.
For the record, many security standards/guides require to "Canonicalize"
input data before input validation. If anyone would like to validate
"String", canonicalize first.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Tony,
Allow me to top post.
"The input validation" is not for legitimate users, but for attackers.
You shouldn't help attackers by explaining what/how wrong in attackers' inputs.
I've added discussion "Input validation and User input mistake
handling difference"
https://wiki.php.net/rfc/add_validate_functions_to_filter#input_validation_and_user_input_mistake_handling_difference
Please refer to the section for distinction.
BTW, the input validation that I'm proposing here is
required/recommended feature by ISO 27000/ISMS. Why shouldn't PHP
provide features that is needed to implement ISO 27000/ISMS
requirements?
"Dan Ackroyd" wrote in message
news:CA+kxMuRiOBQpmTeKqNyV8rX0GKCLrYixi--y5TcYUkdqpT746w@mail.gmail.com...Hi Yasuo,
One more usual request.
Please describe reason(s) why you object proposal.I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.But for posterity:
i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.I agree 100%
ii) Validation error message need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.I agree 100%
iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enought to be a concern.iv) The RFC makes an assumption that programs should exit when validation
fails."Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."I DISagree 100%. Validation errors should NEVER terminate the program, they
should continue by displaying all the error messages to the user so that
he/she can correct his/her mistake and try the operation again.and the code example:
catch (FilterValidateException $e) {
var_dump($e->getMessage());
die('Invalid input detected!'); // Should terminate execution when
input validation fails
}This assumption is bogus.
Any program that accepts data from users should provide useful error
messages when the data is wrong with someting as simple as a string
being too long.I agree 100%
v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.I agree 100%
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Allow me to top post.
"The input validation" is not for legitimate users, but for attackers.
You shouldn't help attackers by explaining what/how wrong in attackers' inputs.
What is expected as 'post' data input is defined when building the page.
That some people will intercept the page and try to use it to inject
'invalid' data in an attempt to perhaps gain access to data is a
separate problem, but still part of the validation process. One of the
hacks I had to deal with recently was simply an xss hole because nobody
filtered or trimmed the username. So you could just type what you
wanted. Simply add a suitable pattern to the html5 validation and the
casual hacker is averted ... but how many PHP examples actually use html5?
Of cause someone can build their own result set and bypass the browser
validation. Which is where some cleaver use of javascript might help to
add a security check to the submit packet. Outside PHP, but still part
of the overall picture. In any case once the get/post array is in PHP
there is a need to recheck everything once again and while the average
user may not happy simply to bounce the page if the username field now
has an invalid imput, other systems will want to log the attempt and
perhaps capture any source information. White screen crashes because
someone has broken the data can be difficult to unravel especially when
it's some consented effort to get in ... in my case someone trying every
possible Mysql hack against firebird :( So I end up with extra code to
filter the attack attempt and that tends to have to be at the variable
level.
It can be useful to give feedback simply to get them to give up without
an explanation why. Simply crashing the page means they try the next
option until they do get a response ...
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
"Yasuo Ohgaki" wrote in message
news:CAGa2bXZjgggpJsVXQMDJMqvnpTBUCAhazBwjRtBiPsb-BohQnQ@mail.gmail.com...
Hi Tony,
Allow me to top post.
"The input validation" is not for legitimate users, but for attackers.
You shouldn't help attackers by explaining what/how wrong in attackers'
inputs.
If your RFC is about preventing attacks then it should be labelled as such.
Input validation is a totally separate issue.
--
Tony Marston
I've added discussion "Input validation and User input mistake
handling difference"
https://wiki.php.net/rfc/add_validate_functions_to_filter#input_validation_and_user_input_mistake_handling_differencePlease refer to the section for distinction.
BTW, the input validation that I'm proposing here is
required/recommended feature by ISO 27000/ISMS. Why shouldn't PHP
provide features that is needed to implement ISO 27000/ISMS
requirements?On Mon, Aug 15, 2016 at 8:00 PM, Tony Marston TonyMarston@hotmail.com
wrote:"Dan Ackroyd" wrote in message
news:CA+kxMuRiOBQpmTeKqNyV8rX0GKCLrYixi--y5TcYUkdqpT746w@mail.gmail.com...Hi Yasuo,
One more usual request.
Please describe reason(s) why you object proposal.I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.But for posterity:
i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.I agree 100%
ii) Validation error message need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.I agree 100%
iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enought to be a concern.iv) The RFC makes an assumption that programs should exit when
validation
fails."Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."I DISagree 100%. Validation errors should NEVER terminate the program,
they
should continue by displaying all the error messages to the user so that
he/she can correct his/her mistake and try the operation again.and the code example:
catch (FilterValidateException $e) {
var_dump($e->getMessage());
die('Invalid input detected!'); // Should terminate execution when
input validation fails
}This assumption is bogus.
Any program that accepts data from users should provide useful error
messages when the data is wrong with someting as simple as a string
being too long.I agree 100%
v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.I agree 100%
To those who voted "no" for this,
"Dan Ackroyd" wrote in message
news:CA+kxMuRiOBQpmTeKqNyV8rX0GKCLrYixi--y5TcYUkdqpT746w@mail.gmail.com...Hi Yasuo,
One more usual request.
Please describe reason(s) why you object proposal.I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.But for posterity:
i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.I agree 100%
ii) Validation error message need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.I agree 100%
iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enought to be a concern.iv) The RFC makes an assumption that programs should exit when validation
fails."Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."I DISagree 100%. Validation errors should NEVER terminate the program, they
should continue by displaying all the error messages to the user so that
he/she can correct his/her mistake and try the operation again.and the code example:
catch (FilterValidateException $e) {
var_dump($e->getMessage());
die('Invalid input detected!'); // Should terminate execution when
input validation fails
}This assumption is bogus.
Any program that accepts data from users should provide useful error
messages when the data is wrong with someting as simple as a string
being too long.I agree 100%
v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.I agree 100%
--
Tony Marston
Could you explain why or express "same opinion here"?
IMHO, this opinion is based on misunderstanding of secure coding and
software security methodology. However, either way is helpful.
There are many that seem "the input validation should be implemented
by userland fully". If this is true, I'm going to propose "Filter
module" deprecation next. It's better not to have misleading and/or
half implemented module for security purpose.
Thank you!
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi all,
This RFC is to add functions that are suitable for input validations
for secure coding. IMHO, these additions are mandatory for PHP.https://wiki.php.net/rfc/add_validate_functions_to_filter
Vote ends 2016/08/22 23:59:59 UTCI don't mind suspend vote and continue discussion if there is issue.
It's rather long RFC. Thank you for reading and voting!
Thank you for voting!
The RFC is declined 1 vs 13
A bit surprised this result.
I requested the reason of objection, but many of them does not disclose why.
https://wiki.php.net/rfc/add_validate_functions_to_filter#proposed_voting_choices
bwoebi (bwoebi)
colinodell (colinodell)
danack (danack)
derick (derick)
diegopires (diegopires)
guilhermeblanco (guilhermeblanco)
kguest (kguest)
levim (levim)
lstrojny (lstrojny)
marcio (marcio)
nikic (nikic)
ocramius (ocramius)
peehaa (peehaa)
santiagolizardo (santiagolizardo)
I would like to summarize objection points during discussion.
I assume above of us voted no for these reasons.
- Input data validation cannot be done because client can be anything.
- Input data validation should show what's wrong, not exception.
- Input data validation error and input mistake error should be treated
by the same code to remove code redundancy. - Current filter module is good enough.
IMO. These are clearly wrong reasons of objection.
-
Almost all input data can be validated because of
- Web standards. e.g. Almost all form input must be "valid string".
- Client side validation. e.g. JS, HTML5.
- Many parameters are set by program and shouldn't be changed.
e.g. Select, radio, hidden, database record ID.
-
Showing what's wrong in input validation is ANTI practice of security.
- Developers should NOT show error details unless it has to, otherwise
it helps attackers to tamper system. - "You have broken encoding", "You have unallowed CNTRL char", etc, are
the same as "You have entered wrong user name", "You have entered
wrong password", "You have entered too long password", etc.
- Developers should NOT show error details unless it has to, otherwise
-
This is not reasonable choice for large applications that have higher
security requirements.- Strict input validation should check all inputs including request
headers and cookies. Checking these in business logic makes
things messy and complicated, hence easy to make mistakes.
- Strict input validation should check all inputs including request
-
Current filter module does not work for strict validations.
- I don't repeat. It just does not work well for strict validation.
(NOTE: "input validation" is "the input validation" mentioned in the RFC)
If you have question, I don't mind at all to explain more. I think most of
you misunderstood the concept.
If you have other reason(s), please let me know to improve RFC.
Thank you!
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Yasuo,
[...]
Thank you for voting!
The RFC is declined 1 vs 13
A bit surprised this result.I requested the reason of objection, but many of them does not disclose why.
[...]
lstrojny (lstrojny)
[...]
sorry for not chiming in earlier, but I indeed owe you an explanation. I believe making ext/filter a part of PHP created more trouble than it solved, even though I applaud it’s intention. Of course, filtering and validation are necessary essentials of any secure web application. I nevertheless strongly believe validation and filtering must live in userland.
Validation and filtering are often very much tied to the domain problem a user of PHP is to solving and the change rate of the application will be higher than the change rate of the language (hopefully). To give a more concrete example: let’s say our problem is we want to validate if a string is a valid domain because our business is registering domains. Nowadays, top level domains are introduced quite often and there is no way PHP could have a nice, up to date whitelist of TLDs all of the time and as a domain registration business it’s impossible for me to wait for the updated whitelist in PHP NEXT. That’s why I believe this is something that belongs to userland so the library that offers (domain) validation can follow a lifecycle that fits the problem it is trying to solve.
cu,
Lars
Hi Lars,
[...]
Thank you for voting!
The RFC is declined 1 vs 13
A bit surprised this result.I requested the reason of objection, but many of them does not disclose why.
[...]
lstrojny (lstrojny)
[...]sorry for not chiming in earlier, but I indeed owe you an explanation. I believe making ext/filter a part of PHP created more trouble than it solved, even though I applaud it’s intention. Of course, filtering and validation are necessary essentials of any secure web application. I nevertheless strongly believe validation and filtering must live in userland.
Validation and filtering are often very much tied to the domain problem a user of PHP is to solving and the change rate of the application will be higher than the change rate of the language (hopefully). To give a more concrete example: let’s say our problem is we want to validate if a string is a valid domain because our business is registering domains. Nowadays, top level domains are introduced quite often and there is no way PHP could have a nice, up to date whitelist of TLDs all of the time and as a domain registration business it’s impossible for me to wait for the updated whitelist in PHP NEXT. That’s why I believe this is something that belongs to userland so the library that offers (domain) validation can follow a lifecycle that fits the problem it is trying to solve.
Thank you for reply.
It seems many of us is mixed up what "input handling should do" and
"business logic should do".
There are number of ways how to implement input data validations and
input error checks, from ideal to poor, or even bad. The validator is
trying "to validate input string (format, used char, length,
existence, etc) is expected". Business logic should handle input
errors, logical consistency, etc. i.e. Domain whitelisting should be
handled by logic generally speaking.
I don't understand why new validator would cause more problems than
solving. If users validate all inputs (e.g. request headers, cookies,
all of post/get tampering), apps became much more secure. This task
does not belong to business(app) logic. Even when users use the
validator non optimal way, it will improve security.
Anyway, bottom line is "There are too many apps that do not validate
inputs properly", "Many users do not distinguish 'input validation'
and 'logic/mistake check'". It seems.
Regards,
P.S. I was about to reactivate DbC proposal. This kind of validation
is mandatory for DbC. Otherwise, DbC will cause more problems than
solving.
--
Yasuo Ohgaki
yohgaki@ohgaki.net
I don't understand why new validator would cause more problems than
solving. If users validate all inputs (e.g. request headers, cookies,
all of post/get tampering), apps became much more secure. This task
does not belong to business(app) logic. Even when users use the
validator non optimal way, it will improve security.
The whole problem with that statement is at what point do you
distinguish between an input being invalid because it does not meet some
validation such as bigger than X for 'validation' reasons rather than
'business logic' reasons. STILL in my book, it's the business logic that
defines the base validation but I don't need DbC as a straight jacket to
define that. Adding additional 'woolly' validation checks around the
base validation is a pointless exercise if the rules of the base
validation are available to use.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi Lester,
I don't understand why new validator would cause more problems than
solving. If users validate all inputs (e.g. request headers, cookies,
all of post/get tampering), apps became much more secure. This task
does not belong to business(app) logic. Even when users use the
validator non optimal way, it will improve security.The whole problem with that statement is at what point do you
distinguish between an input being invalid because it does not meet some
validation such as bigger than X for 'validation' reasons rather than
'business logic' reasons. STILL in my book, it's the business logic that
defines the base validation but I don't need DbC as a straight jacket to
define that. Adding additional 'woolly' validation checks around the
base validation is a pointless exercise if the rules of the base
validation are available to use.
Security purpose input validation (injection prevention mainly)
differs from what business logic does. Business logic should
focus on logical correctness while input validation should focus
on security.
I've audited number of MVC applications and have to admit that
input validations in models are poor. Besides input validation
should be done ASAP, model validation is very poor in many cases.
i.e. Not good enough for security purpose.
This is natural because what business logic should take care is
"Logic", not what data should look like, data have correct encoding,
make sure request headers/cookies/post/get are not tampered, etc.
Taking care of tampered data by business logic will reduce both
readability and maintainability. And more importantly, make code
less secure because programmers tend to focus on logic
in model, not input data validations.
Validations in model being less secure is proven already.
It is not a surprise since model is for "business logic".
(If app requirement is ok with validation with model, it's ok to
design so. Not all apps should have ideal secure coding.)
Why shouldn't we have more secure validation?
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Taking care of tampered data by business logic will reduce both
readability and maintainability. And more importantly, make code
less secure because programmers tend to focus on logic
in model, not input data validations.
This certainly makes sense. I guess the challenge is that in order to
know if data has been tampered, you need to have some knowledge of the
expected format. That expectation depends on what data you're expecting,
which depends - ultimately - on the domain objects being modelled.
More specifically, though, it depends on the interaction design - in an
HTML context, the forms being presented. So the validation needs
knowledge of the form controls - e.g. if a select box was shown, and the
value is not from the known list of options, the input has been tampered
with.
If that's the case, the logical place to build the validation is into a
form builder. At which point you've probably got a complex architecture
in userland, and filter_* functions are unlikely to be a natural fit.
If somebody's not using a library to build the form (e.g. they're
laying out the HTML by hand), are they likely to set up the complex
validation settings needed by the filter_* functions?
Regards,
Rowan Collins
[IMSoP]
If somebody's not using a library to build the form (e.g. they're
laying out the HTML by hand), are they likely to set up the complex
validation settings needed by the filter_* functions?
The main problem is the lack of well built libraries that also take care
of validation. Form Builders don't often include a good validation
model. I've been going through those hoops for the last couple of years.
If we have a set of validated parameters coming in from that form then
as you say do the rules then exist to build a filter array, while I'm
looking to those rules simply to be applied when I save each parameter
to it's internal variable.
A filter of "is this string corrupted with an injection attempt" seems
rather more difficult to define than "email"? And applying the first in
general on every string when there are as set of simple filters that can
be used ... as an alternative to the more difficult to define ones?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi Lester,
A filter of "is this string corrupted with an injection attempt" seems
rather more difficult to define than "email"? And applying the first in
general on every string when there are as set of simple filters that can
be used ... as an alternative to the more difficult to define ones?
Input validation code does not have to address all of injections. It's
output code responsibility to prevent injections in the first place.
i.e. Top 10 Secure Coding Practices - #7
https://www.securecoding.cert.org/confluence/display/seccode/Top+10+Secure+Coding+Practices
Nonetheless, ID validation being poor is not rare even with well
known code. parameters like ID is easy to make sure it's safe from any
injections.
e.g. https://groups.google.com/forum/#!topic/rubyonrails-security/ly-IH-fxr_Q
ID is not the only one, accept language, encoding, referer, etc are
common source of injections also.
Input validation code is for mitigation against unknown/unaddressed
vulnerabilities in entire code not only PHP code, but also language,
libraries written by C/C++ and/or external systems such as DB.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Lester,
A filter of "is this string corrupted with an injection attempt" seems
rather more difficult to define than "email"? And applying the first in
general on every string when there are as set of simple filters that can
be used ... as an alternative to the more difficult to define ones?Input validation code does not have to address all of injections. It's
output code responsibility to prevent injections in the first place.
i.e. Top 10 Secure Coding Practices - #7
https://www.securecoding.cert.org/confluence/display/seccode/Top+10+Secure+Coding+Practices
Your statement and those coding points don't go together.
1 ... Any input to PHP has to be untrusted since you can't rely on even
clean sources being intercepted.
8 ... Why not use the best available checks on the input side?
7 ... I've sanitised the data in the browser, but because of morons I
can't use it without addressing 1 and 8.
All this comes back to my simple idea of adding all these validation,
filtering and sanitation steps wrapped around the basic PHP variable.
And THAT also includes 'strict typing' since if we have the option to
select soft or hard failure when a problem is found in the variable we
can cover everybody’s 'need'!
Nonetheless, ID validation being poor is not rare even with well
known code. parameters like ID is easy to make sure it's safe from any
injections.
e.g. https://groups.google.com/forum/#!topic/rubyonrails-security/ly-IH-fxr_Q
I know the range of values available for 'id' it's provided by the
SEQUENCE source in the database but if you insist on 'autoinc' we can do
the job properly. So my filter on the variable :id is looking for a
number in a range. What could be a simpler validation than that?
ID is not the only one, accept language, encoding, referer, etc are
common source of injections also.Input validation code is for mitigation against unknown/unaddressed
vulnerabilities in entire code not only PHP code, but also language,
libraries written by C/C++ and/or external systems such as DB.
If you need to retain the raw input of non-php material then that is
just a more complex filter. Point 5 above - Default Deny - do not
forward anything that you do not need. So once you have applied rule 9,
and assured you know what you expect to receive, then only that is
passed on to rule 8. If that data being passed on has a potential to
carry a vulnerability forward it's because you have to allow for that
data to be forwarded anyway, so a filter to prevent it is pointless?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi Lester,
Hi Lester,
A filter of "is this string corrupted with an injection attempt" seems
rather more difficult to define than "email"? And applying the first in
general on every string when there are as set of simple filters that can
be used ... as an alternative to the more difficult to define ones?Input validation code does not have to address all of injections. It's
output code responsibility to prevent injections in the first place.
i.e. Top 10 Secure Coding Practices - #7
https://www.securecoding.cert.org/confluence/display/seccode/Top+10+Secure+Coding+PracticesYour statement and those coding points don't go together.
1 ... Any input to PHP has to be untrusted since you can't rely on even
clean sources being intercepted.
8 ... Why not use the best available checks on the input side?
7 ... I've sanitised the data in the browser, but because of morons I
can't use it without addressing 1 and 8.
Software that are executed by other process and/or computer is outside
of the software you're trying to protect. All other softwares that are
executed by other processes/computers shouldn't be trusted even if it
is written by you.
Rule #8 is for multiple layer protections. It's for fail safe. I think
you're familiar with defense in depth approach in network security.
e.g. Protect overall network via internet firewall, protect service
network via firewall, protect clients via client firewall.
Software architecture could be like network systems.
https://wiki.php.net/_detail/rfc/screenshot_from_2016-08-05_11-25-01.png?id=rfc%3Aadd_validate_functions_to_filter
i.e. Protect application via application input validations, protect
module via module validations, protect function/method via
function/module validations.
If you validate everything, software will be more secure, but it could
be executed very slow due to repetitive/excessive validations. That's
the reason why DbC validate everything (check every contract) during
development, but doesn't validate everything for production
environment.
Removing all validations is risky. Therefore, we need to keep
important validations in software for production environment.
Application level validations (contracts) cannot be removed obviously,
critical code such as command execution is better to keep validation
active.
All this comes back to my simple idea of adding all these validation,
filtering and sanitation steps wrapped around the basic PHP variable.
And THAT also includes 'strict typing' since if we have the option to
select soft or hard failure when a problem is found in the variable we
can cover everybody’s 'need'!
Forcing data type is "Weak form of validation".
Data types forces (≒ validates) certain form.
In spite of that forcing data type is valuable for security because
programmers are tends to expect int/float to have numeric data
representation.
However, strict data typing is not enough obviously. It does not force
certain form to "string", certain range of value to "int"/"float".
String is the most dangerous data, yet it is not covered by data
typing. Therefore, strict data typing is weak and secure coding
specialists recommends validation for strongly typed languages, too.
Nonetheless, ID validation being poor is not rare even with well
known code. parameters like ID is easy to make sure it's safe from any
injections.
e.g. https://groups.google.com/forum/#!topic/rubyonrails-security/ly-IH-fxr_QI know the range of values available for 'id' it's provided by the
SEQUENCE source in the database but if you insist on 'autoinc' we can do
the job properly. So my filter on the variable :id is looking for a
number in a range. What could be a simpler validation than that?
The example vulnerability is in Action Pack. ID does not have to be
numeric ID, yet some users (including Rails developer I suppose) are
blindly assume it's numeric.
If users are deploying proper validation and reject malformed ID, they
could avoid arbitrary code execution by malformed ID. We know there
are countless vulnerabilities that are similar to this.
ID is not the only one, accept language, encoding, referer, etc are
common source of injections also.Input validation code is for mitigation against unknown/unaddressed
vulnerabilities in entire code not only PHP code, but also language,
libraries written by C/C++ and/or external systems such as DB.If you need to retain the raw input of non-php material then that is
just a more complex filter. Point 5 above - Default Deny - do not
forward anything that you do not need. So once you have applied rule 9,
and assured you know what you expect to receive, then only that is
passed on to rule 8. If that data being passed on has a potential to
carry a vulnerability forward it's because you have to allow for that
data to be forwarded anyway, so a filter to prevent it is pointless?
It's not pointless as I described above.
I guess you are feeling secure coding is inefficient and repetitive.
However, this is the whole point of secure coding. We have to admit
that "people do mistakes". To build secure software, we need multiple
layer of protections. Input and output control is the most important.
http://cwe.mitre.org/top25/#Mitigations
#1 - Input control
#2 - Output control
Validations make software more maintainable, i.e. mistakes can be
found by validations in module/method/function, and more secure, i.e.
most vulnerabilities are due to unexpected input data. If data is
supposed to have certain form, it should be validated as soon as
possible to make software works correctly.
The reason why DbC can help securing code is "it can strictly force
data/object domain during development". What's missing is runtime
rule/validation to force input data domain. The proposed validator can
be used to specify "form" of input data at runtime. Therefore the
validator is mandatory. "Logical consistency" should be handled by
models at runtime.
It may differ from your software security model. Programmers are free
to choose which model to adopt. However, one shouldn't disturb
mandatory tool implementation for recommended security model by secure
coding specialists, IMHO. If you don't like/need it, it's free not to using
it after all.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
It may differ from your software security model. Programmers are free
to choose which model to adopt. However, one shouldn't disturb
mandatory tool implementation for recommended security model by secure
coding specialists, IMHO. If you don't like/need it, it's free not to using
it after all.
My security model is no different to yours. But in my model 'Add
validation functions to filter module' is adding another layer of checks
and I think I'm simply adding them in a different place.
I return to the original question which has not yet been answered. The
block of input data being supplied from what ever source needs to be
converted to a set of variables in PHP. That could be variables in a
class, an associative array as in $_POST or simple variables which are
probably ancient history now. If the definition of a variable is
improved to include ALL of the validation we ideally need and I include
setStrict(int) in that then at run time we can both validate input and
decide on the error model that is applied. I think DbC is a wrapper at
the development level as you describe it and we are back at the
'annotation' debate. What I'm still looking for is primary annotation
such as 'strict' if appropriate although I would look at that as
'between 0 and 200' rather than expecting a clean binary integer to be
supplied via some interface.
I can use the annotation information to build the browser side
validation, and know that I'm working with the same set of rules, and I
would also include escaping rules so that the general string data can
manage if material of a suspect nature is being processed. Such as
WRITING the script files that are needed to output the elements that a
blanket htmlentities()
filter would block! If one is building template
and javascript packages of code in the database then you need to filter
the malicious stuff before saving them and ensure the stored data is clean.
I could envisage loosening the validation checks on a secure private
network where malicious activity would be a firing offence, but the sort
of layer of security I'm looking at should not introduce any more delay
than the normal. The way it falls down is if people can't be bothered to
set the validation values up ... or create your filter array. Default
rules such as your crude filters are a point for discussion.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
If the definition of a variable is
improved to include ALL of the validation we ideally need and I include
setStrict(int) in that then at run time we can both validate input and
decide on the error model that is applied.
And I know I will get my head chewed of for combining threads, but
readOnly(); Seems to me the correct answer to the whole of the
'immutable' debate. The class simply creates a readOnly object,
validated against all the rules and stores it as a readOnly object. No
reason you can't simply call the class again and create a separate
object. But this is where my 'model' of the world is an associative
array set of data handled by a separate set of code. If you need 100
read only dates for a calendar you only need one set of code to generate
them. The created objects would all be validated against the date rules
and then locked so you can't modify them.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi Rowan,
Taking care of tampered data by business logic will reduce both
readability and maintainability. And more importantly, make code
less secure because programmers tend to focus on logic
in model, not input data validations.This certainly makes sense. I guess the challenge is that in order to know
if data has been tampered, you need to have some knowledge of the expected
format. That expectation depends on what data you're expecting, which
depends - ultimately - on the domain objects being modelled.More specifically, though, it depends on the interaction design - in an HTML
context, the forms being presented. So the validation needs knowledge of the
form controls - e.g. if a select box was shown, and the value is not from
the known list of options, the input has been tampered with.If that's the case, the logical place to build the validation is into a form
builder. At which point you've probably got a complex architecture in
userland, and filter_* functions are unlikely to be a natural fit.If somebody's not using a library to build the form (e.g. they're laying
out the HTML by hand), are they likely to set up the complex validation
settings needed by the filter_* functions?
I agree HTML form is the most important input. However, apps
has to deal with other inputs like GET/COOKIE/JSON/HTTP Request
headers/Inputs from other subsystems. Form builder does not work
well for input data other than forms...
Even with great form builder, letting validation to input code has
advantages.
One issue is "data handling is better to break down into 2 parts",
logic and format. Since model's responsibility is to handle logic,
programmers try to handle logic in model. It's not bad at all with
this. However, there are many apps have poor format checks
because of this.
2nd issue is coverage. "model deals with data that is handled by
the model." Data validation covered by a model does not match with
application inputs. There are many vulnerabilities "Oops, this value
is not validated" even for data like ID. This is not limited to PHP apps.
e.g. https://groups.google.com/forum/#!topic/rubyonrails-security/ly-IH-fxr_Q
3rd issue is location. Input data validation is better to be done as
soon as possible. When application accepts input, programmers
know what the possible inputs, and could cover all inputs. i.e.
Controller is the best place for input format validation.
"Data" domain can be defined as a single definition, but making
sure it has "Valid Format" by models/libraries is harder than it seems.
If it's easy, we don't have this many vulnerabilities in PHP apps.
The best practice of input validation is
"Establish and maintain control over all of your inputs."
http://cwe.mitre.org/top25/#Mitigations
There are many possible implementations for this.
If we apply DbC to application, we have to validate external inputs
somewhere to keep contract and make code works w/o any
problems. Natural location for input validation is controller.
Internal redirects without validation could be serious bug
such as authentication bypass. Input validation can mitigate
such bug also if validation is done at controller.
BTW, I don't think everyone has to validate input very strict
manner. It is ok to validate like
<?php
// Define loose input validation
$server_def = array(
// REQUEST HEADER
'HTTP_ACCEPT' => $text64b_spec, // Text up to 64 bytes
'HTTP_ACCEPT_ENCODING' => $text64b_spec,
'HTTP_ACCEPT_LANGUAGE' => $text64b_spec,
'HTTP_USER_AGENT' => $text512b_spec, // Text up to 512 bytes
'HTTP_COOKIE' => $text1k_spec, // Text up to 1024 bytes
'REQUEST_URI' => $text4k_spec, // Text up to 4096 bytes
);
$cookie_def = array(
// COOKIE
'PHPSESSID' => $phpsessid_spec,
'uid' => $uid_spec,
'last_access' => $int_spec,
);
$get_def = array(
// GET
'id' => $id_spec,
'other_id' => $id_spec,
'type1' => $alnum32b_spec, // Alpha numeric up to 32 bytes
'type2'=> $alnum32b_spec,
);
$post_def = array(
// POST
'text1' => $input_tag_spec, // <input type=text> default up to
// 512 bytes text
'text2' => $input_tag_spec,
'select1' => $select_tag_spec, // <select> values default up to
// 64 bytes text
'select2' => $select_tag_spec,
'radio1' => $radio_tag_spec, // <radio> values default up to
// 64 bytes text
'radio2' => $radio_tag_spec,
'submit' => $submit_tag_spec,
'textarea1k' => $textarea1k_spec, // <textarea> values default up
// to 1K text,
allow newline
'textarea100k' => $textare100k_spec, // <textarea> values default
// up to
100K text, allow newline
'CSRF_TOKEN' => $alnum32b_spec, // Alpha numeric up to 32 bytes
// and so on
);
$_SERVER = filter_require_var_array($_SERVER);
$_COOKIE = filter_require_var_array($_COOKIE);
$_GET = filter_require_var_array($_GET);
$_POST =filter_require_var_array($_POST);
?>
Since new string validator validates encoding and control chars
including newline, this validation rejects tampered inputs include
broken encoding, null char injection, newline injection and other
CNTRL char injection.
This definition is loose and weak, but we are sure "broken encoding,
null char injection, newline injection and other CNTRL char injection"
FREE at least, except newline injections by textarea inputs. It's
very important we are sure free from certain vulnerabilities even if
output code/library/subsystem is supposed to sanitize data for 100%
safety.
We still have these kind of vulnerabilities.
e.g. I've fixed mail header injections via extra headers parameter recently.
https://bugs.php.net/bug.php?id=68776
We cannot be sure if 3rd party modules are free from encoding, CNTRL
injection attacks.
How far developers would like to validate by the input validation code
is up to developers. The more validate strictly, more secure and
increases chance to avoid hidden vulnerabilities in your code or other
people's code. i.e. Framework, library, or even PHP/subsystems like
database.
Regards,
P.S.
To people against this RFC,
How many of you are against the idea of this RFC?
(I don't think Rowan against basic idea, BTW)
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Rowan,
This certainly makes sense. I guess the challenge is that in order to know
if data has been tampered, you need to have some knowledge of the expected
format. That expectation depends on what data you're expecting, which
depends - ultimately - on the domain objects being modelled.More specifically, though, it depends on the interaction design - in an HTML
context, the forms being presented. So the validation needs knowledge of the
form controls - e.g. if a select box was shown, and the value is not from
the known list of options, the input has been tampered with.
BTW, I don't think everyone has to validate input very strict
manner. It is ok to validate like<?php
// Define loose input validation$get_def = array(
// GET
'id' => $id_spec,
'other_id' => $id_spec,
'type1' => $alnum32b_spec, // Alpha numeric up to 32 bytes
'type2'=> $alnum32b_spec,
);$post_def = array(
// POST
'text1' => $input_tag_spec, // <input type=text> default up to
// 512 bytes text
'text2' => $input_tag_spec,
'select1' => $select_tag_spec, // <select> values default up to
// 64 bytes text
'select2' => $select_tag_spec,
'radio1' => $radio_tag_spec, // <radio> values default up to
// 64 bytes text
'radio2' => $radio_tag_spec,
'submit' => $submit_tag_spec,
'textarea1k' => $textarea1k_spec, // <textarea> values default up
// to 1K text,
allow newline
'textarea100k' => $textare100k_spec, // <textarea> values default
// up to
100K text, allow newline
'CSRF_TOKEN' => $alnum32b_spec, // Alpha numeric up to 32 bytes
// and so on
);$_GET = filter_require_var_array($_GET);
$_POST =filter_require_var_array($_POST);
?>
These "simple" examples are still very closely bound to the HTML form
(or API definition, or whatever).
If a change is made to the form, even these simple rules need to be
changed. Every time a field is added or removed, these validation rules
need to be updated.
Or consider for example a select box which only ever contains integer
IDs; the simple validation for this would be to reject non-numeric input
as tampering. But if the UI changes to a fancy combo box autocomplete
widget, non-numeric input might instead merit a user-friendly validation
message.
You could just about guarantee that most fields will never need to
accept control characters. But even newlines come and go - a "revision
comment" field might be one line (the traditional wiki style) or many
(common version control style).
3rd issue is location. Input data validation is better to be done as
soon as possible. When application accepts input, programmers
know what the possible inputs, and could cover all inputs. i.e.
Controller is the best place for input format validation.
You're mixing two things here, I think: one is when the validation is
run (how soon in the execution pipeline); the other is where it is
defined (which PHP class it is part of). I think what I'm getting at is
that the rules should be defined in one place (avoid code duplication,
ensure definitions are kept up to date as requirements change) even if
they are accessed in more than one place.
The method $formDefinition->isSubmittedDataSane($_POST) could be
implemented by generating, based on the set of fields expected, a spec
for ext/filter. But by the time you've handled all the cases,
implemented a bunch of custom callbacks for unsupported validation
types, and customised the error message slightly, you might as well just
implement the validation yourself.
So the challenge of any built-in filter module is this: if it's not
doing the whole job of form handling and validation, what specific part
of that task is it doing? And how does it fit with common ways of
implementing the rest? Perhaps if we provided a narrower focus, the
API could become simpler and more widely applicable.
For instance, if we set the very narrow aim of "provide an easy-to-use
set of primitive tests for use in a validation filter", we could:
- remove all array handling (assume users are capable of using foreach())
- remove all support for custom filters (a single-variable custom filter
does little more than call_user_func) - simplify the return possibilities (boolean: does this value pass this
test?) - remove some tests that are trivially implemented using other functions
How many of you are against the idea of this RFC?
(I don't think Rowan against basic idea, BTW)
I guess I'm against the idea of the RFC in the sense that it's aim is
too broad: we cannot implement safe validation in the language, we can
only give users the tools to do it. The RFC as it was proposed (and, I
think, ext/filter in general) tries too hard to "do everything for you",
without looking at where it would fit inside a larger application.
Regards,
Rowan Collins
[IMSoP]
Hi Rowan,
3rd issue is location. Input data validation is better to be done as
soon as possible. When application accepts input, programmers
know what the possible inputs, and could cover all inputs. i.e.
Controller is the best place for input format validation.You're mixing two things here, I think: one is when the validation is run
(how soon in the execution pipeline); the other is where it is defined
(which PHP class it is part of). I think what I'm getting at is that the
rules should be defined in one place (avoid code duplication, ensure
definitions are kept up to date as requirements change) even if they are
accessed in more than one place.
This might be the largest difference.
To make something secure than it is now, adding additional security
layer is effective, not single location/code.
Good example is web application firewall(WAF). It's a independent
security layer that does whole bunch of checks for additional
security. WAF is proven to be useful for web app code vulnerabilities
such as JavaScript/SQL injections because it does checks independent
from application code and most apps do very poor validations.
Maintaining WAF rules is not easy task, especially when WAF rules are
white-list based. (All of security guidelines recommend whitelist
based approach.) IMHO, most WAF protections should be implemented in
apps because strict validations with WAF is too hard and too
inefficient.
The method $formDefinition->isSubmittedDataSane($_POST) could be implemented
by generating, based on the set of fields expected, a spec for ext/filter.
But by the time you've handled all the cases, implemented a bunch of custom
callbacks for unsupported validation types, and customised the error message
slightly, you might as well just implement the validation yourself.So the challenge of any built-in filter module is this: if it's not doing
the whole job of form handling and validation, what specific part of that
task is it doing? And how does it fit with common ways of implementing the
rest? Perhaps if we provided a narrower focus, the API could become
simpler and more widely applicable.
My intention is to cover runtime validations required by DbC. DbC
validations are disabled for production systems, but some validations
must be executed at runtime, application level validations at least.
Even if there are some missing parts, the proposal is good enough to
start. IMO. I appreciate suggestions for improvements. It does not
have to be based on current filter module.
For instance, if we set the very narrow aim of "provide an easy-to-use set
of primitive tests for use in a validation filter", we could:
- remove all array handling (assume users are capable of using foreach())
- remove all support for custom filters (a single-variable custom filter
does little more than call_user_func)- simplify the return possibilities (boolean: does this value pass this
test?)- remove some tests that are trivially implemented using other functions
Suppose we have validation module. You are suggesting something like
$int = validate_int($var, $min, $max);
$bool = validate_bool($var, $allowed_bool_types);
// i.e. which type of bool 1/0, yes/no, on/off, true/false is allowed
// This isn't implemented. All of them are valid bools currently.
$str = validate_string($var, $min_len, $max_len);
$str = validate_string_encoding($var, $encoding);
$str = validate_string_chars($var, $allowed_chars);
$str = validate_string_regex($var, $regex);
$str = validate_string_degit($var, $min_len, $max_len);
$str = validate_string_callback($var, $callback);
Although it works, I prefer array definition because it's a lot easier
to write rule and efficient to execute.
$def = [
'int_var' => ['filter'=>FILTER_VALIDTE_INT, 'options'=>[$min, $max]],
'bool_var' => ['filter'=>FILTER_VALIDATE_BOOL,
'options'=>$allowed_bool_types],
'str_var' => [
['filter' => FILTER_VALIDATE_STRING,
'options' =>['min_bytes'=>$min_len, 'max_bytes'=>$max_len]],
['filter' => FILTER_VALIDATE_REGEX,
'options' => ['regex' => $regex]],
['filter' => FILTER_VALIDATE_CALLBACK,
'options' => ['callback' => $callback]],
]
];
$safe_input = filter_require_var_array($input, $def);
You can group definition easily with array. (Multiple filter support
is implemented by my patch) e.g.
$my_str_var_spec = [
['filter' => FILTER_VALIDATE_STRING,
'options' =>['min_bytes'=>$min_len, 'max_bytes'=>$max_len]],
['filter' => FILTER_VALIDATE_REGEX,
'options' => ['regex'=> $regex]],
['filter' => FILTER_VALIDATE_CALLBACK,
'options' =>['callback' => $callback]],
];
then previous definition became
$def = [
'int_var' => ['filter'=>FILTER_VALIDTE_INT,
'options'=>[$min, $max]],
'bool_var' => ['filter'=>FILTER_VALIDATE_BOOL,
'options'=>$allowed_bool_types],
'str_var' => $my_str_var_spec,
];
Rule reuse and centralizing validation rule is easy.
If you would like to build JavaScript validations on client side from
the definition, it's easy to build one because it's simple array
definition, not bunch of functions define validation rules.
How many of you are against the idea of this RFC?
(I don't think Rowan against basic idea, BTW)I guess I'm against the idea of the RFC in the sense that it's aim is too
broad: we cannot implement safe validation in the language, we can only give
users the tools to do it. The RFC as it was proposed (and, I think,
ext/filter in general) tries too hard to "do everything for you", without
looking at where it would fit inside a larger application.
There are many people who use filter module happily, why validation
cannot be implemented? Even external WAF does it. Divide and conquer
(input handling and logic handling), multiple layers of protections
works. We know interface is more stable than logic. Vulnerabilities
introduced often when logic is changed. Input validation can mitigate
risks.
I didn't spend much time for this because I reused filter module
framework/code and didn't do refactoring. If it seemed I tried to
hard, filter module authors worked too hard. I spent more time to
write english rather than code :)
The proposal provides primitive tool, but not too primitive. It does
not handle complex form nor client side JavaScript
validations, but it could be used for these tasks. (I changed PR so
that exception could be optional) Those fancy exciting things are left
to user implementation.
Anyway, we have $_POST/$_GET/$_COOKIE/$_FILES/$_SERVER/$_ENV
as basic inputs. Input validation is #1 requirement for code security.
PHP must have some tool that validates these easy and simple,
yet extensible.
Question would be what kind we'll have?
Simple functions? Different kind of array definition and validator
function? More comprehensive object based? Suggestions are
appreciated. I don't mind implement it from scratch. Idea only
suggestion is welcomed!
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
This might be the largest difference.
To make something secure than it is now, adding additional security
layer is effective, not single location/code.
My instinct is that this extra location would be a maintenance
nightmare, as it would need to keep up with changes elsewhere in the
application. In day to day use, what would make this extra security
layer any more likely to be comprehensively maintained than the existing
validation layer?
Suppose we have validation module. You are suggesting something like
$int = validate_int($var, $min, $max);
$bool = validate_bool($var, $allowed_bool_types);
// i.e. which type of bool 1/0, yes/no, on/off, true/false is allowed
// This isn't implemented. All of them are valid bools currently.
$str = validate_string($var, $min_len, $max_len);
$str = validate_string_encoding($var, $encoding);
$str = validate_string_chars($var, $allowed_chars);
$str = validate_string_regex($var, $regex);
$str = validate_string_degit($var, $min_len, $max_len);
$str = validate_string_callback($var, $callback);
No, I'm suggesting something like:
if (
! validate_int($var, $min, $max)
|| ! validate_bool($var, $allowed_bool_types)
|| ! validate_string($var, $min_len, $max_len)
|| ! validate_string_encoding($var, $encoding)
|| ! validate_string_chars($var, $allowed_chars)
|| ! validate_string_regex($var, $regex)
|| ! validate_string_degit($var, $min_len, $max_len)
|| ! $callback($var) // Note: no need to wrap this callback, it's just
a boolean-returning function
)
{
// ERROR
}
It was a deliberately narrow purpose as an example, but IMHO that's
perfectly readable, and the focus of the module would be on creating a
good set of validation rules, rather than worrying about how they're
applied.
[...] Rule reuse and centralizing validation rule is easy.
The language provides us plenty of ways to reuse code. For instance:
function validate_my_field($var) {
return
validate_int($var, $min, $max)
&& validate_bool($var, $allowed_bool_types)
// etc
}
Regards,
Rowan Collins
[IMSoP]
So, I’m trying to really understand what the goals of this RFC were/are.
Adding a bunch of new functions is IMO the wrong approach to this type of thing.
The existing filter_var/filter_input infrastructure works well, if you want to define more rules I would definitely encourage building on/improving that system not adding a bunch of extra functions.
I would be greatly in favour of adding some of the additional filter constants suggested (e.g. a FILTER_VALIDATE_STRING with the min/max bytes, or better yet, min/max chars based on current charset).
But the new functions whether the originally proposed ones (which seem to just be sugar in place of a userland foreach/array_walk etc with a throw for failed validation) or these more recent suggestions (which seem to be just wrappers around filter_var, no?) make no sense to me.
Cheers
Stephen
This might be the largest difference.
To make something secure than it is now, adding additional security
layer is effective, not single location/code.My instinct is that this extra location would be a maintenance nightmare, as it would need to keep up with changes elsewhere in the application. In day to day use, what would make this extra security layer any more likely to be comprehensively maintained than the existing validation layer?
Suppose we have validation module. You are suggesting something like
$int = validate_int($var, $min, $max);
$bool = validate_bool($var, $allowed_bool_types);
// i.e. which type of bool 1/0, yes/no, on/off, true/false is allowed
// This isn't implemented. All of them are valid bools currently.
$str = validate_string($var, $min_len, $max_len);
$str = validate_string_encoding($var, $encoding);
$str = validate_string_chars($var, $allowed_chars);
$str = validate_string_regex($var, $regex);
$str = validate_string_degit($var, $min_len, $max_len);
$str = validate_string_callback($var, $callback);No, I'm suggesting something like:
if (
! validate_int($var, $min, $max)
|| ! validate_bool($var, $allowed_bool_types)
|| ! validate_string($var, $min_len, $max_len)
|| ! validate_string_encoding($var, $encoding)
|| ! validate_string_chars($var, $allowed_chars)
|| ! validate_string_regex($var, $regex)
|| ! validate_string_degit($var, $min_len, $max_len)
|| ! $callback($var) // Note: no need to wrap this callback, it's just a boolean-returning function
)
{
// ERROR
}It was a deliberately narrow purpose as an example, but IMHO that's perfectly readable, and the focus of the module would be on creating a good set of validation rules, rather than worrying about how they're applied.
[...] Rule reuse and centralizing validation rule is easy.
The language provides us plenty of ways to reuse code. For instance:
function validate_my_field($var) {
return
validate_int($var, $min, $max)
&& validate_bool($var, $allowed_bool_types)
// etc
}Regards,
Rowan Collins
[IMSoP]
Hi Stephen,
Adding a bunch of new functions is IMO the wrong approach to this type of thing.
The existing filter_var/filter_input infrastructure works well, if you want to define more rules I would definitely encourage building on/improving that system not adding a bunch of extra functions.
Do you really think filter module works well as optimal validator?
It cannot enforce even whitelisting well...
What filter module is missing as validator currently are:
- Whitelisting concept (Implemented)
- Multiple rules for a variable (Implemented)
- String rules (Implemented)
- Optional rule (To be implemented. Refactoring is needed)
These are the missing features and cannot be fixed without additional
functions. (W/o modifying current function behaviors)
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Yasuo,
Hi Stephen,
Adding a bunch of new functions is IMO the wrong approach to this type of thing.
The existing filter_var/filter_input infrastructure works well, if you want to define more rules I would definitely encourage building on/improving that system not adding a bunch of extra functions.Do you really think filter module works well as optimal validator?
It’s not perfect, but nothing is. As I said, I believe the issues can largely be resolved by building on the existing functionality.
It cannot enforce even whitelisting well…
VALIDATE_INT already accepts $max and $min options. Those options could be applied to VALIDATE_FLOAT, and $charset, $accepted_chars, $max_len, $min_len could be implemented on a new VALIDATE_STRING filter.
I understand the use-case for multiple validation per input, and for validating multiple inputs, but frankly the way this implements that is both confusing to use, and has a less than ideal error-mode.
The “filter spec” input is an array of arrays of arrays, most of which will also contain an array for ‘options’. To me that’s getting dangerously close to JavaScript’s callback hell for impossible to read code.
The error mode is also not ideal in a real world use case in my opinion. If I am validating a dozen input fields, I do not want to know just the first one that failed. Can you imagine using a web form that made you submit 12 times to tell you each time you got a field wrong, rather than trying to validate them ALL and telling you ALL the errors at once?
Personally I think a better approach is:
- improve/adding to the filters available, and if desired, add extra flags/options e.g, to throw an exception on failure (which, btw was requested via bugs.php.net 6 years ago), to set min/max values for FILTER_VALIDATE_FLOAT, etc.
2a. Leave the multiple rules per input to userland (e.g. dev uses foreach, array_walk, etc on a rules array or what have you)
2b. maybe add an alternative to filter_(input/var)array where it’s 1 input and multiple rules, e.g. filter(input|var)_multiple
If you wanted to follow 2b, I’d suggest perhaps tackling it as a separate RFC - improving what can be validated isn’t necessarily tied to how you define what you want validated.
Cheers
Stephen
What filter module is missing as validator currently are:
- Whitelisting concept (Implemented)
- Multiple rules for a variable (Implemented)
- String rules (Implemented)
- Optional rule (To be implemented. Refactoring is needed)
These are the missing features and cannot be fixed without additional
functions. (W/o modifying current function behaviors)Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Stephen,
Adding a bunch of new functions is IMO the wrong approach to this type of thing.
The existing filter_var/filter_input infrastructure works well, if you want to define more rules I would definitely encourage building on/improving that system not adding a bunch of extra functions.Do you really think filter module works well as optimal validator?
It’s not perfect, but nothing is. As I said, I believe the issues can largely be resolved by building on the existing functionality.
It cannot enforce even whitelisting well…
VALIDATE_INT already accepts $max and $min options. Those options could be applied to VALIDATE_FLOAT, and $charset, $accepted_chars, $max_len, $min_len could be implemented on a new VALIDATE_STRING filter.
But it trims input because it is filter based. For example, int input like
' 1234 '
must be invalid with whitelisting approach, but it's allowed with
current implementation.
I understand the use-case for multiple validation per input, and for validating multiple inputs, but frankly the way this implements that is both confusing to use, and has a less than ideal error-mode.
I agree.
As filter, it works well mostly. As validator, it's unuseable due to
filter like behavior/non whitelisting behaviors.
The “filter spec” input is an array of arrays of arrays, most of which will also contain an array for ‘options’. To me that’s getting dangerously close to JavaScript’s callback hell for impossible to read code.
I can understand your concern. Issue would be callback validator, but
callback nesting would not be needed unlike JS callback hell. Since
it's simple array, content/rule can be viewed easily also.
The error mode is also not ideal in a real world use case in my opinion. If I am validating a dozen input fields, I do not want to know just the first one that failed. Can you imagine using a web form that made you submit 12 times to tell you each time you got a field wrong, rather than trying to validate them ALL and telling you ALL the errors at once?
You can omit validation where you would like. So your concern wouldn't
be problem. (I strongly suggest to validate all inputs for all entry
points, though)
Personally I think a better approach is:
- improve/adding to the filters available, and if desired, add extra flags/options e.g, to throw an exception on failure (which, btw was requested via bugs.php.net 6 years ago), to set min/max values for FILTER_VALIDATE_FLOAT, etc.
2a. Leave the multiple rules per input to userland (e.g. dev uses foreach, array_walk, etc on a rules array or what have you)
2b. maybe add an alternative to filter_(input/var)array where it’s 1 input and multiple rules, e.g. filter(input|var)_multipleIf you wanted to follow 2b, I’d suggest perhaps tackling it as a separate RFC - improving what can be validated isn’t necessarily tied to how you define what you want validated.
Thank you for the suggestion. There are issues this approach.
- Existing validator does not perform strict validations. We need new
validator rules. e.g.
FILTER_VALIDATE_STRICT_FLOAT
- Do not allow white space prefix/postfix, raise exception.
FILTER_VALIDATE_STRICT_INT - Do not allow white space prefix/postfix, raise exception.
Having
FILTER_VALIDATE_STRICT_FLOAT
and
FILTER_VALIDATE_FLOAT
would be problematic.
Current filter functions fallback to FILTER_UNSAFE_RAW
when something
goes wrong which is unacceptable for validator.
2a. Letting user to use foreach will results in the same, at least
similar, rule definition array. Please take a look at the last reply
to Rowan's post.
2b. This sounds good. I should think about implementation. 2a issue
that result in the same or similar array definition remains though.
Thank you for suggestions!
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Stephen,
Having
FILTER_VALIDATE_STRICT_FLOAT
and
FILTER_VALIDATE_FLOAT
would be problematic.
I forgot to mention filter module uses 32 bit int for filter flags. It
means we have only up to 31 filters. We don't have much space for new
filter flags.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Yasuo,
Hi Stephen,
Adding a bunch of new functions is IMO the wrong approach to this type of thing.
The existing filter_var/filter_input infrastructure works well, if you want to define more rules I would definitely encourage building on/improving that system not adding a bunch of extra functions.Do you really think filter module works well as optimal validator?
It’s not perfect, but nothing is. As I said, I believe the issues can largely be resolved by building on the existing functionality.
It cannot enforce even whitelisting well…
VALIDATE_INT already accepts $max and $min options. Those options could be applied to VALIDATE_FLOAT, and $charset, $accepted_chars, $max_len, $min_len could be implemented on a new VALIDATE_STRING filter.
But it trims input because it is filter based. For example, int input like
' 1234 '
must be invalid with whitelisting approach, but it's allowed with
current implementation.I understand the use-case for multiple validation per input, and for validating multiple inputs, but frankly the way this implements that is both confusing to use, and has a less than ideal error-mode.
I agree.
As filter, it works well mostly. As validator, it's unuseable due to
filter like behavior/non whitelisting behaviours.
As you mentioned, there isn’t much space left for new flags (about 8 slots left by my count?) but surely a FILTER_FLAG_ALLOW_WHITESPACE could be utilised by multiple filters to achieve the same thing, while still allowing for legacy behaviour.
The “filter spec” input is an array of arrays of arrays, most of which will also contain an array for ‘options’. To me that’s getting dangerously close to JavaScript’s callback hell for impossible to read code.
I can understand your concern. Issue would be callback validator, but
callback nesting would not be needed unlike JS callback hell. Since
it's simple array, content/rule can be viewed easily also.
Sorry, maybe I wasn’t clear. My issue wasn’t with the callback filter specifically. Callback hell in Javascript is the issue where you have a lot of nested callbacks, which can make the code quite difficult to read, even with indenting, matching brace highlighting etc.
My concern here is that you’d have a very deeply nested data structure essentially to avoid a user-space loop.
The error mode is also not ideal in a real world use case in my opinion. If I am validating a dozen input fields, I do not want to know just the first one that failed. Can you imagine using a web form that made you submit 12 times to tell you each time you got a field wrong, rather than trying to validate them ALL and telling you ALL the errors at once?
You can omit validation where you would like. So your concern wouldn't
be problem. (I strongly suggest to validate all inputs for all entry
points, though)
Again, I apologise, maybe I wasn’t clear.
Let’s say I expect to get 5 posted parameters (from a form, a direct http api call, etc). 3 are required, of those 1 must be an email, and the other two must match (e.g. password/confirm) and have a custom callback to match (e.g. to prevent ‘aaaaaa’). the other two are optional, but have a maximum length of 45 characters each.
Now, I want to validate all those fields, and give meaningful error messages back to the user.
With your proposal, if they have made provided somehow invalid data in all five fields, they will have to make at least 6 http requests (assuming each time they get an error response they manage to fix that field on their first re-try).
I can definitely see benefit in allowing exceptions when validation fails. I use a validation system that does something similar internally, but it includes catches so effectively, validating an request with 5 inputs will potentially give you a single exception, which then gives access to the error encountered for each input (which is also expressed as an exception)
Personally I think a better approach is:
- improve/adding to the filters available, and if desired, add extra flags/options e.g, to throw an exception on failure (which, btw was requested via bugs.php.net 6 years ago), to set min/max values for FILTER_VALIDATE_FLOAT, etc.
2a. Leave the multiple rules per input to userland (e.g. dev uses foreach, array_walk, etc on a rules array or what have you)
2b. maybe add an alternative to filter_(input/var)array where it’s 1 input and multiple rules, e.g. filter(input|var)_multipleIf you wanted to follow 2b, I’d suggest perhaps tackling it as a separate RFC - improving what can be validated isn’t necessarily tied to how you define what you want validated.
Thank you for the suggestion. There are issues this approach.
- Existing validator does not perform strict validations. We need new
validator rules. e.g.FILTER_VALIDATE_STRICT_FLOAT
- Do not allow white space prefix/postfix, raise exception.
FILTER_VALIDATE_STRICT_INT- Do not allow white space prefix/postfix, raise exception.
Having
FILTER_VALIDATE_STRICT_FLOAT
and
FILTER_VALIDATE_FLOAT
would be problematic.
As I mentioned above, I would imagine a single new flag could be used to make “all” filters have the strict behaviour. Presumably FILTER_VALIDATE_BOOLEAN
should treat ‘ true ‘ as an error too?
Current filter functions fallback to
FILTER_UNSAFE_RAW
when something
goes wrong which is unacceptable for validator.
I’m not sure I understand this. What do you mean by “when something goes wrong”. Can you elaborate on that?
2a. Letting user to use foreach will results in the same, at least
similar, rule definition array. Please take a look at the last reply
to Rowan's post.2b. This sounds good. I should think about implementation. 2a issue
that result in the same or similar array definition remains though.
Not necessarily. While they could be in the format you specified, they could also be defined individually or defined inline.
I really believe that combining multiple fields and multiple rule definitions complicates things, a lot.
Making improvements to the filter system but only for this new “complicated” function feels like it goes against your intent of trying to make it easier for developers to make safe(r) applications.
Cheers
Stephen
Thank you for suggestions!
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Rowan,
This might be the largest difference.
To make something secure than it is now, adding additional security
layer is effective, not single location/code.My instinct is that this extra location would be a maintenance nightmare, as
it would need to keep up with changes elsewhere in the application. In day
to day use, what would make this extra security layer any more likely to be
comprehensively maintained than the existing validation layer?
You are right and wrong.
WAF(Web application firewall) maintenance that is customized for
certain app is nightmare, I fully agree.
Dividing input validations and logic checks works. These two are
focusing on different objectives, format and logic. Finding
programmer's mistakes is a lot easier with software built-in input
validation, unlike WAF, because programmer knew what's the inputs.
UNIT tests help also. That's the reasons why it works much better
than WAF approach.
Suppose we have validation module. You are suggesting something like
$int = validate_int($var, $min, $max);
$bool = validate_bool($var, $allowed_bool_types);
// i.e. which type of bool 1/0, yes/no, on/off, true/false is allowed
// This isn't implemented. All of them are valid bools currently.
$str = validate_string($var, $min_len, $max_len);
$str = validate_string_encoding($var, $encoding);
$str = validate_string_chars($var, $allowed_chars);
$str = validate_string_regex($var, $regex);
$str = validate_string_degit($var, $min_len, $max_len);
$str = validate_string_callback($var, $callback);No, I'm suggesting something like:
if (
! validate_int($var, $min, $max)
|| ! validate_bool($var, $allowed_bool_types)
|| ! validate_string($var, $min_len, $max_len)
|| ! validate_string_encoding($var, $encoding)
|| ! validate_string_chars($var, $allowed_chars)
|| ! validate_string_regex($var, $regex)
|| ! validate_string_degit($var, $min_len, $max_len)
|| ! $callback($var) // Note: no need to wrap this callback, it's
just a boolean-returning function
)
{
// ERROR
}
My opinion is "it's better to separate this kind of required format
check from logic" to make logic simpler and more maintainable.
Think of this format check as "forcing types". Even if you force DATE
type, you still might have to check logical errors in logic(model).
e.g. Reservation date for a service can only be set within certain
range.
If you're on the same boat as me. You don't have to return anything,
but catch exception. Above code would be
<?php
$validate_my_var = function($var) use ($callback) {
validate_int($var, $min, $max);
validate_bool($var, $allowed_bool_types);
validate_string($var, $min_len, $max_len);
validate_string_encoding($var, $encoding);
validate_string_chars($var, $allowed_chars);
validate_string_regex($var, $regex);
validate_string_degit($var, $min_len, $max_len);
$callback($var);
}
simply call it for a variable.
I would use array of definition as you described in previous mail, too.
$validation_def = [
'my_var' => $validate_my_var,
'some_var' => $validate_some_var,
'another_var' => $validate_another_var,
];
function validate_array($input, $definition) {
foreach($definition as $key => $func) {
// Cannot handle "optional" rule well with this code.
$func($input[$key]);
}
}
try {
validate_array($_POST, $validation_def);
} catch (InputValidationException $e) {
// cleanup and die
}
?>
It seems a lot like proposed code with a lot more typing. IMO.
However, I'm OK with this kind of implementation, too.
i.e. Providing basic validate_*() functions.
Half implemented filter module validator problem remains, though.
It was a deliberately narrow purpose as an example, but IMHO that's
perfectly readable, and the focus of the module would be on creating a good
set of validation rules, rather than worrying about how they're applied.[...] Rule reuse and centralizing validation rule is easy.
The language provides us plenty of ways to reuse code. For instance:
function validate_my_field($var) {
return
validate_int($var, $min, $max)
&& validate_bool($var, $allowed_bool_types)
// etc
OK. I understood you prefer simple validation functions solution.
PHP is generic programming language, so most features could be
implemented by basic constructs. Code reuse works with simple
functions. However, how about definition reuse? It cannot be reused
easily to generate JavaScript validation code, for example. If example
code made definition reusable, it would look more like proposed code.
We have working filter, and half implemented validator by filter
module. Leaving it as it is now is not good, IMHO. Why not finish
validator implementation and provide ready to use tool? Resulting code
would be similar to simple validation functions implementation,
somewhat at least, anyway.
What filter module is missing as validator currently are:
- Whitelisting concept (Implemented)
- Multiple rules for a variable (Implemented)
- String rules (Implemented)
- Optional rule (To be implemented. Refactoring is needed)
My PR implements above 3 out of 4 mandatory features to filter module
validation.
I still fail to see what's wrong for improving/finishing filter module
implementation and what's wrong in my improvement proposal.
Suggestions are appreciated always!
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
No, I'm suggesting something like:
if (
! validate_int($var, $min, $max)
|| ! validate_bool($var, $allowed_bool_types)
|| ! validate_string($var, $min_len, $max_len)
|| ! validate_string_encoding($var, $encoding)
|| ! validate_string_chars($var, $allowed_chars)
|| ! validate_string_regex($var, $regex)
|| ! validate_string_degit($var, $min_len, $max_len)
|| ! $callback($var) // Note: no need to wrap this callback, it's
just a boolean-returning function
And I am looking for some way of packaging that into something I can
read and write dynamically for each $var ...
$var->set_validation_rules($rules); And $rules is going to be an array
of items which can then be used for related parallel activities such as
populating the browser validation.
So the above script is replaced by $var->is_valid(); or if you prefer it
throws an exception when you try and set the variable with an invalid
input ( or one that does not match a 'strict' rule ).
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi Lester,
No, I'm suggesting something like:
if (
! validate_int($var, $min, $max)
|| ! validate_bool($var, $allowed_bool_types)
|| ! validate_string($var, $min_len, $max_len)
|| ! validate_string_encoding($var, $encoding)
|| ! validate_string_chars($var, $allowed_chars)
|| ! validate_string_regex($var, $regex)
|| ! validate_string_degit($var, $min_len, $max_len)
|| ! $callback($var) // Note: no need to wrap this callback, it's
just a boolean-returning functionAnd I am looking for some way of packaging that into something I can
read and write dynamically for each $var ...
This could be done by convention rather than configuration.
You need some rule for variable names. If var name is ID, it must be
numeric string always for example.
Convention is developer defined rule, so this is left to developer how to
do it.
$var->set_validation_rules($rules); And $rules is going to be an array
of items which can then be used for related parallel activities such as
populating the browser validation.So the above script is replaced by $var->is_valid(); or if you prefer it
throws an exception when you try and set the variable with an invalid
input ( or one that does not match a 'strict' rule ).
I think convention rather than configuration works. However, not all checks
should/can be done by model because model treats data related to the
model leave other vars behind. Leftover could be cause of vulnerabilities.
IIRC, Magento had vulnerability that allows malicious access due to
internal redirects. This kind of problem can be mitigated by strict
input validation at the time inputs are accepted.
Anyway, your way would work with autoboxing.
https://wiki.php.net/rfc/autoboxing
and this proposal.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
No, I'm suggesting something like:
if (
! validate_int($var, $min, $max)
|| ! validate_bool($var, $allowed_bool_types)
|| ! validate_string($var, $min_len, $max_len)
|| ! validate_string_encoding($var, $encoding)
|| ! validate_string_chars($var, $allowed_chars)
|| ! validate_string_regex($var, $regex)
|| ! validate_string_degit($var, $min_len, $max_len)
|| ! $callback($var) // Note: no need to wrap this callback, it's
just a boolean-returning functionAnd I am looking for some way of packaging that into something I can
read and write dynamically for each $var ...$var->set_validation_rules($rules); And $rules is going to be an array
of items which can then be used for related parallel activities such as
populating the browser validation.So the above script is replaced by $var->is_valid(); or if you prefer it
throws an exception when you try and set the variable with an invalid
input ( or one that does not match a 'strict' rule ).Anyway, your way would work with autoboxing.
https://wiki.php.net/rfc/autoboxing
and this proposal.
And it can even work without autoboxing; just wrap the scalars in
objects manually.
--
Christoph M. Becker
And it can even work without autoboxing; just wrap the scalars in
objects manually.
And we come full circle. YES everybody can add their own user land
wrappers to do this, but if code is built into the core to provide a
standard to work with then we don't have everybody re-inventing the
wheel. And there is no need to 'Add validation functions to filter
module' simply because that code already exists in the right place ...
wrapping the base variable.
Some of the elements of proper variable validation have been squeezed in
via the 'strict mode, and other soft type rules, where a simple
expansion to a basic element will cover just about everything everybody
is demanding? We don't need a strictly typed language but if strict
rules can be added to the core validation functions then we have much
better flexability to use them or not, and none of the debate of where
'strict mode=1' enables it. Just as a read_only rule gets rid of the
need for yet another whole family of classes.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Why shouldn't we have more secure validation?
No argument about that ... only that ALL validation requires rules. If
you have rules for preventing 'injection attacks' they only need to be
applied to data that could allow that injection to be carried forward.
If I expect a valid email address, and the string supplied is not a
valid email address, then I kill anything that is provided instead.
The legacy code which I have had validation problems with have
basically just been poor design from simply mirroring the post data to a
new URL if they want to use some third party service. Heavy handed
filtering of injection paths also kill the data that the silly clone
mirroring can't be bothered to filter properly. Convincing others that
the correct approach IS to filter data properly is an up hill struggle
when they can't be bothered to learn the interface to the service they
are bouncing over to. "It's too difficult to maintain as the API's will
keep changing". But if PHP has a set of base rules that can be applied
in parallel to the same rules browser space, then one can simplify the
processing elements that can then be mirrored cleanly, or halted if the
material needed to create the mirror is no longer valid.
Taking care of tampered data by business logic will reduce both
readability and maintainability. And more importantly, make code
less secure because programmers tend to focus on logic
in model, not input data validations.
That one has a packet of data validated in the browser which one is now
processing in the server and it is subject to tampering is the extra
validation you are talking about. How do you distinguish between what
was valid, but has now been contaminated without also checking that the
expected strings ARE still valid?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk