8 years ago by Yasuo Ohgaki

Hi all,

This RFC proposes adding functions suitable for input validation
in secure coding. IMHO, these additions are mandatory for PHP.

https://wiki.php.net/rfc/add_validate_functions_to_filter
Vote ends 2016/08/22 23:59:59 UTC

I don't mind suspending the vote and continuing the discussion if there is an issue.
It's a rather long RFC. Thank you for reading and voting!

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki

A note on voting: there are 2 votes, one on accepting the RFC and one on the target PHP version.

https://wiki.php.net/rfc/add_validate_functions_to_filter#proposed_voting_choices

You have to vote twice, i.e. the form cannot store the results of both votes at once.
I have made this mistake myself. Please be careful!

Thank you for voting!

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki

One more request, as usual.
Please describe the reason(s) why you object to the proposal.
I would like to improve the proposal even if it is declined. In addition,
I occasionally see votes based on misconceptions/misunderstandings.

Thank you!

P.S. I had missed the last PR update in the RFC; the RFC is now fixed.
Exceptions can now be disabled by an option.

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Dan Ackroyd

Hi Yasuo,

> One more request, as usual.
> Please describe the reason(s) why you object to the proposal.

I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.

But for posterity:

i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.

ii) Validation error messages need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.

iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enough to be a concern.

iv) The RFC makes an assumption that programs should exit when validation fails.

"Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."

and the code example:

catch (FilterValidateException $e) {
    var_dump($e->getMessage());
    die('Invalid input detected!'); // Should terminate execution when input validation fails
}

This assumption is bogus.

Any program that accepts data from users should provide useful error
messages when the data is wrong with something as simple as a string
being too long.

v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.

cheers
Dan

For the record, this is what my input validation functions look
like. They are bespoke to the application, and provide useful error
messages to the end user when an exception handler catches that
specific exception and converts it to a 4xx HTTP response.

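class InvalidOrderAmount extends Exception {}

const MAX_ORDER_AMOUNT = 100; // example limit; the real value is application-specific

function validateOrderAmount($value) : int {
    // Reject anything containing a non-digit. Note: "/[^0-9]*/" would match
    // the empty string and therefore match (and reject) every input.
    if (preg_match("/[^0-9]/", $value)) {
        // "The value must contain only digits."
        throw new InvalidOrderAmount("Der Wert muss nur Ziffern enthalten.");
    }

    $value = intval($value);

    if ($value < 1) {
        // "The value must be one or more."
        throw new InvalidOrderAmount("Der Wert muss eine oder mehrere sein.");
    }

    if ($value > MAX_ORDER_AMOUNT) {
        // "You can only order <MAX_ORDER_AMOUNT> at once."
        throw new InvalidOrderAmount("Sie können nur " . MAX_ORDER_AMOUNT . " auf einmal bestellen.");
    }

    return $value;
}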
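Dan mentions, but does not show, the exception handler that turns these exceptions into 4xx responses. A minimal sketch of that idea follows; `renderException()` and `sendExceptionResponse()` are hypothetical names, and mapping validation failures to 400 and everything else to a generic 500 is an assumption, not something stated in the thread:

```php
<?php
// Hypothetical application exception, mirroring InvalidOrderAmount above.
class InvalidOrderAmount extends Exception {}

/**
 * Map an uncaught exception to [HTTP status, response body].
 * Validation exceptions carry a message meant for the end user (already in
 * the application's language); anything else gets a generic 500 so that
 * internal details are not shown.
 */
function renderException(Throwable $e): array
{
    if ($e instanceof InvalidOrderAmount) {
        return [400, $e->getMessage()];
    }
    return [500, 'Internal Server Error'];
}

/** Global handler: send the status and an HTML-escaped body. */
function sendExceptionResponse(Throwable $e): void
{
    [$status, $body] = renderException($e);
    http_response_code($status);
    echo htmlspecialchars($body, ENT_QUOTES, 'UTF-8');
}

// Registered once at bootstrap, e.g.:
// set_exception_handler('sendExceptionResponse');
```

Keeping the mapping in a plain function makes it testable without triggering a real uncaught exception.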

8 years ago by Yasuo Ohgaki

Hi Dan,

Thank you for sharing your ideas!

>> One more request, as usual.
>> Please describe the reason(s) why you object to the proposal.
>
> I'm not entirely sure why you ask for reasons when people vote no. The
> reasons are almost always the same as the reasons given before the
> voting starts.

Without feedback, I have no clue which direction to take or what to improve.
I hadn't realized some of your points, and I think your feedback is great.

> But for posterity:

> i) Validation error messages need to specify what is wrong.....which
> is bespoke to the application. Which is a reason why validation code
> belongs in userland.

When exceptions are enabled, the offending key name is written in the exception message.

When exceptions are disabled, your statement is true. This could be improved.
Good feedback.

> ii) Validation error messages need to be in the correct language for an
> application. It is not a good approach for people to be trying to
> match strings emitted by internal code and trying to convert them to
> the correct language.

It seems there is a misunderstanding.
These new functions are intended for "secure coding input validation" that
should never fail. A failure means there is something unexpected in the input
data, and the program cannot/shouldn't keep running. Why would you need to
parse the message?

All the needed info (filter name, key and value) is in the exception message
and the exception object, BTW.

This one is good feedback, too.
I would appreciate suggestions for better error messages.

> iii) The argument that it needs to be fast could be applied to
> anything and everything, and so is bogus. The RFC doesn't even show
> that userland implementations are slow enough to be a concern.

I thought I didn't need to include an example of a userland implementation,
so that's good feedback as well.

A typical OO implementation uses a number of setters to define validation rules.
In addition, it checks that each validation rule is acceptable, e.g. it will
at least check the input data type. The setters and the validation of the
validation rules obviously make execution slower.

One may optimize validation rules into a plain array (like I do).
In that case, performance could be better than with the previously mentioned
validators, which do all of this in the production environment.

I also thought the performance issue was not very important because
there is no existing PHP feature to compare against. We all know PHP function
call overhead is relatively large, so the proposed implementation, almost
entirely in C, would be faster than userland code.
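As a rough illustration of the "plain array" approach mentioned above (Yasuo's own rule format is not shown in the thread; the rule keys `type`, `min`, `max`, `max_len` and `regex` below are assumptions), rules can be kept as a constant array and checked in a single loop, with no setter calls and no per-request checking of the rules themselves:

```php
<?php
// Hypothetical rule format, defined once (no setters, no rule checking
// at request time). Keys and rule names are illustrative only.
const RULES = [
    'id'   => ['type' => 'int', 'min' => 1, 'max' => 1000000],
    'name' => ['type' => 'string', 'max_len' => 64, 'regex' => '/\A[a-zA-Z ]+\z/'],
];

/**
 * Validate $input against $rules in one pass. Returns the validated values,
 * or throws UnexpectedValueException naming the first offending key.
 */
function validateWithRules(array $input, array $rules): array
{
    $out = [];
    foreach ($rules as $key => $rule) {
        $value = $input[$key] ?? null;
        if (!is_string($value)) {
            throw new UnexpectedValueException("Missing or non-string input: $key");
        }
        if ($rule['type'] === 'int') {
            // At most 9 digits, so the cast below cannot overflow.
            if (preg_match('/\A[0-9]{1,9}\z/', $value) !== 1) {
                throw new UnexpectedValueException("Not an integer: $key");
            }
            $value = (int) $value;
            if ($value < $rule['min'] || $value > $rule['max']) {
                throw new UnexpectedValueException("Out of range: $key");
            }
        } elseif (strlen($value) > $rule['max_len'] || preg_match($rule['regex'], $value) !== 1) {
            throw new UnexpectedValueException("Invalid string: $key");
        }
        $out[$key] = $value;
    }
    return $out;
}
```

Usage: `validateWithRules($_GET, RULES)` returns `['id' => 42, 'name' => 'Ann Lee']` for `?id=42&name=Ann+Lee`. Whether this is measurably faster than an OO validator is exactly the kind of number the thread says is missing.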

> iv) The RFC makes an assumption that programs should exit when validation fails.
>
> "Input data validation should accept only valid and possible inputs.
> If not, reject it and terminate program."
>
> and the code example:
>
> catch (FilterValidateException $e) {
>     var_dump($e->getMessage());
>     die('Invalid input detected!'); // Should terminate execution when input validation fails
> }
>
> This assumption is bogus.
>
> Any program that accepts data from users should provide useful error
> messages when the data is wrong with something as simple as a string
> being too long.

There is a misunderstanding here.
As I wrote explicitly in the RFC, input validation and user input
mistakes must be handled differently.

The "input validation error" (think of it as an assertion or requirement
failure) that this RFC deals with is a should-never-happen condition (think
of it as a contract that should never fail).

The point of input validation is to accept only inputs that the
program expects and can handle correctly. Accepting unexpected
data that the program cannot handle correctly is pointless.

Don't misunderstand me. I'm not saying "You should reject user input mistakes".
"User input mistakes" and "input validation errors" are totally different errors.

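A minimal sketch of the distinction described above, under stated assumptions: the function names, the 1024-byte hard limit, the 20-character form limit and the messages are all hypothetical. The first function is the "should never happen" kind and rejects input no legitimate form submission would produce; the second handles ordinary user mistakes and returns messages for redisplaying the form.

```php
<?php
/**
 * Input validation in the "should never happen" sense: reject anything the
 * form could not legitimately have produced. Details go to the log, not to
 * the client; the caller is expected to stop processing the request.
 */
function assertValidInput(array $input): array
{
    $username = $input['username'] ?? null;
    if (!is_string($username)                          // missing, or e.g. username[]=x
        || preg_match('//u', $username) !== 1          // malformed UTF-8
        || preg_match('/[\x00-\x1F\x7F]/', $username)  // control characters
        || strlen($username) > 1024) {                 // far beyond any real username
        throw new UnexpectedValueException('Invalid input: username');
    }
    return ['username' => $username];
}

/**
 * User input mistakes: expected and recoverable. Returns messages to show
 * next to the redisplayed form (an empty array means no mistakes).
 */
function checkUserMistakes(array $validated): array
{
    $errors = [];
    // Count characters rather than bytes; input is known to be valid UTF-8 here.
    $length = preg_match_all('/./su', $validated['username']);
    if ($length === 0) {
        $errors[] = 'Please enter a username.';
    } elseif ($length > 20) {
        $errors[] = 'The username must be at most 20 characters long.';
    }
    return $errors;
}
```

In this split only a failure of the first function would end the request (for example with a generic 400 and a log entry); a 21-character username goes through the second function and simply produces a message.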
> v) I don't like the current filter functions, and recommend people
> avoid using them. Adding to them with an even harder to use API is the
> wrong way to go.

I didn't recommend it either, because it could not easily be used for input
validation, and escaping or sanitization could be done with a
dedicated API.

Creating a new module was one of my ideas too. However, after investigation
I realized that much of the filter module code could be reused. That's the
reason why I added the functions to the filter module. I also originally
named the new functions with a "validate_" prefix, rather than "filter_", to
emphasize that they are for validation. I renamed them to "filter_*" to
comply with CODING_STANDARDS.

This feedback is great because I'm worried about the same thing.
Please give this kind of feedback during the discussion so that I can
do something about the issues.

Thank you for your comments.
I think they are very helpful for improving the RFC!

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki

Hi Dan,

I don't mind suspending the vote for a while to resolve issues if
the RFC should be changed. I also don't mind adding missing
features, e.g. helpful error messages when exceptions are disabled, to
my todo list. BTW, I'll document the basic idea of secure coding and
emphasize how it should be done, so that misuse will be rare.

It seems you aren't objecting to the idea itself.
Could you give more feedback on what's missing for you to change your vote?

Thank you!

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Dan Ackroyd

> Hi Dan,
>
> I don't mind suspending the vote for a while ...
>
> It seems you aren't objecting to the idea itself.

I do object to the concept of the RFC. In my first point I said it belongs in userland.

And I strongly object to the idea of stopping and starting voting on RFCs. Please leave the vote open and if it fails take some time to think about the feedback.

If, after at least 3 months, you think you can improve the RFC significantly, then reintroduce the idea.

cheers
Dan

8 years ago by Pierre Joye

> I don't mind suspending the vote for a while to resolve issues if
> the RFC should be changed. I also don't mind adding missing
> features, e.g. helpful error messages when exceptions are disabled, to
> my todo list. BTW, I'll document the basic idea of secure coding and
> emphasize how it should be done, so that misuse will be rare.

There is no such thing as a suspended vote. The vote has to restart if there
are changes.

8 years ago by Yasuo Ohgaki

Hi Pierre,

>> I don't mind suspending the vote for a while to resolve issues if
>> the RFC should be changed. I also don't mind adding missing
>> features, e.g. helpful error messages when exceptions are disabled, to
>> my todo list. BTW, I'll document the basic idea of secure coding and
>> emphasize how it should be done, so that misuse will be rare.
>
> There is no such thing as a suspended vote. The vote has to restart if there
> are changes.

Thank you.
I'll restart the vote if I have to make changes other than adding more items
to the discussion section.
Would anyone change their vote if I renamed the functions to avoid possible
confusion? So far there is one opinion in favor of better names.

It seems that either "People misunderstand secure coding" and/or "People think
validation code should be entirely in userland". (Or "No more additions
to the filter module"?)

Although I do think repetitive validation of browsers' request
headers/inputs in userland is awfully inefficient, efficiency is not the
top priority of security anyway.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Dan Ackroyd

> I'll restart the vote if I have to make changes,

Yasuo,

No one is asking you to make changes to the RFC. Just leave the voting
open, and then close it on the agreed date.

You would be violating the agreed RFC process* by closing a vote
early, just so that you can resubmit a proposal straight away.
https://wiki.php.net/rfc/voting

Additionally, you seem to have completely ignored this:

Dan Ackroyd wrote:

> And I strongly object to the idea of stopping and starting voting on RFCs. Please leave the vote open and if it fails take some time to think about the feedback.

It would benefit everyone if you stopped responding immediately and
instead took time to actually think about what people have been
saying. This RFC isn't going to be in PHP 7.1, so it is fine to wait 3
months to present a new version of the RFC.

cheers
Dan

* "Resurrecting Rejected Proposals

In order to save valuable time, it will not be allowed to bring up a
rejected proposal up for another vote, unless one of the following
happens:

- 6 months pass from the time of the previous vote, OR
- The author(s) make substantial changes to the proposal."

8 years ago by Yasuo Ohgaki

Hi Dan,

I understand the RFC process.

> Additionally, you seem to have completely ignored this:
>
> Dan Ackroyd wrote:
>
>> And I strongly object to the idea of stopping and starting voting on RFCs. Please leave the vote open and if it fails take some time to think about the feedback.
>
> It would benefit everyone if you stopped responding immediately and
> instead took time to actually think about what people have been
> saying. This RFC isn't going to be in PHP 7.1, so it is fine to wait 3
> months to present a new version of the RFC.

It seems I marked it as "already read" by mistake.
Thank you for the reminder.
I understand that you prefer a userland implementation.

I'm planning to propose "Filter module deprecation" if this RFC
is declined, because the current validation filter is not good enough to
do the job and makes the situation worse rather than better... If the
deprecation RFC is also declined, then I might try to improve this RFC again.

BTW, I cannot guess the reasons behind "no" votes. I can guess the
reasons of those who participated in the discussions, though. Even when the
RFC author could guess the reason, it would be nicer for voters
and the author if each voter explained why they voted "no" in the vote thread.
An explicit description is better than a guess, IMHO. Besides, unlike
you, many people do not leave any clue.

For example, I completely fail to understand why the
"Enable session.use_strict_mode by default" and "Precise Session
Management" RFCs were declined. These are mandatory for session
security and not a matter of preference, but of doing it and deciding how.

If one fails to see why it is mandatory, one should ask why. If one
thinks "it must be more efficient", one should ask for the patch to be
improved. If one thinks the proposal is wrong, one should point
out what's wrong, IMO. If one's opinion is the same as someone else's,
one should at least say "Same here"/"Agree".

It's okay to say "let's ignore such security issues" or "let it be the
users' responsibility to secure sessions", but that opinion should be
expressed. It's not a political vote, but a technical vote, after all.

I guess most of the "no" votes on
"Enable session.use_strict_mode by default" and "Precise Session
Management" were based on wrong assumptions.

For this vote, I'm guessing the results are strongly affected by preferences,
the nature of the filter module, and patch quality. The code is messy because
I didn't refactor it, in order to minimize changes. It's still a guess,
though.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Tony Marston

"Dan Ackroyd" wrote in message
news:CA+kxMuRiOBQpmTeKqNyV8rX0GKCLrYixi--y5TcYUkdqpT746w@mail.gmail.com...

> Hi Yasuo,
>
>> One more request, as usual.
>> Please describe the reason(s) why you object to the proposal.
>
> I'm not entirely sure why you ask for reasons when people vote no. The
> reasons are almost always the same as the reasons given before the
> voting starts.
>
> But for posterity:
>
> i) Validation error messages need to specify what is wrong.....which
> is bespoke to the application. Which is a reason why validation code
> belongs in userland.

I agree 100%

> ii) Validation error messages need to be in the correct language for an
> application. It is not a good approach for people to be trying to
> match strings emitted by internal code and trying to convert them to
> the correct language.

I agree 100%

> iii) The argument that it needs to be fast could be applied to
> anything and everything, and so is bogus. The RFC doesn't even show
> that userland implementations are slow enough to be a concern.
>
> iv) The RFC makes an assumption that programs should exit when validation
> fails.
>
> "Input data validation should accept only valid and possible inputs.
> If not, reject it and terminate program."

I DISagree 100%. Validation errors should NEVER terminate the program. The
program should continue by displaying all the error messages to the user so
that he/she can correct his/her mistake and try the operation again.

> and the code example:
>
> catch (FilterValidateException $e) {
>     var_dump($e->getMessage());
>     die('Invalid input detected!'); // Should terminate execution when input validation fails
> }
>
> This assumption is bogus.
>
> Any program that accepts data from users should provide useful error
> messages when the data is wrong with something as simple as a string
> being too long.

I agree 100%

> v) I don't like the current filter functions, and recommend people
> avoid using them. Adding to them with an even harder to use API is the
> wrong way to go.

I agree 100%

--
Tony Marston

8 years ago by Christoph M. Becker

> "Dan Ackroyd" wrote in message
> news:CA+kxMuRiOBQpmTeKqNyV8rX0GKCLrYixi--y5TcYUkdqpT746w@mail.gmail.com...
>
>> "Input data validation should accept only valid and possible inputs.
>> If not, reject it and terminate program."
>
> I DISagree 100%. Validation errors should NEVER terminate the program.
> The program should continue by displaying all the error messages to the
> user so that he/she can correct his/her mistake and try the operation again.

Yasuo (who Dan quoted here) refers to completely invalid input, such as
invalid UTF-8 byte sequences. I think that in this case the app should
bail out without even giving detailed information, as such grossly
invalid input is most likely an attack attempt (or a severe browser bug).

--
Christoph M. Becker

8 years ago by Stanislav Malyshev

Hi!

> Yasuo (who Dan quoted here) refers to completely invalid input, such as
> invalid UTF-8 byte sequences. I think that in this case the app should
> bail out without even giving detailed information, as such grossly
> invalid input is most likely an attack attempt (or a severe browser bug).

I personally am not a big fan of "bail out without giving information",
unless that information somehow crosses a security boundary (e.g.
displaying PHP error messages in production) or reveals unnecessary info
(this part is super-tricky in crypto, but outside of crypto common sense
is usually not a bad guide).

Assume you indeed have a buggy release of Firefox that produces invalid
UTF-8 when your language is set to Hindi (this is almost a true story, btw:
I've seen a bug that was not exactly that, but somewhat similar). Now assume
you get a message from the user "all our office can not use your application
since new version was deployed!" and you walk the user through and it
indeed bails out, no additional info. How do you debug that? You don't know
that Hindi is the culprit. You may not have access to that office's
environment. Your users can't help much but scream "get our app working
again, we're losing money here!". And of course it works for you when
you try it, and the best time to talk to them is 4am your time.

Now, how much easier would your life be if your app just reported
"invalid UTF-8 sequence encountered in parameter FirstName" before
bailing out? How many hours, pulled-out hairs and 4am sessions would it
save? I think it's worth considering.

--
Stas Malyshev
smalyshev@gmail.com

8 years ago by Christoph M. Becker

> Hi!
>
>> Yasuo (who Dan quoted here) refers to completely invalid input, such as
>> invalid UTF-8 byte sequences. I think that in this case the app should
>> bail out without even giving detailed information, as such grossly
>> invalid input is most likely an attack attempt (or a severe browser bug).
>
> I personally am not a big fan of "bail out without giving information",
> unless that information somehow crosses a security boundary (e.g.
> displaying PHP error messages in production) or reveals unnecessary info
> (this part is super-tricky in crypto, but outside of crypto common sense
> is usually not a bad guide).
<snip>
> Now, how much easier would your life be if your app just reported
> "invalid UTF-8 sequence encountered in parameter FirstName" before
> bailing out? How many hours, pulled-out hairs and 4am sessions would it
> save? I think it's worth considering.

I once added a check to a CMS that failed with "Malformed UTF-8 detected".
Until that was changed to "Bad request. Please <a href=".">try
again</a>.", we got a lot of support requests from confused users who
had bookmarked URLs with ISO-8859-* query strings. Even pointing out
which parameter was the culprit wouldn't have changed that, I presume.

Of course, it makes sense to log very detailed information in this
case (among other things, the malformed byte sequence), but
presenting it to visitors doesn't seem to be helpful – most of them
wouldn't even know what UTF-8 is.
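One way to reconcile the two positions is to log the details Stas asks for (parameter name and offending bytes) while showing the visitor only the generic message Christoph settled on. A minimal sketch, assuming hypothetical helpers `findInvalidUtf8()` and `handleMalformedInput()` and PHP's `error_log()` as the log sink; `preg_match('//u', $s)` returns `false` when `$s` is not valid UTF-8:

```php
<?php
/**
 * Return [parameter name, raw value] for the first value that is not valid
 * UTF-8, or null if all values are valid. Nested arrays (a[x]=...) are
 * walked; array keys themselves are not checked in this sketch.
 */
function findInvalidUtf8(array $params, string $prefix = ''): ?array
{
    foreach ($params as $key => $value) {
        $name = $prefix === '' ? (string) $key : $prefix . '[' . $key . ']';
        if (is_array($value)) {
            $found = findInvalidUtf8($value, $name);
            if ($found !== null) {
                return $found;
            }
        } elseif (preg_match('//u', (string) $value) !== 1) {
            return [$name, (string) $value];
        }
    }
    return null;
}

/**
 * Log full details for the developer; return only a generic body for the
 * visitor (the caller would also send a 400 status), or null if input is OK.
 */
function handleMalformedInput(array $params): ?string
{
    $found = findInvalidUtf8($params);
    if ($found === null) {
        return null;
    }
    [$name, $value] = $found;
    error_log(sprintf('Malformed UTF-8 in parameter %s: %s', $name, bin2hex($value)));
    return 'Bad request. Please <a href=".">try again</a>.';
}
```

Called as `handleMalformedInput($_GET)`, an ISO-8859-1 bookmark such as `?q=Gr%FC%DFe` would be logged with the parameter name `q` and the hex bytes, while the visitor sees only the "try again" link.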

--
Christoph M. Becker

8 years ago by Yasuo Ohgaki

Hi Christoph,

>>> Yasuo (who Dan quoted here) refers to completely invalid input, such as
>>> invalid UTF-8 byte sequences. I think that in this case the app should
>>> bail out without even giving detailed information, as such grossly
>>> invalid input is most likely an attack attempt (or a severe browser bug).
>>
>> I personally am not a big fan of "bail out without giving information",
>> unless that information somehow crosses a security boundary (e.g.
>> displaying PHP error messages in production) or reveals unnecessary info
>> (this part is super-tricky in crypto, but outside of crypto common sense
>> is usually not a bad guide).
> <snip>
>> Now, how much easier would your life be if your app just reported
>> "invalid UTF-8 sequence encountered in parameter FirstName" before
>> bailing out? How many hours, pulled-out hairs and 4am sessions would it
>> save? I think it's worth considering.
>
> I once added a check to a CMS that failed with "Malformed UTF-8 detected".
> Until that was changed to "Bad request. Please <a href=".">try
> again</a>.", we got a lot of support requests from confused users who
> had bookmarked URLs with ISO-8859-* query strings. Even pointing out
> which parameter was the culprit wouldn't have changed that, I presume.
>
> Of course, it makes sense to log very detailed information in this
> case (among other things, the malformed byte sequence), but
> presenting it to visitors doesn't seem to be helpful – most of them
> wouldn't even know what UTF-8 is.

An excellent example of an input validation exception!
Software has history, so certain validation cannot be done automatically.

For the record, many security standards/guides require input data to be
"canonicalized" before input validation. If you want to validate a
string, canonicalize it first.

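A minimal sketch of "canonicalize first, then validate" for a string input. The canonicalization steps chosen here are assumptions, not a list taken from any particular standard: reject malformed UTF-8 (including overlong forms), apply Unicode NFC via the intl extension's `Normalizer` when it is loaded, and unify line endings. `validateComment()` is a hypothetical validator that only ever sees the canonical form.

```php
<?php
/**
 * Bring a string into one canonical form before validating it, so that
 * different encodings of "the same" text cannot slip past the validator.
 */
function canonicalizeString(string $s): string
{
    // Malformed or overlong UTF-8 (e.g. "\xC0\xAF" for "/") is rejected, not repaired.
    if (preg_match('//u', $s) !== 1) {
        throw new UnexpectedValueException('Not valid UTF-8');
    }
    // Unicode NFC, only if the intl extension is loaded.
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($s, Normalizer::FORM_C);
        if ($normalized === false) {
            throw new UnexpectedValueException('Normalization failed');
        }
        $s = $normalized;
    }
    // Unify line endings: CRLF and lone CR become LF.
    return str_replace(["\r\n", "\r"], "\n", $s);
}

/** Hypothetical validator that only ever sees canonical input. */
function validateComment(string $raw): string
{
    $s = canonicalizeString($raw);
    // Allow tab and newline; reject other control characters and overlong input.
    if (strlen($s) > 2000 || preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', $s) === 1) {
        throw new UnexpectedValueException('Invalid comment');
    }
    return $s;
}
```

Validating the canonical form means, for example, that "e" followed by a combining acute accent and a precomposed "é" are checked (and stored) as the same string.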
Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki

Hi Tony,

Allow me to top post.

"The input validation" is not for legitimate users, but for attackers.
You shouldn't help attackers by explaining what is wrong with their
inputs, or how.

I've added a discussion section, "Input validation and User input mistake
handling difference":
https://wiki.php.net/rfc/add_validate_functions_to_filter#input_validation_and_user_input_mistake_handling_difference

Please refer to that section for the distinction.

BTW, the input validation that I'm proposing here is a feature
required/recommended by ISO 27000/ISMS. Why shouldn't PHP
provide features that are needed to implement ISO 27000/ISMS
requirements?

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

> Allow me to top post.
>
> "The input validation" is not for legitimate users, but for attackers.
> You shouldn't help attackers by explaining what is wrong with their
> inputs, or how.

What is expected as 'post' data input is defined when building the page.
That some people will intercept the page and try to use it to inject
'invalid' data in an attempt to perhaps gain access to data is a
separate problem, but still part of the validation process. One of the
hacks I had to deal with recently was simply an XSS hole because nobody
filtered or trimmed the username, so you could just type what you
wanted. Simply add a suitable pattern to the HTML5 validation and the
casual hacker is deterred ... but how many PHP examples actually use HTML5?

Of course someone can build their own result set and bypass the browser
validation. Which is where some clever use of JavaScript might help to
add a security check to the submit packet. Outside PHP, but still part
of the overall picture. In any case, once the get/post array is in PHP
there is a need to recheck everything once again, and while the average
user may be happy simply to bounce the page if the username field now
has an invalid input, other systems will want to log the attempt and
perhaps capture any source information. White-screen crashes because
someone has broken the data can be difficult to unravel, especially when
it's some concerted effort to get in ... in my case someone trying every
possible MySQL hack against Firebird :( So I end up with extra code to
filter the attack attempt, and that tends to have to be at the variable
level.

It can be useful to give feedback simply to get them to give up without
an explanation why. Simply crashing the page means they try the next
option until they do get a response ...

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

8 years ago by Tony Marston

"Yasuo Ohgaki" wrote in message
news:CAGa2bXZjgggpJsVXQMDJMqvnpTBUCAhazBwjRtBiPsb-BohQnQ@mail.gmail.com...

Hi Tony,

Allow me to top post.

"The input validation" is not for legitimate users, but for attackers.
You shouldn't help attackers by explaining what is wrong with their
inputs, or how.

If your RFC is about preventing attacks then it should be labelled as such.
Input validation is a totally separate issue.

--
Tony Marston

I've added discussion "Input validation and User input mistake
handling difference"
https://wiki.php.net/rfc/add_validate_functions_to_filter#input_validation_and_user_input_mistake_handling_difference

Please refer to that section for the distinction.

BTW, the input validation that I'm proposing here is a feature
required/recommended by ISO 27000/ISMS. Why shouldn't PHP provide
features that are needed to implement ISO 27000/ISMS
requirements?

On Mon, Aug 15, 2016 at 8:00 PM, Tony Marston TonyMarston@hotmail.com
wrote:

"Dan Ackroyd" wrote in message
news:CA+kxMuRiOBQpmTeKqNyV8rX0GKCLrYixi--y5TcYUkdqpT746w@mail.gmail.com...

Hi Yasuo,

One more usual request.
Please describe the reason(s) why you object to the proposal.

I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.

But for posterity:

i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.

I agree 100%

ii) Validation error messages need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.

I agree 100%

iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enough to be a concern.

iv) The RFC makes an assumption that programs should exit when
validation fails.

"Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."

I DISagree 100%. Validation errors should NEVER terminate the program, they
should continue by displaying all the error messages to the user so that
he/she can correct his/her mistake and try the operation again.

and the code example:

catch (FilterValidateException $e) {
    var_dump($e->getMessage());
    die('Invalid input detected!'); // Should terminate execution when input validation fails
}

This assumption is bogus.

Any program that accepts data from users should provide useful error
messages when the data is wrong, even with something as simple as a
string being too long.

I agree 100%
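To make the "collect messages instead of calling die()" approach concrete, here is a minimal PHP sketch. The function name `validateSignupForm()`, the field names and the rules are hypothetical illustrations, not taken from the RFC. Because the messages live in application code, they can be worded and translated however the application needs.

```php
<?php
/**
 * Hypothetical signup check: every problem is collected with a
 * user-facing message; nothing terminates the script.
 *
 * @return array<string, string> field name => message; empty means valid
 */
function validateSignupForm(array $input): array
{
    $errors = [];

    $name = $input['username'] ?? '';
    if (!is_string($name) || $name === '') {
        $errors['username'] = 'Please enter a user name.';
    } elseif (preg_match('/\A[A-Za-z0-9_]{1,32}\z/', $name) !== 1) {
        $errors['username'] = 'User names may contain letters, digits and underscores, up to 32 characters.';
    }

    $email = $input['email'] ?? '';
    if (!is_string($email) || filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errors['email'] = 'Please enter a valid e-mail address.';
    }

    return $errors;
}

// Usage: re-display the form with every message instead of stopping.
foreach (validateSignupForm(['username' => 'bad name!', 'email' => 'nope']) as $field => $message) {
    echo $field, ': ', $message, PHP_EOL;
}
```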

v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.

I agree 100%

8 years ago by Yasuo Ohgaki

To those who voted "no" for this,

"Dan Ackroyd" wrote in message
news:CA+kxMuRiOBQpmTeKqNyV8rX0GKCLrYixi--y5TcYUkdqpT746w@mail.gmail.com...

Hi Yasuo,

One more usual request.
Please describe the reason(s) why you object to the proposal.

I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.

But for posterity:

i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.

I agree 100%

ii) Validation error messages need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.

I agree 100%

iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enough to be a concern.

iv) The RFC makes an assumption that programs should exit when validation
fails.

"Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."

I DISagree 100%. Validation errors should NEVER terminate the program, they
should continue by displaying all the error messages to the user so that
he/she can correct his/her mistake and try the operation again.

and the code example:

catch (FilterValidateException $e) {
    var_dump($e->getMessage());
    die('Invalid input detected!'); // Should terminate execution when input validation fails
}

This assumption is bogus.

Any program that accepts data from users should provide useful error
messages when the data is wrong, even with something as simple as a
string being too long.

I agree 100%

v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.

I agree 100%

--
Tony Marston

Could you explain why, or just say "same opinion here"?

IMHO, this opinion is based on a misunderstanding of secure coding and
software security methodology. However, either answer is helpful.

Many people seem to think that "the input validation should be
implemented fully in userland". If this is true, I'm going to propose
deprecating the "Filter module" next. It's better not to have a
misleading and/or half-implemented module for security purposes.

Thank you!

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki

Hi all,

This RFC is to add functions that are suitable for input validations
for secure coding. IMHO, these additions are mandatory for PHP.

https://wiki.php.net/rfc/add_validate_functions_to_filter
Vote ends 2016/08/22 23:59:59 UTC

I don't mind suspending the vote and continuing the discussion if there
is an issue. It's a rather long RFC. Thank you for reading and voting!

Thank you for voting!
The RFC is declined 1 vs 13.
I am a bit surprised by this result.

I requested the reasons for objection, but many voters did not disclose why.

https://wiki.php.net/rfc/add_validate_functions_to_filter#proposed_voting_choices
bwoebi (bwoebi)
colinodell (colinodell)
danack (danack)
derick (derick)
diegopires (diegopires)
guilhermeblanco (guilhermeblanco)
kguest (kguest)
levim (levim)
lstrojny (lstrojny)
marcio (marcio)
nikic (nikic)
ocramius (ocramius)
peehaa (peehaa)
santiagolizardo (santiagolizardo)

I would like to summarize the objections raised during the discussion.
I assume those listed above voted no for these reasons.

1. Input data validation cannot be done because the client can be anything.
2. Input data validation should show what's wrong, not throw an exception.
3. Input data validation errors and input mistake errors should be handled
   by the same code to remove code redundancy.
4. The current filter module is good enough.

IMO, these are clearly wrong reasons for objection.

1. Almost all input data can be validated because of
   - Web standards, e.g. almost all form input must be a "valid string".
   - Client-side validation, e.g. JS, HTML5.
   - Many parameters are set by the program and shouldn't be changed,
     e.g. select, radio, hidden, database record ID.
2. Showing what's wrong in input validation is an anti-practice for security.
   - Developers should NOT show error details unless they have to;
     otherwise it helps attackers tamper with the system.
   - "You have broken encoding", "You have a disallowed CNTRL char", etc. are
     the same as "You have entered the wrong user name", "You have entered
     the wrong password", "You have entered too long a password", etc.
3. Treating both with the same code is not a reasonable choice for large
   applications that have higher security requirements.
   - Strict input validation should check all inputs, including request
     headers and cookies. Checking these in business logic makes
     things messy and complicated, and hence makes mistakes easy.
4. The current filter module does not work for strict validation.
   - I won't repeat the details; it just does not work well for strict validation.
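The "valid string" check and the "don't tell the client what was wrong" point above can be sketched roughly as follows. This is a minimal userland illustration, not the RFC's proposed API: `validateTextField()` and `handleLogin()` are hypothetical names, and the control-character set (everything below 0x20 except tab, LF and CR, plus DEL) and the byte limit are assumptions.

```php
<?php
/**
 * Hypothetical "valid string" check: well-formed UTF-8, no control
 * characters except tab/LF/CR, bounded length in bytes.
 * The reason is logged server-side only; the caller just gets true/false.
 */
function validateTextField(string $field, $value, int $maxBytes): bool
{
    if (!is_string($value)) {
        $reason = 'not a string'; // e.g. "name[]=x" turns the value into an array
    } elseif (preg_match('//u', $value) !== 1) {
        $reason = 'invalid UTF-8'; // preg_match() fails on malformed UTF-8 with /u
    } elseif (preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', $value) === 1) {
        $reason = 'control character';
    } elseif (strlen($value) > $maxBytes) {
        $reason = 'too long';
    } else {
        return true;
    }
    error_log(sprintf('Input validation failed for "%s": %s', $field, $reason));
    return false;
}

// Usage sketch: the client only ever sees a generic message.
function handleLogin(array $post): string
{
    if (!validateTextField('username', $post['username'] ?? null, 64)) {
        return 'Invalid request.';
    }
    return 'OK';
}
```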

(NOTE: "input validation" here means "the input validation" described in the RFC.)

If you have questions, I don't mind at all explaining more. I think most
of you misunderstood the concept.

If you have other reason(s), please let me know so that I can improve
the RFC.
Thank you!

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lars Strojny

Hi Yasuo,

[...]

Thank you for voting!
The RFC is declined 1 vs 13.
I am a bit surprised by this result.

I requested the reasons for objection, but many voters did not disclose why.
[...]
lstrojny (lstrojny)
[...]

Sorry for not chiming in earlier, but I indeed owe you an explanation. I believe making ext/filter a part of PHP created more trouble than it solved, even though I applaud its intention. Of course, filtering and validation are necessary essentials of any secure web application. I nevertheless strongly believe validation and filtering must live in userland.
Validation and filtering are often very much tied to the domain problem a user of PHP is trying to solve, and the change rate of the application will be higher than the change rate of the language (hopefully). To give a more concrete example: let’s say our problem is we want to validate if a string is a valid domain because our business is registering domains. Nowadays, top-level domains are introduced quite often and there is no way PHP could have a nice, up-to-date whitelist of TLDs all of the time, and as a domain registration business it’s impossible for me to wait for the updated whitelist in PHP NEXT. That’s why I believe this is something that belongs in userland, so the library that offers (domain) validation can follow a lifecycle that fits the problem it is trying to solve.

cu,
Lars

8 years ago by Yasuo Ohgaki

Hi Lars,

[...]

Thank you for voting!
The RFC is declined 1 vs 13.
I am a bit surprised by this result.

I requested the reasons for objection, but many voters did not disclose why.
[...]
lstrojny (lstrojny)
[...]

Sorry for not chiming in earlier, but I indeed owe you an explanation. I believe making ext/filter a part of PHP created more trouble than it solved, even though I applaud its intention. Of course, filtering and validation are necessary essentials of any secure web application. I nevertheless strongly believe validation and filtering must live in userland.
Validation and filtering are often very much tied to the domain problem a user of PHP is trying to solve, and the change rate of the application will be higher than the change rate of the language (hopefully). To give a more concrete example: let’s say our problem is we want to validate if a string is a valid domain because our business is registering domains. Nowadays, top-level domains are introduced quite often and there is no way PHP could have a nice, up-to-date whitelist of TLDs all of the time, and as a domain registration business it’s impossible for me to wait for the updated whitelist in PHP NEXT. That’s why I believe this is something that belongs in userland, so the library that offers (domain) validation can follow a lifecycle that fits the problem it is trying to solve.

Thank you for your reply.

It seems many of us are mixing up what "input handling should do" and
what "business logic should do".

There are a number of ways to implement input data validation and
input error checks, from ideal to poor, or even bad. The validator
tries "to validate that an input string (format, characters used,
length, existence, etc.) is as expected". Business logic should handle
input errors, logical consistency, etc., i.e. generally speaking, domain
whitelisting should be handled by business logic.
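One way to picture this split, using Lars's domain example: a format-only input check, followed by a business rule that the application maintains itself. Both functions are hypothetical, the format check assumes ASCII-only (non-IDN) names, and the TLD list is whatever the business decides.

```php
<?php
// Layer 1, input validation: is this string shaped like a domain name at all?
// Syntax only: labels of 1-63 ASCII chars, no leading/trailing hyphen, <= 253 bytes.
function isWellFormedDomain(string $input): bool
{
    return strlen($input) <= 253
        && preg_match('/\A(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\z/i', $input) === 1;
}

// Layer 2, business logic: do we sell this TLD? The list belongs to the
// application and can change as often as the business needs.
function isRegistrableDomain(string $domain, array $supportedTlds): bool
{
    $dot = strrpos($domain, '.');
    if ($dot === false) {
        return false;
    }
    $tld = strtolower(substr($domain, $dot + 1));
    return in_array($tld, $supportedTlds, true);
}
```

A malformed value fails layer 1 and can be rejected outright; a well-formed but unsupported domain fails layer 2, where a helpful message for the customer makes sense.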

I don't understand why a new validator would cause more problems than
it solves. If users validate all inputs (e.g. request headers, cookies,
all post/get tampering), apps become much more secure. This task
does not belong to business (app) logic. Even when users use the
validator in a non-optimal way, it will improve security.

Anyway, the bottom line seems to be: "There are too many apps that do
not validate inputs properly" and "Many users do not distinguish
between 'input validation' and 'logic/mistake checks'".

Regards,

P.S. I was about to reactivate the DbC proposal. This kind of
validation is mandatory for DbC. Otherwise, DbC will cause more
problems than it solves.

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

I don't understand why a new validator would cause more problems than
it solves. If users validate all inputs (e.g. request headers, cookies,
all post/get tampering), apps become much more secure. This task
does not belong to business (app) logic. Even when users use the
validator in a non-optimal way, it will improve security.

The whole problem with that statement is at what point do you
distinguish between an input being invalid because it does not meet some
validation such as bigger than X for 'validation' reasons rather than
'business logic' reasons. STILL in my book, it's the business logic that
defines the base validation, but I don't need DbC as a straitjacket to
define that. Adding additional 'woolly' validation checks around the
base validation is a pointless exercise if the rules of the base
validation are available to use.

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki

Hi Lester,

I don't understand why a new validator would cause more problems than
it solves. If users validate all inputs (e.g. request headers, cookies,
all post/get tampering), apps become much more secure. This task
does not belong to business (app) logic. Even when users use the
validator in a non-optimal way, it will improve security.

The whole problem with that statement is at what point do you
distinguish between an input being invalid because it does not meet some
validation such as bigger than X for 'validation' reasons rather than
'business logic' reasons. STILL in my book, it's the business logic that
defines the base validation, but I don't need DbC as a straitjacket to
define that. Adding additional 'woolly' validation checks around the
base validation is a pointless exercise if the rules of the base
validation are available to use.

Security purpose input validation (injection prevention mainly)
differs from what business logic does. Business logic should
focus on logical correctness while input validation should focus
on security.

I've audited a number of MVC applications and have to admit that
input validation in models is poor. Apart from the fact that input
validation should be done as early as possible, model validation is
very poor in many cases, i.e. not good enough for security purposes.

This is natural, because what business logic should take care of is
"logic", not what the data should look like, whether the data has the
correct encoding, or making sure request headers/cookies/post/get have
not been tampered with, etc.

Handling tampered data in business logic will reduce both
readability and maintainability. More importantly, it makes code
less secure, because programmers tend to focus on logic
in the model, not on input data validation.

That validation in the model is less secure has been proven already.
It is not a surprise, since the model is for "business logic".
(If an app's requirements are satisfied by validation in the model,
it's OK to design it that way. Not all apps need ideal secure coding.)

Why shouldn't we have more secure validation?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Rowan Collins

Handling tampered data in business logic will reduce both
readability and maintainability. More importantly, it makes code
less secure, because programmers tend to focus on logic
in the model, not on input data validation.

This certainly makes sense. I guess the challenge is that in order to
know if data has been tampered, you need to have some knowledge of the
expected format. That expectation depends on what data you're expecting,
which depends - ultimately - on the domain objects being modelled.

More specifically, though, it depends on the interaction design - in an
HTML context, the forms being presented. So the validation needs
knowledge of the form controls - e.g. if a select box was shown, and the
value is not from the known list of options, the input has been tampered
with.
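The select-box case can be sketched as a strict whitelist check. `COUNTRY_OPTIONS` and `isUntamperedChoice()` are hypothetical; in practice the option list would come from wherever the form is built, which is exactly the coupling described here.

```php
<?php
// Hypothetical: the options the server rendered in <select name="country">.
const COUNTRY_OPTIONS = ['gb', 'jp', 'us'];

function isUntamperedChoice($value, array $options): bool
{
    // A legitimate browser can only submit one of the rendered values.
    // Strict in_array() keeps type juggling from producing a match.
    return is_string($value) && in_array($value, $options, true);
}
```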

If that's the case, the logical place to build the validation is into a
form builder. At which point you've probably got a complex architecture
in userland, and filter_* functions are unlikely to be a natural fit.

If somebody's not using a library to build the form (e.g. they're
laying out the HTML by hand), are they likely to set up the complex
validation settings needed by the filter_* functions?

Regards,

Rowan Collins
[IMSoP]

8 years ago by Lester Caine

If somebody's not using a library to build the form (e.g. they're
laying out the HTML by hand), are they likely to set up the complex
validation settings needed by the filter_* functions?

The main problem is the lack of well-built libraries that also take care
of validation. Form builders don't often include a good validation
model. I've been jumping through those hoops for the last couple of years.

If we have a set of validated parameters coming in from that form then,
as you say, the rules exist to build a filter array, while I'm
looking to those rules simply to be applied when I save each parameter
to its internal variable.

A filter of "is this string corrupted with an injection attempt" seems
rather more difficult to define than "email"? And applying the first in
general on every string when there is a set of simple filters that can
be used ... as an alternative to the more difficult to define ones?

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki

Hi Lester,

A filter of "is this string corrupted with an injection attempt" seems
rather more difficult to define than "email"? And applying the first in
general on every string when there is a set of simple filters that can
be used ... as an alternative to the more difficult to define ones?

Input validation code does not have to address all injections. It is
the output code's responsibility to prevent injections in the first place,
i.e. Top 10 Secure Coding Practices - #7
https://www.securecoding.cert.org/confluence/display/seccode/Top+10+Secure+Coding+Practices
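For contrast with input validation, output control means escaping for the context where the data ends up. A minimal sketch for HTML output using PHP's `htmlspecialchars()`; the wrapper name `html()` is hypothetical, and SQL output would use prepared statements instead.

```php
<?php
// Output control: escape for the HTML context, whether or not input
// validation already ran. ENT_SUBSTITUTE replaces invalid UTF-8 sequences.
function html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo '<p>Hello, ', html('<script>alert("x")</script>'), '</p>', PHP_EOL;
```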

Nonetheless, poor ID validation is not rare, even in well-known
code. For parameters like an ID, it is easy to make sure they are
safe from any injection.
e.g. https://groups.google.com/forum/#!topic/rubyonrails-security/ly-IH-fxr_Q

ID is not the only one; Accept-Language, encoding, Referer, etc. are
also common sources of injection.

Input validation code is for mitigation against unknown/unaddressed
vulnerabilities in the entire system: not only PHP code, but also the
language itself, libraries written in C/C++ and/or external systems
such as the DB.
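As an example of validating a request header rather than a form field, here is a sketch of a strict Accept-Language check. The function name is hypothetical, the pattern only loosely follows the HTTP Accept-Language syntax (language ranges with optional q-values, e.g. "en-US,en;q=0.9"), and the 255-byte cap and the "en" fallback are assumptions.

```php
<?php
// Hypothetical strict check for an Accept-Language header value.
function isValidAcceptLanguage(string $header): bool
{
    if ($header === '' || strlen($header) > 255) {
        return false;
    }
    $range = '(?:\*|[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*)';              // "*", "en", "en-US"
    $qvalue = '(?:\s*;\s*q=(?:0(?:\.\d{0,3})?|1(?:\.0{0,3})?))?';       // optional ";q=0.9"
    $item = $range . $qvalue;
    return preg_match('/\A' . $item . '(?:\s*,\s*' . $item . ')*\z/', $header) === 1;
}

// Usage: fall back to a default instead of passing the raw header on.
$header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';
$acceptLanguage = isValidAcceptLanguage($header) ? $header : 'en';
```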

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

Hi Lester,

A filter of "is this string corrupted with an injection attempt" seems
rather more difficult to define than "email"? And applying the first in
general on every string when there is a set of simple filters that can
be used ... as an alternative to the more difficult to define ones?

Input validation code does not have to address all injections. It is
the output code's responsibility to prevent injections in the first place,
i.e. Top 10 Secure Coding Practices - #7
https://www.securecoding.cert.org/confluence/display/seccode/Top+10+Secure+Coding+Practices

Your statement and those coding points don't go together.
1 ... Any input to PHP has to be untrusted, since even clean sources
can be intercepted.
8 ... Why not use the best available checks on the input side?
7 ... I've sanitised the data in the browser, but because of morons I
can't use it without addressing 1 and 8.

All this comes back to my simple idea of adding all these validation,
filtering and sanitisation steps wrapped around the basic PHP variable.
And THAT also includes 'strict typing' since if we have the option to
select soft or hard failure when a problem is found in the variable we
can cover everybody’s 'need'!

Nonetheless, poor ID validation is not rare, even in well-known
code. For parameters like an ID, it is easy to make sure they are
safe from any injection.
e.g. https://groups.google.com/forum/#!topic/rubyonrails-security/ly-IH-fxr_Q

I know the range of values available for 'id'; it's provided by the
SEQUENCE source in the database, but even if you insist on 'autoinc' we can do
the job properly. So my filter on the variable :id is looking for a
number in a range. What could be a simpler validation than that?
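Lester's "number in a range" check maps directly onto the existing filter extension. A minimal sketch: `validateId()` is a hypothetical wrapper, and the upper bound of 2147483647 (a signed 32-bit key column) is an assumption; the real bound would come from the SEQUENCE.

```php
<?php
// Hypothetical wrapper: accept only a decimal integer in [1, 2147483647].
// Returns the int on success, false otherwise (including array input).
function validateId($raw)
{
    return filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 2147483647],
    ]);
}
```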

ID is not the only one; Accept-Language, encoding, Referer, etc. are
also common sources of injection.

Input validation code is for mitigation against unknown/unaddressed
vulnerabilities in the entire system: not only PHP code, but also the
language itself, libraries written in C/C++ and/or external systems
such as the DB.

If you need to retain the raw input of non-PHP material then that is
just a more complex filter. Point 5 above - Default Deny - do not
forward anything that you do not need. So once you have applied rule 9,
and made sure you know what you expect to receive, then only that is
passed on to rule 8. If that data being passed on has a potential to
carry a vulnerability forward it's because you have to allow for that
data to be forwarded anyway, so a filter to prevent it is pointless?

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki

Hi Lester,

Hi Lester,

A filter of "is this string corrupted with an injection attempt" seems
rather more difficult to define than "email"? And applying the first in
general on every string when there is a set of simple filters that can
be used ... as an alternative to the more difficult to define ones?

Input validation code does not have to address all injections. It is
the output code's responsibility to prevent injections in the first place,
i.e. Top 10 Secure Coding Practices - #7
https://www.securecoding.cert.org/confluence/display/seccode/Top+10+Secure+Coding+Practices

Your statement and those coding points don't go together.
1 ... Any input to PHP has to be untrusted, since even clean sources
can be intercepted.
8 ... Why not use the best available checks on the input side?
7 ... I've sanitised the data in the browser, but because of morons I
can't use it without addressing 1 and 8.

Software that is executed by other processes and/or computers is outside
of the software you're trying to protect. Any software executed by other
processes/computers shouldn't be trusted, even if it was written by you.

Rule #8 is for multiple layers of protection; it's a fail-safe. I think
you're familiar with the defense-in-depth approach in network security,
e.g. protect the overall network with an internet firewall, protect the
service network with a firewall, protect clients with a client firewall.

Software architecture can be designed like a network system.
https://wiki.php.net/_detail/rfc/screenshot_from_2016-08-05_11-25-01.png?id=rfc%3Aadd_validate_functions_to_filter
i.e. protect the application via application input validation, protect
modules via module validation, and protect functions/methods via
function/method validation.

If you validate everything, software will be more secure, but it could
run very slowly due to repetitive/excessive validation. That's the
reason why DbC validates everything (checks every contract) during
development, but doesn't validate everything in the production
environment.

Removing all validation is risky. Therefore, we need to keep important
validations in software in the production environment. Application-level
validations (contracts) obviously cannot be removed, and for critical
code such as command execution it is better to keep validation
active.
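A rough PHP illustration of the two kinds of checks: development-time contracts that can be switched off, and validation that stays on for critical code. PHP has no native DbC, so `assert()` stands in for a contract here (with `zend.assertions = -1`, the usual production setting, the assertion is not evaluated at all). Both function names, the file-name pattern and the `gzip -k` command are hypothetical.

```php
<?php
// Development-time contract: skipped entirely when zend.assertions = -1.
function applyDiscount(int $price, int $percent): int
{
    assert($percent >= 0 && $percent <= 100, 'percent must be 0..100');
    return intdiv($price * (100 - $percent), 100);
}

// Critical code keeps its validation active in production: the file name
// must match a strict whitelist before it is used in a shell command.
function buildGzipCommand(string $filename): string
{
    if (preg_match('/\A[A-Za-z0-9_-]{1,64}\.txt\z/', $filename) !== 1) {
        throw new InvalidArgumentException('Rejected file name');
    }
    // escapeshellarg() is output control on top of the input check.
    return 'gzip -k ' . escapeshellarg($filename);
}
```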

All this comes back to my simple idea of adding all these validation,
filtering and sanitisation steps wrapped around the basic PHP variable.
And THAT also includes 'strict typing' since if we have the option to
select soft or hard failure when a problem is found in the variable we
can cover everybody’s 'need'!

Forcing a data type is a "weak form of validation": a data type forces
(≒ validates) a certain form. Even so, forcing data types is valuable
for security, because programmers tend to expect int/float to have a
numeric data representation.

However, strict data typing is obviously not enough. It does not force
a certain form on "string", or a certain range of values on "int"/"float".
Strings are the most dangerous data, yet they are not covered by data
typing. Therefore, strict data typing is weak, and secure coding
specialists recommend validation for strongly typed languages, too.
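A small sketch of the gap described here: a type declaration accepts any `int` or any `string`, so range and form still need explicit checks. The function names, the 0 to 150 age range and the user name pattern are hypothetical.

```php
<?php
// The int type declaration guarantees an int, but not a sensible value.
function setAgeTypedOnly(int $age): int
{
    return $age; // -5 passes the type check
}

// The type has to be backed by an explicit range check...
function setAge(int $age): int
{
    if ($age < 0 || $age > 150) {
        throw new RangeException("Age out of range: $age");
    }
    return $age;
}

// ...and a string type says nothing at all about the string's form.
function isValidUsername(string $name): bool
{
    return preg_match('/\A[a-z0-9_]{3,32}\z/', $name) === 1;
}
```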

Nonetheless, poor ID validation is not rare, even in well-known
code. For parameters like an ID, it is easy to make sure they are
safe from any injection.
e.g. https://groups.google.com/forum/#!topic/rubyonrails-security/ly-IH-fxr_Q

I know the range of values available for 'id'; it's provided by the
SEQUENCE source in the database, but even if you insist on 'autoinc' we can do
the job properly. So my filter on the variable :id is looking for a
number in a range. What could be a simpler validation than that?

The example vulnerability is in Action Pack. An ID does not have to be
a numeric ID, yet some users (including Rails developers, I suppose)
blindly assume it's numeric.

If users deployed proper validation and rejected malformed IDs, they
could avoid arbitrary code execution via a malformed ID. We know there
are countless vulnerabilities similar to this one.

ID is not the only one; Accept-Language, encoding, Referer, etc. are
also common sources of injection.

Input validation code is for mitigation against unknown/unaddressed
vulnerabilities in the entire system: not only PHP code, but also the
language itself, libraries written in C/C++ and/or external systems
such as the DB.

If you need to retain the raw input of non-PHP material then that is
just a more complex filter. Point 5 above - Default Deny - do not
forward anything that you do not need. So once you have applied rule 9,
and made sure you know what you expect to receive, then only that is
passed on to rule 8. If that data being passed on has a potential to
carry a vulnerability forward it's because you have to allow for that
data to be forwarded anyway, so a filter to prevent it is pointless?

It's not pointless, as I described above.

I guess you feel that secure coding is inefficient and repetitive.
However, this is the whole point of secure coding. We have to admit
that "people make mistakes". To build secure software, we need multiple
layers of protection. Input and output control are the most important.
http://cwe.mitre.org/top25/#Mitigations
#1 - Input control
#2 - Output control

Validation makes software more maintainable, i.e. mistakes can be
found by validation in modules/methods/functions, and more secure, i.e.
most vulnerabilities are due to unexpected input data. If data is
supposed to have a certain form, it should be validated as soon as
possible so that the software works correctly.

DbC can help secure code because "it can strictly enforce the
data/object domain during development". What's missing is a runtime
rule/validation that enforces the input data domain. The proposed
validator can be used to specify the "form" of input data at runtime.
Therefore the validator is mandatory. "Logical consistency" should be
handled by models at runtime.

It may differ from your software security model. Programmers are free
to choose which model to adopt. However, one shouldn't block the
implementation of a tool that is mandatory for the security model
recommended by secure coding specialists, IMHO. If you don't like/need
it, you are free not to use it, after all.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

It may differ from your software security model. Programmers are free
to choose which model to adopt. However, one shouldn't block the
implementation of a tool that is mandatory for the security model
recommended by secure coding specialists, IMHO. If you don't like/need
it, you are free not to use it, after all.

My security model is no different to yours. But in my model 'Add
validation functions to filter module' is adding another layer of checks
and I think I'm simply adding them in a different place.

I return to the original question which has not yet been answered. The
block of input data being supplied from whatever source needs to be
converted to a set of variables in PHP. That could be variables in a
class, an associative array as in $_POST or simple variables which are
probably ancient history now. If the definition of a variable is
improved to include ALL of the validation we ideally need and I include
setStrict(int) in that then at run time we can both validate input and
decide on the error model that is applied. I think DbC is a wrapper at
the development level as you describe it and we are back at the
'annotation' debate. What I'm still looking for is primary annotation
such as 'strict' if appropriate although I would look at that as
'between 0 and 200' rather than expecting a clean binary integer to be
supplied via some interface.
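A sketch of the "variable definition carries its own validation and error model" idea. `RangedInt` is hypothetical; `setStrict()` takes a bool here (the `setStrict(int)` mentioned above isn't specified further), strict mode throws, and non-strict mode records a message and carries on.

```php
<?php
// Hypothetical: a variable with its own rule ("between $min and $max")
// and a switch between hard and soft failure.
final class RangedInt
{
    private $min;
    private $max;
    private $strict = false;
    private $value = null;
    private $errors = [];

    public function __construct(int $min, int $max)
    {
        $this->min = $min;
        $this->max = $max;
    }

    public function setStrict(bool $strict): void
    {
        $this->strict = $strict;
    }

    public function set(string $raw): bool
    {
        $value = filter_var($raw, FILTER_VALIDATE_INT, [
            'options' => ['min_range' => $this->min, 'max_range' => $this->max],
        ]);
        if ($value === false) {
            $message = "Expected a whole number between {$this->min} and {$this->max}";
            if ($this->strict) {
                throw new UnexpectedValueException($message); // hard failure
            }
            $this->errors[] = $message; // soft failure: record it and carry on
            return false;
        }
        $this->value = $value;
        return true;
    }

    public function get(): ?int
    {
        return $this->value;
    }

    public function errors(): array
    {
        return $this->errors;
    }
}
```

The same rule object could also be used to generate the browser-side check, which is the "same set of rules" point made below.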

I can use the annotation information to build the browser side
validation, and know that I'm working with the same set of rules, and I
would also include escaping rules so that the general string data can
manage if material of a suspect nature is being processed. Such as
WRITING the script files that are needed to output the elements that a
blanket htmlentities() filter would block! If one is building template
and javascript packages of code in the database then you need to filter
the malicious stuff before saving them and ensure the stored data is clean.

I could envisage loosening the validation checks on a secure private
network where malicious activity would be a firing offence, but the sort
of layer of security I'm looking at should not introduce any more delay
than normal. The way it falls down is if people can't be bothered to
set the validation values up ... or create your filter array. Default
rules such as your crude filters are a point for discussion.

--
Lester Caine - G8HFL

8 years ago by Lester Caine

If the definition of a variable is
improved to include ALL of the validation we ideally need and I include
setStrict(int) in that then at run time we can both validate input and
decide on the error model that is applied.

And I know I will get my head chewed off for combining threads, but
readOnly() seems to me the correct answer to the whole of the
'immutable' debate. The class simply creates a readOnly object,
validated against all the rules and stores it as a readOnly object. No
reason you can't simply call the class again and create a separate
object. But this is where my 'model' of the world is an associative
array set of data handled by a separate set of code. If you need 100
read only dates for a calendar you only need one set of code to generate
them. The created objects would all be validated against the date rules
and then locked so you can't modify them.
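The "validate once, then lock" idea can be sketched with PHP 8.1 `readonly` properties, which postdate this thread. The `ValidatedDate` class and its 1900 to 2100 year range are hypothetical; the point is only that the constructor is the single place where the rules run, and the resulting object cannot be modified afterwards.

```php
<?php
declare(strict_types=1);

// Hypothetical value object: validated once in the constructor, then immutable.
final class ValidatedDate
{
    public function __construct(
        public readonly int $year,
        public readonly int $month,
        public readonly int $day,
    ) {
        // Assumed rule set for illustration: a real calendar date within 1900..2100.
        if ($year < 1900 || $year > 2100 || !checkdate($month, $day, $year)) {
            throw new InvalidArgumentException(
                sprintf('Invalid date %04d-%02d-%02d', $year, $month, $day)
            );
        }
    }
}

// One set of code builds many locked dates, e.g. for a calendar month.
$dates = [];
for ($d = 1; $d <= 31; $d++) {
    $dates[] = new ValidatedDate(2016, 8, $d);
}
```

Assigning to `$dates[0]->day` afterwards throws an `Error`, so every date in the array is known to have passed the rules.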

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki

Hi Rowan,

Handling tampered data in business logic reduces both
readability and maintainability. More importantly, it makes code
less secure, because programmers tend to focus on the logic
in the model, not on input data validation.

This certainly makes sense. I guess the challenge is that in order to know
if data has been tampered, you need to have some knowledge of the expected
format. That expectation depends on what data you're expecting, which
depends - ultimately - on the domain objects being modelled.

More specifically, though, it depends on the interaction design - in an HTML
context, the forms being presented. So the validation needs knowledge of the
form controls - e.g. if a select box was shown, and the value is not from
the known list of options, the input has been tampered with.

If that's the case, the logical place to build the validation is into a form
builder. At which point you've probably got a complex architecture in
userland, and filter_* functions are unlikely to be a natural fit.

If somebody's not using a library to build the form (e.g. they're laying
out the HTML by hand), are they likely to set up the complex validation
settings needed by the filter_* functions?

I agree HTML forms are the most important input. However, apps
have to deal with other inputs such as GET/COOKIE/JSON/HTTP request
headers and inputs from other subsystems. A form builder does not work
well for input data other than forms...

Even with a great form builder, leaving validation to the input-handling
code has advantages.

The first issue is that data handling is better broken down into two parts:
logic and format. Since the model's responsibility is to handle logic,
programmers try to handle logic in the model. There is nothing wrong with
that. However, many apps have poor format checks
because of this.

The second issue is coverage: a model deals only with the data that is
handled by that model. The data validation covered by models does not match
the application's inputs. There are many "Oops, this value
is not validated" vulnerabilities, even for data such as IDs. This is not
limited to PHP apps, e.g. https://groups.google.com/forum/#!topic/rubyonrails-security/ly-IH-fxr_Q

The third issue is location. Input data validation is best done as
early as possible. When an application accepts input, programmers
know what the possible inputs are and can cover all of them, i.e.
the controller is the best place for input format validation.
A "data" domain can be defined in a single definition, but making
sure the data has a "valid format" via models/libraries is harder than it seems.
If it were easy, we wouldn't have this many vulnerabilities in PHP apps.

The best practice for input validation is
"Establish and maintain control over all of your inputs."
http://cwe.mitre.org/top25/#Mitigations

There are many possible implementations of this.
If we apply Design by Contract (DbC) to an application, we have to
validate external inputs somewhere to keep the contracts and make the
code work without problems. The natural location for input validation
is the controller. Internal redirects without validation could cause
serious bugs such as authentication bypass. Input validation done in
the controller can mitigate such bugs as well.
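As a rough userland illustration of "establish and maintain control over all of your inputs" at the controller, the sketch below uses only current PHP. The `require_inputs()` name and the use of plain callables as rules are assumptions for illustration, not the RFC's `filter_require_var_array()`: keys without a rule are rejected (whitelisting), every listed key is required, and any failure throws.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: whitelist-validate an input array against per-key rules.
// Each rule is a callable returning true when the raw value is acceptable.
// In this sketch every key listed in $rules is required (no optional rules).
function require_inputs(array $input, array $rules): array
{
    // Whitelisting: any key without a rule is treated as tampering.
    $unexpected = array_diff_key($input, $rules);
    if ($unexpected !== []) {
        throw new UnexpectedValueException(
            'Unexpected input: ' . implode(', ', array_keys($unexpected))
        );
    }
    foreach ($rules as $key => $rule) {
        if (!array_key_exists($key, $input)) {
            throw new UnexpectedValueException("Missing input: $key");
        }
        if (!$rule($input[$key])) {
            throw new UnexpectedValueException("Invalid input: $key");
        }
    }
    return $input;
}

$rules = [
    // Positive decimal ID: no sign, no whitespace, no leading zeros.
    'id'   => fn($v) => is_string($v) && preg_match('/\A[1-9][0-9]{0,9}\z/', $v) === 1,
    // Alphanumeric, 1 to 32 bytes.
    'type' => fn($v) => is_string($v) && preg_match('/\A[A-Za-z0-9]{1,32}\z/', $v) === 1,
];

$get = require_inputs(['id' => '42', 'type' => 'news'], $rules);
```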

BTW, I don't think everyone has to validate input in a very strict
manner. It is OK to validate like this:

<?php
// Define loose input validation.
// The $*_spec variables are filter specs defined elsewhere (not shown);
// filter_require_var_array() is the function proposed by this RFC.

$server_def = array(
    // REQUEST HEADER
    'HTTP_ACCEPT'          => $text64b_spec,  // Text up to 64 bytes
    'HTTP_ACCEPT_ENCODING' => $text64b_spec,
    'HTTP_ACCEPT_LANGUAGE' => $text64b_spec,
    'HTTP_USER_AGENT'      => $text512b_spec, // Text up to 512 bytes
    'HTTP_COOKIE'          => $text1k_spec,   // Text up to 1024 bytes
    'REQUEST_URI'          => $text4k_spec,   // Text up to 4096 bytes
);

$cookie_def = array(
    // COOKIE
    'PHPSESSID'   => $phpsessid_spec,
    'uid'         => $uid_spec,
    'last_access' => $int_spec,
);

$get_def = array(
    // GET
    'id'       => $id_spec,
    'other_id' => $id_spec,
    'type1'    => $alnum32b_spec, // Alphanumeric, up to 32 bytes
    'type2'    => $alnum32b_spec,
);

$post_def = array(
    // POST
    'text1'        => $input_tag_spec,    // <input type=text>: text up to 512 bytes by default
    'text2'        => $input_tag_spec,
    'select1'      => $select_tag_spec,   // <select> values: text up to 64 bytes by default
    'select2'      => $select_tag_spec,
    'radio1'       => $radio_tag_spec,    // <input type=radio> values: text up to 64 bytes by default
    'radio2'       => $radio_tag_spec,
    'submit'       => $submit_tag_spec,
    'textarea1k'   => $textarea1k_spec,   // <textarea> values: text up to 1K by default, newlines allowed
    'textarea100k' => $textarea100k_spec, // <textarea> values: text up to 100K by default, newlines allowed
    'CSRF_TOKEN'   => $alnum32b_spec,     // Alphanumeric, up to 32 bytes
    // and so on
);

$_SERVER = filter_require_var_array($_SERVER, $server_def);
$_COOKIE = filter_require_var_array($_COOKIE, $cookie_def);
$_GET    = filter_require_var_array($_GET, $get_def);
$_POST   = filter_require_var_array($_POST, $post_def);
?>

Since the new string validator checks encoding and control characters,
including newlines, this validation rejects tampered inputs such as
broken encoding, null character injection, newline injection and other
control-character (CNTRL) injection.

This definition is loose and weak, but at least we can be sure the inputs
are FREE of broken encoding, null character injection, newline injection
and other control-character injection, except for newlines in textarea
inputs. It is very important to be sure we are free from certain
vulnerabilities, even if the output code/library/subsystem is supposed
to sanitize data for 100% safety.
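A loose string check of this kind can be approximated in current PHP with PCRE alone: with the `u` modifier, `preg_match()` returns `false` for malformed UTF-8, so a single pattern covers both broken encoding and control characters. This is a sketch assuming UTF-8 input; `validate_text()` is a hypothetical name and is not the RFC's `FILTER_VALIDATE_STRING`. Tabs are rejected in both modes here.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: accept only valid UTF-8 without control characters,
// up to $maxBytes bytes. With $allowNewline, CR and LF are permitted (textarea).
function validate_text(string $s, int $maxBytes, bool $allowNewline = false): bool
{
    if (strlen($s) > $maxBytes) {
        return false;
    }
    $pattern = $allowNewline
        ? '/\A[^\x00-\x09\x0B\x0C\x0E-\x1F\x7F]*\z/u' // rejects C0 controls except CR/LF, plus DEL
        : '/\A[^\x00-\x1F\x7F]*\z/u';                 // rejects all C0 controls and DEL
    // Because of /u, preg_match() returns false (not 0) on malformed UTF-8.
    return preg_match($pattern, $s) === 1;
}
```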

We still have these kinds of vulnerabilities.
For example, I recently fixed mail header injection via the additional
headers parameter of mail(): https://bugs.php.net/bug.php?id=68776
We cannot be sure that 3rd party modules are free from encoding and
control-character injection attacks.

How strictly to validate in the input validation code is up to
developers. The more strictly you validate, the more secure the code is,
and the better the chance of avoiding hidden vulnerabilities in your code
or in other people's code, e.g. frameworks, libraries, or even PHP itself
and subsystems such as databases.

Regards,

P.S.
To people against this RFC,

How many of you are against the idea of this RFC?
(I don't think Rowan is against the basic idea, BTW)

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Rowan Collins

Hi Rowan,

This certainly makes sense. I guess the challenge is that in order to know
if data has been tampered, you need to have some knowledge of the expected
format. That expectation depends on what data you're expecting, which
depends - ultimately - on the domain objects being modelled.

More specifically, though, it depends on the interaction design - in an HTML
context, the forms being presented. So the validation needs knowledge of the
form controls - e.g. if a select box was shown, and the value is not from
the known list of options, the input has been tampered with.

BTW, I don't think everyone has to validate input in a very strict
manner. It is OK to validate like this:

<?php
// Define loose input validation

$get_def = array(
    // GET
    'id'       => $id_spec,
    'other_id' => $id_spec,
    'type1'    => $alnum32b_spec, // Alphanumeric, up to 32 bytes
    'type2'    => $alnum32b_spec,
);

$post_def = array(
    // POST
    'text1'        => $input_tag_spec,    // <input type=text>: text up to 512 bytes by default
    'text2'        => $input_tag_spec,
    'select1'      => $select_tag_spec,   // <select> values: text up to 64 bytes by default
    'select2'      => $select_tag_spec,
    'radio1'       => $radio_tag_spec,    // <input type=radio> values: text up to 64 bytes by default
    'radio2'       => $radio_tag_spec,
    'submit'       => $submit_tag_spec,
    'textarea1k'   => $textarea1k_spec,   // <textarea> values: text up to 1K by default, newlines allowed
    'textarea100k' => $textarea100k_spec, // <textarea> values: text up to 100K by default, newlines allowed
    'CSRF_TOKEN'   => $alnum32b_spec,     // Alphanumeric, up to 32 bytes
    // and so on
);

$_GET  = filter_require_var_array($_GET, $get_def);
$_POST = filter_require_var_array($_POST, $post_def);
?>

These "simple" examples are still very closely bound to the HTML form
(or API definition, or whatever).

If a change is made to the form, even these simple rules need to be
changed. Every time a field is added or removed, these validation rules
need to be updated.

Or consider for example a select box which only ever contains integer
IDs; the simple validation for this would be to reject non-numeric input
as tampering. But if the UI changes to a fancy combo box autocomplete
widget, non-numeric input might instead merit a user-friendly validation
message.

You could just about guarantee that most fields will never need to
accept control characters. But even newlines come and go - a "revision
comment" field might be one line (the traditional wiki style) or many
(common version control style).

The third issue is location. Input data validation is best done as
early as possible. When an application accepts input, programmers
know what the possible inputs are and can cover all of them, i.e.
the controller is the best place for input format validation.

You're mixing two things here, I think: one is when the validation is
run (how soon in the execution pipeline); the other is where it is
defined (which PHP class it is part of). I think what I'm getting at is
that the rules should be defined in one place (avoid code duplication,
ensure definitions are kept up to date as requirements change) even if
they are accessed in more than one place.

The method $formDefinition->isSubmittedDataSane($_POST) could be
implemented by generating, based on the set of fields expected, a spec
for ext/filter. But by the time you've handled all the cases,
implemented a bunch of custom callbacks for unsupported validation
types, and customised the error message slightly, you might as well just
implement the validation yourself.
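To make the `$formDefinition->isSubmittedDataSane($_POST)` idea concrete, here is a sketch assuming a hypothetical `FormDefinition` class whose field list is the single source of truth and from which a spec for the existing `filter_var_array()` is generated. Only real filters are used (`FILTER_VALIDATE_INT` with `min_range`/`max_range`, `FILTER_VALIDATE_REGEXP` with `regexp`); the field types and the class itself are invented for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical form definition: fields are declared once, and the ext/filter
// spec for filter_var_array() is generated from them.
final class FormDefinition
{
    /** @param array<string, array{type: string, options: array}> $fields */
    public function __construct(private array $fields) {}

    public function filterSpec(): array
    {
        $spec = [];
        foreach ($this->fields as $name => $field) {
            $spec[$name] = [
                'filter'  => match ($field['type']) {
                    'int'  => FILTER_VALIDATE_INT,
                    'text' => FILTER_VALIDATE_REGEXP,
                },
                'options' => $field['options'],
            ];
        }
        return $spec;
    }

    public function isSubmittedDataSane(array $post): bool
    {
        // This form never produces other keys, so extra keys mean tampering.
        if (array_diff_key($post, $this->fields) !== []) {
            return false;
        }
        // Failed fields come back as false, missing ones as null (add_empty).
        $result = filter_var_array($post, $this->filterSpec());
        if (!is_array($result)) {
            return false;
        }
        foreach ($result as $value) {
            if ($value === false || $value === null) {
                return false;
            }
        }
        return true;
    }
}

$form = new FormDefinition([
    'age'  => ['type' => 'int',  'options' => ['min_range' => 0, 'max_range' => 200]],
    // {1,64} counts characters here because of the /u modifier.
    'name' => ['type' => 'text', 'options' => ['regexp' => '/\A[^\x00-\x1F\x7F]{1,64}\z/u']],
]);
```

`filter_var_array()` silently drops keys that are not in the spec rather than rejecting them, which is why the sketch adds an explicit `array_diff_key()` check.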

So the challenge of any built-in filter module is this: if it's not
doing the whole job of form handling and validation, what specific part
of that task is it doing? And how does it fit with common ways of
implementing the rest? Perhaps if we provided a narrower focus, the
API could become simpler and more widely applicable.

For instance, if we set the very narrow aim of "provide an easy-to-use
set of primitive tests for use in a validation filter", we could:

- remove all array handling (assume users are capable of using foreach())
- remove all support for custom filters (a single-variable custom filter
  does little more than call_user_func)
- simplify the return possibilities (boolean: does this value pass this
  test?)
- remove some tests that are trivially implemented using other functions
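A narrow API of this shape (boolean results, no array handling, no callback wrapper) could be layered over the existing ext/filter and PCRE functions. The function names and signatures below are hypothetical, chosen to echo the examples discussed in this thread; note that `FILTER_VALIDATE_INT` trims surrounding whitespace, which this sketch does not change.

```php
<?php
declare(strict_types=1);

// Hypothetical boolean primitives: each answers "does this value pass this test?"
// They wrap existing ext/filter and PCRE calls; the names are illustrative only.
function validate_int(mixed $v, int $min, int $max): bool
{
    // filter_var() returns the int on success (possibly 0) and false on failure.
    return filter_var($v, FILTER_VALIDATE_INT,
        ['options' => ['min_range' => $min, 'max_range' => $max]]) !== false;
}

function validate_string(mixed $v, int $minBytes, int $maxBytes): bool
{
    return is_string($v) && strlen($v) >= $minBytes && strlen($v) <= $maxBytes;
}

function validate_string_regex(mixed $v, string $regex): bool
{
    return is_string($v) && preg_match($regex, $v) === 1;
}

// Array handling stays in plain PHP.
$ids = ['3', '17', 'abc'];
$allValid = true;
foreach ($ids as $id) {
    $allValid = $allValid && validate_int($id, 1, 1000);
}
```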

How many of you are against the idea of this RFC?
(I don't think Rowan is against the basic idea, BTW)

I guess I'm against the idea of the RFC in the sense that its aim is
too broad: we cannot implement safe validation in the language, we can
only give users the tools to do it. The RFC as it was proposed (and, I
think, ext/filter in general) tries too hard to "do everything for you",
without looking at where it would fit inside a larger application.

Regards,

Rowan Collins
[IMSoP]

8 years ago by Yasuo Ohgaki

Hi Rowan,

The third issue is location. Input data validation is best done as
early as possible. When an application accepts input, programmers
know what the possible inputs are and can cover all of them, i.e.
the controller is the best place for input format validation.

You're mixing two things here, I think: one is when the validation is run
(how soon in the execution pipeline); the other is where it is defined
(which PHP class it is part of). I think what I'm getting at is that the
rules should be defined in one place (avoid code duplication, ensure
definitions are kept up to date as requirements change) even if they are
accessed in more than one place.

This might be the largest difference.

To make something more secure than it is now, adding an additional
security layer is effective, rather than relying on a single location/code.

A good example is a web application firewall (WAF). It's an independent
security layer that does a whole bunch of checks for additional
security. WAFs have proven useful against web app vulnerabilities
such as JavaScript/SQL injection, because they check independently
of the application code and most apps do very poor validation.

Maintaining WAF rules is not an easy task, especially when the rules are
whitelist-based. (All security guidelines recommend a whitelist-based
approach.) IMHO, most WAF protections should be implemented in
apps, because strict validation in a WAF is too hard and too
inefficient.

The method $formDefinition->isSubmittedDataSane($_POST) could be implemented
by generating, based on the set of fields expected, a spec for ext/filter.
But by the time you've handled all the cases, implemented a bunch of custom
callbacks for unsupported validation types, and customised the error message
slightly, you might as well just implement the validation yourself.

So the challenge of any built-in filter module is this: if it's not doing
the whole job of form handling and validation, what specific part of that
task is it doing? And how does it fit with common ways of implementing the
rest? Perhaps if we provided a narrower focus, the API could become
simpler and more widely applicable.

My intention is to cover the runtime validation required by DbC. DbC
checks are disabled on production systems, but some validation
must still be executed at runtime, at least application-level validation.

Even if some parts are missing, the proposal is good enough to
start with, IMO. I appreciate suggestions for improvement. It does not
have to be based on the current filter module.

For instance, if we set the very narrow aim of "provide an easy-to-use set
of primitive tests for use in a validation filter", we could:

- remove all array handling (assume users are capable of using foreach())
- remove all support for custom filters (a single-variable custom filter
  does little more than call_user_func)
- simplify the return possibilities (boolean: does this value pass this
  test?)
- remove some tests that are trivially implemented using other functions

Suppose we have a validation module. You are suggesting something like

$int = validate_int($var, $min, $max);
$bool = validate_bool($var, $allowed_bool_types);
// i.e. which type of bool 1/0, yes/no, on/off, true/false is allowed
// This isn't implemented. All of them are valid bools currently.
$str = validate_string($var, $min_len, $max_len);
$str = validate_string_encoding($var, $encoding);
$str = validate_string_chars($var, $allowed_chars);
$str = validate_string_regex($var, $regex);
$str = validate_string_digit($var, $min_len, $max_len);
$str = validate_string_callback($var, $callback);

Although that works, I prefer an array definition because it's a lot easier
to write rules and more efficient to execute.

$def = [
    'int_var'  => ['filter'  => FILTER_VALIDATE_INT,
                   'options' => ['min_range' => $min, 'max_range' => $max]],
    'bool_var' => ['filter'  => FILTER_VALIDATE_BOOL,
                   'options' => $allowed_bool_types],
    'str_var'  => [
        ['filter'  => FILTER_VALIDATE_STRING,
         'options' => ['min_bytes' => $min_len, 'max_bytes' => $max_len]],
        ['filter'  => FILTER_VALIDATE_REGEXP,
         'options' => ['regexp' => $regex]],
        ['filter'  => FILTER_VALIDATE_CALLBACK,
         'options' => ['callback' => $callback]],
    ],
];
$safe_input = filter_require_var_array($input, $def);

You can group definitions easily with arrays (multiple filter support
is implemented by my patch), e.g.

$my_str_var_spec = [
    ['filter'  => FILTER_VALIDATE_STRING,
     'options' => ['min_bytes' => $min_len, 'max_bytes' => $max_len]],
    ['filter'  => FILTER_VALIDATE_REGEXP,
     'options' => ['regexp' => $regex]],
    ['filter'  => FILTER_VALIDATE_CALLBACK,
     'options' => ['callback' => $callback]],
];

then the previous definition becomes

$def = [
    'int_var'  => ['filter'  => FILTER_VALIDATE_INT,
                   'options' => ['min_range' => $min, 'max_range' => $max]],
    'bool_var' => ['filter'  => FILTER_VALIDATE_BOOL,
                   'options' => $allowed_bool_types],
    'str_var'  => $my_str_var_spec,
];

Reusing rules and centralizing validation rules is easy.

If you would like to build client-side JavaScript validation from
the definition, it's easy, because it's a simple array definition,
not a bunch of functions defining validation rules.
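The claim that one plain array can drive both sides can be illustrated with a hypothetical rule format (not ext/filter's) from which both a server-side check and HTML5 form attributes (`min`, `max`, `maxlength`, `pattern`) are derived. The format and function names are assumptions for illustration; it only works for simple patterns that mean the same in PCRE and JavaScript, and `maxlength` is compared in bytes here, so it matches the browser only for ASCII input.

```php
<?php
declare(strict_types=1);

// Hypothetical rule format shared by server-side and browser-side checks.
$rules = [
    'age'  => ['type' => 'int', 'min' => 0, 'max' => 200],
    'code' => ['type' => 'text', 'maxlength' => 8, 'pattern' => '[A-Z0-9]+'],
];

// Server side: check a submitted string against one rule.
function check_rule(array $rule, string $value): bool
{
    if ($rule['type'] === 'int') {
        // At most 9 digits, so the (int) cast below cannot overflow.
        if (preg_match('/\A-?[0-9]{1,9}\z/', $value) !== 1) {
            return false;
        }
        $n = (int) $value;
        return $n >= $rule['min'] && $n <= $rule['max'];
    }
    // HTML's pattern attribute must match the whole value, so anchor it for PCRE.
    // Assumes the pattern contains no '~' (used as the PCRE delimiter).
    return strlen($value) <= $rule['maxlength']
        && preg_match('~\A(?:' . $rule['pattern'] . ')\z~', $value) === 1;
}

// Browser side: emit the equivalent HTML5 attributes for an <input> element.
function html_attributes(array $rule): string
{
    $attrs = $rule['type'] === 'int'
        ? ['type' => 'number', 'min' => $rule['min'], 'max' => $rule['max']]
        : ['type' => 'text', 'maxlength' => $rule['maxlength'], 'pattern' => $rule['pattern']];
    $out = [];
    foreach ($attrs as $name => $value) {
        $out[] = $name . '="' . htmlspecialchars((string) $value, ENT_QUOTES) . '"';
    }
    return implode(' ', $out);
}
```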

How many of you are against the idea of this RFC?
(I don't think Rowan is against the basic idea, BTW)

I guess I'm against the idea of the RFC in the sense that its aim is too
broad: we cannot implement safe validation in the language, we can only give
users the tools to do it. The RFC as it was proposed (and, I think,
ext/filter in general) tries too hard to "do everything for you", without
looking at where it would fit inside a larger application.

Many people use the filter module happily, so why can't validation
be implemented? Even an external WAF does it. Divide and conquer
(input handling and logic handling) and multiple layers of protection
work. We know interfaces are more stable than logic. Vulnerabilities
are often introduced when logic is changed. Input validation can
mitigate those risks.

I didn't spend much time on this because I reused the filter module
framework/code and didn't do any refactoring. If it seems I tried too
hard, it was the filter module authors who worked too hard. I spent
more time writing English than code :)

The proposal provides a primitive tool, but not too primitive. It
handles neither complex forms nor client-side JavaScript
validation, but it could be used for these tasks. (I changed the PR so
that exceptions are optional.) Those fancy, exciting things are left
to user implementations.

Anyway, we have $_POST/$_GET/$_COOKIE/$_FILES/$_SERVER/$_ENV
as basic inputs. Input validation is the #1 requirement for code security.
PHP must have some tool that validates these easily and simply,
yet is extensible.

The question is what kind of tool we'll have.

Simple functions? A different kind of array definition and validator
function? Something more comprehensive and object-based? Suggestions are
appreciated. I don't mind implementing it from scratch. Idea-only
suggestions are welcome!

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Rowan Collins

This might be the largest difference.

To make something more secure than it is now, adding an additional
security layer is effective, rather than relying on a single location/code.

My instinct is that this extra location would be a maintenance
nightmare, as it would need to keep up with changes elsewhere in the
application. In day to day use, what would make this extra security
layer any more likely to be comprehensively maintained than the existing
validation layer?

Suppose we have a validation module. You are suggesting something like

$int = validate_int($var, $min, $max);
$bool = validate_bool($var, $allowed_bool_types);
// i.e. which type of bool 1/0, yes/no, on/off, true/false is allowed
// This isn't implemented. All of them are valid bools currently.
$str = validate_string($var, $min_len, $max_len);
$str = validate_string_encoding($var, $encoding);
$str = validate_string_chars($var, $allowed_chars);
$str = validate_string_regex($var, $regex);
$str = validate_string_digit($var, $min_len, $max_len);
$str = validate_string_callback($var, $callback);

No, I'm suggesting something like:

if (
    ! validate_int($var, $min, $max)
    || ! validate_bool($var, $allowed_bool_types)
    || ! validate_string($var, $min_len, $max_len)
    || ! validate_string_encoding($var, $encoding)
    || ! validate_string_chars($var, $allowed_chars)
    || ! validate_string_regex($var, $regex)
    || ! validate_string_digit($var, $min_len, $max_len)
    || ! $callback($var) // Note: no need to wrap this callback, it's just
                         // a boolean-returning function
)
{
    // ERROR
}

It was a deliberately narrow purpose as an example, but IMHO that's
perfectly readable, and the focus of the module would be on creating a
good set of validation rules, rather than worrying about how they're
applied.

[...] Reusing rules and centralizing validation rules is easy.

The language provides us plenty of ways to reuse code. For instance:

function validate_my_field($var) {
    return
        validate_int($var, $min, $max)
        && validate_bool($var, $allowed_bool_types)
        // etc
        ;
}

Regards,

Rowan Collins
[IMSoP]

8 years ago by Stephen Reay

So, I’m trying to really understand what the goals of this RFC were/are.

Adding a bunch of new functions is IMO the wrong approach to this type of thing.
The existing filter_var/filter_input infrastructure works well; if you want to define more rules, I would definitely encourage building on/improving that system rather than adding a bunch of extra functions.

I would be greatly in favour of adding some of the additional filter constants suggested (e.g. a FILTER_VALIDATE_STRING with min/max bytes, or better yet, min/max chars based on the current charset).
But the new functions, whether the originally proposed ones (which seem to be just sugar in place of a userland foreach/array_walk etc. with a throw on failed validation) or these more recent suggestions (which seem to be just wrappers around filter_var, no?), make no sense to me.

Cheers

Stephen

8 years ago by Yasuo Ohgaki

Hi Stephen,

Adding a bunch of new functions is IMO the wrong approach to this type of thing.
The existing filter_var/filter_input infrastructure works well; if you want to define more rules, I would definitely encourage building on/improving that system rather than adding a bunch of extra functions.

Do you really think the filter module works well as an optimal validator?
It cannot even enforce whitelisting well...

What the filter module currently lacks as a validator:

- Whitelisting concept (implemented)
- Multiple rules for a variable (implemented)
- String rules (implemented)
- Optional rule (to be implemented; refactoring is needed)

These features are missing and cannot be added without additional
functions (without modifying the behaviors of current functions).

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Stephen Reay

Hi Yasuo,

Hi Stephen,

Adding a bunch of new functions is IMO the wrong approach to this type of thing.
The existing filter_var/filter_input infrastructure works well; if you want to define more rules, I would definitely encourage building on/improving that system rather than adding a bunch of extra functions.

Do you really think the filter module works well as an optimal validator?

It’s not perfect, but nothing is. As I said, I believe the issues can largely be resolved by building on the existing functionality.

It cannot even enforce whitelisting well…

VALIDATE_INT already accepts min/max options (min_range and max_range). Those options could be applied to VALIDATE_FLOAT, and $charset, $accepted_chars, $max_len, $min_len could be implemented on a new VALIDATE_STRING filter.

I understand the use-case for multiple validation per input, and for validating multiple inputs, but frankly the way this implements that is both confusing to use, and has a less than ideal error-mode.

The “filter spec” input is an array of arrays of arrays, most of which will also contain an array for ‘options’. To me that’s getting dangerously close to JavaScript’s callback hell for impossible to read code.

The error mode is also not ideal in a real world use case in my opinion. If I am validating a dozen input fields, I do not want to know just the first one that failed. Can you imagine using a web form that made you submit 12 times to tell you each time you got a field wrong, rather than trying to validate them ALL and telling you ALL the errors at once?

Personally I think a better approach is:

1. Improve/add to the available filters and, if desired, add extra flags/options, e.g. to throw an exception on failure (which, btw, was requested via bugs.php.net 6 years ago), to set min/max values for FILTER_VALIDATE_FLOAT, etc.

2a. Leave the multiple rules per input to userland (e.g. dev uses foreach, array_walk, etc. on a rules array or what have you)
2b. Maybe add an alternative to filter_(input/var)_array where it’s 1 input and multiple rules, e.g. filter_(input|var)_multiple

If you wanted to follow 2b, I’d suggest perhaps tackling it as a separate RFC - improving what can be validated isn’t necessarily tied to how you define what you want validated.
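Option 2a, combined with the point about reporting every failing field at once, could look roughly like this in userland: a rules array holding a list of existing ext/filter specs per field, applied with `foreach` and `filter_var()`, collecting all errors instead of stopping at the first. The `validate_all()` name and the error-array shape are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: apply every filter spec listed for each field and
// collect all failures, so a form can report every error in one round trip.
function validate_all(array $input, array $rules): array
{
    $errors = [];
    foreach ($rules as $field => $specs) {
        if (!array_key_exists($field, $input)) {
            $errors[$field] = 'missing';
            continue;
        }
        foreach ($specs as $i => $spec) {
            // FILTER_NULL_ON_FAILURE makes null mean "failed", so a valid
            // boolean false from FILTER_VALIDATE_BOOLEAN is not mistaken for failure.
            $flags = ($spec['flags'] ?? 0) | FILTER_NULL_ON_FAILURE;
            $result = filter_var($input[$field], $spec['filter'],
                ['options' => $spec['options'] ?? [], 'flags' => $flags]);
            if ($result === null) {
                $errors[$field] = "failed rule #$i";
                break; // one message per field is enough
            }
        }
    }
    return $errors; // an empty array means every field passed
}

$rules = [
    'age'   => [
        ['filter' => FILTER_VALIDATE_INT, 'options' => ['min_range' => 0, 'max_range' => 200]],
    ],
    'code'  => [
        ['filter' => FILTER_VALIDATE_REGEXP, 'options' => ['regexp' => '/\A[A-Z0-9]+\z/']],
        ['filter' => FILTER_VALIDATE_REGEXP, 'options' => ['regexp' => '/\A.{4,8}\z/']],
    ],
    'agree' => [
        ['filter' => FILTER_VALIDATE_BOOLEAN],
    ],
];

$errors = validate_all(['age' => '300', 'code' => 'AB1', 'agree' => 'no'], $rules);
```

Here both `age` and `code` are reported in a single pass, while `'no'` for `agree` counts as a valid boolean rather than a failure.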

Cheers

Stephen

8 years ago by Yasuo Ohgaki

Hi Stephen,

Adding a bunch of new functions is IMO the wrong approach to this type of thing.
The existing filter_var/filter_input infrastructure works well; if you want to define more rules, I would definitely encourage building on/improving that system rather than adding a bunch of extra functions.

Do you really think the filter module works well as an optimal validator?

It’s not perfect, but nothing is. As I said, I believe the issues can largely be resolved by building on the existing functionality.

It cannot even enforce whitelisting well…

VALIDATE_INT already accepts min/max options (min_range and max_range). Those options could be applied to VALIDATE_FLOAT, and $charset, $accepted_chars, $max_len, $min_len could be implemented on a new VALIDATE_STRING filter.

But it trims input, because it is filter-based. For example, an int input like
' 1234 '
must be invalid under a whitelisting approach, but it's allowed by the
current implementation.
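The whitespace behaviour is easy to confirm: `FILTER_VALIDATE_INT` trims surrounding whitespace before validating, so `' 1234 '` is accepted. A whitelist-style check can reject such input before delegating range checks to ext/filter. The `strict_int()` helper below is a hypothetical sketch, not a proposed filter; it accepts only `0` or an optionally negative number without leading zeros.

```php
<?php
declare(strict_types=1);

// Existing behaviour: surrounding whitespace is trimmed, so this validates.
$loose = filter_var(' 1234 ', FILTER_VALIDATE_INT); // int(1234)

// Hypothetical whitelist-style check: the raw string must already be canonical.
function strict_int(string $s, int $min = PHP_INT_MIN, int $max = PHP_INT_MAX): ?int
{
    if (preg_match('/\A(?:0|-?[1-9][0-9]*)\z/', $s) !== 1) {
        return null;
    }
    // Reuse ext/filter for overflow and range checks on the canonical string.
    $n = filter_var($s, FILTER_VALIDATE_INT,
        ['options' => ['min_range' => $min, 'max_range' => $max]]);
    return $n === false ? null : $n;
}
```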

I understand the use-case for multiple validation per input, and for validating multiple inputs, but frankly the way this implements that is both confusing to use, and has a less than ideal error-mode.

I agree.
As a filter, it mostly works well. As a validator, it's unusable due to
its filter-like, non-whitelisting behaviors.

The “filter spec” input is an array of arrays of arrays, most of which will also contain an array for ‘options’. To me that’s getting dangerously close to JavaScript’s callback hell for impossible to read code.

I can understand your concern. The issue would be the callback validator,
but unlike JS callback hell, no callback nesting would be needed. Since
it's a simple array, the contents/rules can be viewed easily too.

The error mode is also not ideal in a real world use case in my opinion. If I am validating a dozen input fields, I do not want to know just the first one that failed. Can you imagine using a web form that made you submit 12 times to tell you each time you got a field wrong, rather than trying to validate them ALL and telling you ALL the errors at once?

You can omit validation wherever you like, so your concern wouldn't
be a problem. (I strongly suggest validating all inputs at all entry
points, though.)

Personally I think a better approach is:

1. Improve/add to the available filters and, if desired, add extra flags/options, e.g. to throw an exception on failure (which, btw, was requested via bugs.php.net 6 years ago), to set min/max values for FILTER_VALIDATE_FLOAT, etc.

2a. Leave the multiple rules per input to userland (e.g. dev uses foreach, array_walk, etc. on a rules array or what have you)
2b. Maybe add an alternative to filter_(input/var)_array where it’s 1 input and multiple rules, e.g. filter_(input|var)_multiple

If you wanted to follow 2b, I’d suggest perhaps tackling it as a separate RFC - improving what can be validated isn’t necessarily tied to how you define what you want validated.

Thank you for the suggestion. There are issues with this approach.

The existing validators do not perform strict validation. We would need
new validator rules, e.g.

- FILTER_VALIDATE_STRICT_FLOAT: do not allow leading/trailing whitespace; raise an exception.
- FILTER_VALIDATE_STRICT_INT: do not allow leading/trailing whitespace; raise an exception.

Having both FILTER_VALIDATE_STRICT_FLOAT and FILTER_VALIDATE_FLOAT
would be problematic.

The current filter functions fall back to FILTER_UNSAFE_RAW when something
goes wrong, which is unacceptable for a validator.

2a. Letting users use foreach will result in the same, or at least a
similar, rule definition array. Please take a look at my last reply
to Rowan's post.

2b. This sounds good. I should think about the implementation. The 2a
issue, that it results in the same or a similar array definition, remains though.

Thank you for suggestions!

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki

Hi Stephen,

Having both FILTER_VALIDATE_STRICT_FLOAT and FILTER_VALIDATE_FLOAT
would be problematic.

I forgot to mention that the filter module uses a 32-bit int for filter
flags. That means we have only up to 31 flag bits. We don't have much
space for new filter flags.
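How many flag bits remain can be checked on a given PHP build instead of being counted by hand. This sketch takes the 32-bit premise above as given, ORs together ext/filter's `FILTER_FLAG_*` constants plus the other flag-style constants (`FILTER_REQUIRE_SCALAR`, `FILTER_REQUIRE_ARRAY`, `FILTER_FORCE_ARRAY`, `FILTER_NULL_ON_FAILURE`), and counts unused bits among the 31 non-sign bits. Filter IDs such as `FILTER_VALIDATE_INT` are plain numbers, not bits, so they are excluded. The result varies between PHP versions, so no figure is given here.

```php
<?php
declare(strict_types=1);

// Every constant registered by ext/filter (empty if the extension is absent).
$constants = get_defined_constants(true)['filter'] ?? [];

$flagLike = ['FILTER_REQUIRE_SCALAR', 'FILTER_REQUIRE_ARRAY',
             'FILTER_FORCE_ARRAY', 'FILTER_NULL_ON_FAILURE'];

$used = 0;
foreach ($constants as $name => $value) {
    $isFlag = str_starts_with($name, 'FILTER_FLAG_') || in_array($name, $flagLike, true);
    if ($isFlag && is_int($value)) {
        $used |= $value;
    }
}

// Count unset bits among bits 0..30 (bit 31 is the sign bit of a 32-bit int).
$free = 0;
for ($bit = 0; $bit < 31; $bit++) {
    if (($used & (1 << $bit)) === 0) {
        $free++;
    }
}
printf("Used flag mask: 0x%08X, free bits: %d\n", $used, $free);
```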

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Stephen Reay

Hi Yasuo,

Hi Stephen,

Adding a bunch of new functions is IMO the wrong approach to this type of thing.
The existing filter_var/filter_input infrastructure works well; if you want to define more rules, I would definitely encourage building on/improving that system rather than adding a bunch of extra functions.

Do you really think the filter module works well as an optimal validator?

It’s not perfect, but nothing is. As I said, I believe the issues can largely be resolved by building on the existing functionality.

It cannot even enforce whitelisting well…

VALIDATE_INT already accepts min/max options (min_range and max_range). Those options could be applied to VALIDATE_FLOAT, and $charset, $accepted_chars, $max_len, $min_len could be implemented on a new VALIDATE_STRING filter.

But it trims input because it is filter-based. For example, an int input like
' 1234 '
must be invalid under a whitelisting approach, but it is accepted by the
current implementation.
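A minimal PHP sketch of the difference being described, assuming PHP 7+: filter_var() with FILTER_VALIDATE_INT trims surrounding whitespace before validating, while strict_int() (a hypothetical helper, not part of ext/filter) whitelists the exact format first and only then delegates the range check to filter_var().

```php
<?php
// FILTER_VALIDATE_INT trims surrounding whitespace, so ' 1234 ' passes.
$lenient = filter_var(' 1234 ', FILTER_VALIDATE_INT);

// Hypothetical strict check: an optional minus sign followed by digits,
// nothing before or after, then the usual range check via filter_var().
function strict_int(string $input, int $min = PHP_INT_MIN, int $max = PHP_INT_MAX)
{
    if (preg_match('/\A-?[0-9]+\z/', $input) !== 1) {
        return false; // whitespace, '+', hex and similar are rejected outright
    }
    return filter_var($input, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => $min, 'max_range' => $max],
    ]);
}

var_dump($lenient);              // int(1234)
var_dump(strict_int(' 1234 '));  // bool(false)
var_dump(strict_int('1234'));    // int(1234)
```

The \A...\z anchors matter here: with ^...$ a trailing newline would still be accepted.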

I understand the use-case for multiple validation per input, and for validating multiple inputs, but frankly the way this implements that is both confusing to use, and has a less than ideal error-mode.

I agree.
As a filter, it mostly works well. As a validator, it's unusable due to
its filter-like, non-whitelisting behaviour.

As you mentioned, there isn’t much space left for new flags (about 8 slots left by my count?) but surely a FILTER_FLAG_ALLOW_WHITESPACE could be utilised by multiple filters to achieve the same thing, while still allowing for legacy behaviour.

The “filter spec” input is an array of arrays of arrays, most of which will also contain an array for ‘options’. To me that’s getting dangerously close to JavaScript’s callback hell of impossible-to-read code.

I can understand your concern. The callback validator could be an issue, but
unlike JS callback hell, callback nesting would not be needed. Since
it's a simple array, the content/rules can also be viewed easily.

Sorry, maybe I wasn’t clear. My issue wasn’t with the callback filter specifically. Callback hell in Javascript is the issue where you have a lot of nested callbacks, which can make the code quite difficult to read, even with indenting, matching brace highlighting etc.

My concern here is that you’d have a very deeply nested data structure essentially to avoid a user-space loop.

The error mode is also not ideal in a real world use case in my opinion. If I am validating a dozen input fields, I do not want to know just the first one that failed. Can you imagine using a web form that made you submit 12 times to tell you each time you got a field wrong, rather than trying to validate them ALL and telling you ALL the errors at once?

You can omit validation wherever you like, so your concern wouldn't
be a problem. (I strongly suggest validating all inputs at all entry
points, though.)

Again, I apologise, maybe I wasn’t clear.

Let’s say I expect to get 5 posted parameters (from a form, a direct http api call, etc). Three are required: of those, one must be an email, and the other two must match (e.g. password/confirm) and pass a custom callback (e.g. to prevent ‘aaaaaa’). The other two are optional, but have a maximum length of 45 characters each.

Now, I want to validate all those fields, and give meaningful error messages back to the user.

With your proposal, if they have provided invalid data in all five fields, they will have to make at least 6 http requests (assuming each time they get an error response they manage to fix that field on their first re-try).

I can definitely see benefit in allowing exceptions when validation fails. I use a validation system that does something similar internally, but it includes catches, so effectively, validating a request with 5 inputs will potentially give you a single exception, which then gives access to the error encountered for each input (which is also expressed as an exception).
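Stephen's point about reporting every failed field at once can be sketched in userland. This is an illustration only, assuming PHP 7.4+ (typed properties, arrow functions); InputErrors, validate_all() and the field names are hypothetical. Every rule for every field runs, and a single exception carries one message per failed field.

```php
<?php
// Hypothetical aggregate exception: one per request, one message per field.
class InputErrors extends RuntimeException
{
    /** @var array<string, string> field name => error message */
    public array $errors;

    public function __construct(array $errors)
    {
        parent::__construct(count($errors) . ' field(s) failed validation');
        $this->errors = $errors;
    }
}

// Run every rule for every field and collect ALL failures instead of
// stopping at the first one. A rule receives ($value, $allInput) and
// returns an error message, or null when the value is acceptable.
function validate_all(array $input, array $rules, array $optional = []): array
{
    $errors = [];
    foreach ($rules as $field => $checks) {
        $value = $input[$field] ?? '';
        if (!is_string($value)) {
            $errors[$field] = 'must be a single value';
            continue;
        }
        if ($value === '') {
            if (!in_array($field, $optional, true)) {
                $errors[$field] = 'is required';
            }
            continue;
        }
        foreach ($checks as $check) {
            $message = $check($value, $input);
            if ($message !== null) {
                $errors[$field] = $message;
                break; // report only the first problem per field
            }
        }
    }
    if ($errors) {
        throw new InputErrors($errors);
    }
    return $input;
}

// Stephen's example: three required fields and two optional ones.
$maxLen = fn(int $n) => fn(string $v) => strlen($v) > $n ? "must be at most $n bytes" : null;
$rules = [
    'email'    => [fn($v) => filter_var($v, FILTER_VALIDATE_EMAIL) === false ? 'is not a valid email' : null],
    'password' => [fn($v) => preg_match('/\A(.)\1*\z/s', $v) ? 'is too weak' : null],
    'confirm'  => [fn($v, $all) => $v !== ($all['password'] ?? null) ? 'does not match password' : null],
    'nickname' => [$maxLen(45)],
    'company'  => [$maxLen(45)],
];
```

Catching InputErrors once gives access to every field's message, so a user with five bad fields can fix them all after a single round trip.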

Personally I think a better approach is:

1. Improve/add to the available filters and, if desired, add extra flags/options, e.g. to throw an exception on failure (which, btw, was requested via bugs.php.net 6 years ago), to set min/max values for FILTER_VALIDATE_FLOAT, etc.

2a. Leave the multiple rules per input to userland (e.g. the dev uses foreach, array_walk, etc. on a rules array or what have you)
2b. Maybe add an alternative to filter_(input|var)_array where it’s 1 input and multiple rules, e.g. filter_(input|var)_multiple

If you wanted to follow 2b, I’d suggest perhaps tackling it as a separate RFC - improving what can be validated isn’t necessarily tied to how you define what you want validated.
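A rough userland sketch of option 2b, one input with several rules, built only on filter_var(). The name filter_var_multiple() and the [filter, options] rule format are hypothetical. Because it treats false as failure, it is not suitable as written for FILTER_VALIDATE_BOOLEAN, which returns false for valid values such as "off".

```php
<?php
// Hypothetical userland version of option 2b: one value, several
// filter_var() rules applied in order. Each rule is [filter_id, options];
// the output of one filter is fed into the next. Returns false as soon
// as any rule rejects the value.
function filter_var_multiple($value, array $rules)
{
    foreach ($rules as $rule) {
        $options = $rule[1] ?? 0;
        $value = filter_var($value, $rule[0], $options);
        if ($value === false) {
            return false;
        }
    }
    return $value;
}

// Example: a 1-4 digit string with no surrounding whitespace, as an int in 1..9999.
$rules = [
    [FILTER_VALIDATE_REGEXP, ['options' => ['regexp' => '/\A[0-9]{1,4}\z/']]],
    [FILTER_VALIDATE_INT, ['options' => ['min_range' => 1, 'max_range' => 9999]]],
];
var_dump(filter_var_multiple('1234', $rules));   // int(1234)
var_dump(filter_var_multiple(' 1234', $rules));  // bool(false)
```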

Thank you for the suggestion. There are issues with this approach.

The existing validators do not perform strict validation. We need new
validator rules, e.g.:

FILTER_VALIDATE_STRICT_FLOAT: do not allow leading/trailing whitespace; raise an exception.
FILTER_VALIDATE_STRICT_INT: do not allow leading/trailing whitespace; raise an exception.

Having both FILTER_VALIDATE_STRICT_FLOAT and FILTER_VALIDATE_FLOAT
would be problematic.

As I mentioned above, I would imagine a single new flag could be used to make “all” filters have the strict behaviour. Presumably FILTER_VALIDATE_BOOLEAN should treat ‘ true ’ as an error too?

The current filter functions fall back to FILTER_UNSAFE_RAW when something
goes wrong, which is unacceptable for a validator.

I’m not sure I understand this. What do you mean by “when something goes wrong”. Can you elaborate on that?
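One possible reading of "when something goes wrong", offered as an assumption rather than as Yasuo's answer: my understanding is that in filter_var_array() a spec entry without a usable 'filter' key is processed with the default filter (FILTER_DEFAULT, i.e. FILTER_UNSAFE_RAW), so a typo in a spec can silently turn validation off. A defensive userland wrapper, strict_filter_var_array() (hypothetical), could refuse such specs up front:

```php
<?php
// Hypothetical defensive wrapper: every spec entry must name a known filter
// explicitly, and FILTER_UNSAFE_RAW (which does not validate) is refused,
// so a mistake in the spec cannot silently disable validation.
function strict_filter_var_array(array $data, array $spec)
{
    $known = array_map('filter_id', filter_list());
    foreach ($spec as $key => $entry) {
        $filter = is_array($entry) ? ($entry['filter'] ?? null) : $entry;
        if (!is_int($filter) || !in_array($filter, $known, true)) {
            throw new InvalidArgumentException("No valid filter given for '$key'");
        }
        if ($filter === FILTER_UNSAFE_RAW) {
            throw new InvalidArgumentException("FILTER_UNSAFE_RAW does not validate '$key'");
        }
    }
    return filter_var_array($data, $spec);
}
```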

2a. Letting users use foreach will result in the same, or at least a
similar, rule definition array. Please take a look at the last reply
to Rowan's post.

2b. This sounds good; I should think about the implementation. The 2a issue,
that it results in the same or a similar array definition, remains though.

Not necessarily. While they could be in the format you specified, they could also be defined individually or defined inline.

I really believe that combining multiple fields and multiple rule definitions complicates things, a lot.
Making improvements to the filter system but only for this new “complicated” function feels like it goes against your intent of trying to make it easier for developers to make safe(r) applications.

Cheers

Stephen

Thank you for the suggestions!

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki

Hi Rowan,

This might be the largest difference.

To make something more secure than it is now, adding an additional security
layer is effective, rather than relying on a single location/piece of code.

My instinct is that this extra location would be a maintenance nightmare, as
it would need to keep up with changes elsewhere in the application. In day
to day use, what would make this extra security layer any more likely to be
comprehensively maintained than the existing validation layer?

You are both right and wrong.

Maintaining a WAF (Web Application Firewall) customized for a
particular app is a nightmare; I fully agree.

Dividing input validation and logic checks works. The two focus
on different objectives: format and logic. Finding programmers'
mistakes is a lot easier with input validation built into the
software, unlike a WAF, because the programmer knows what the inputs are.
Unit tests also help. That's why it works much better
than the WAF approach.

Suppose we have a validation module. You are suggesting something like

$int = validate_int($var, $min, $max);
$bool = validate_bool($var, $allowed_bool_types);
// i.e. which type of bool 1/0, yes/no, on/off, true/false is allowed
// This isn't implemented. All of them are valid bools currently.
$str = validate_string($var, $min_len, $max_len);
$str = validate_string_encoding($var, $encoding);
$str = validate_string_chars($var, $allowed_chars);
$str = validate_string_regex($var, $regex);
$str = validate_string_degit($var, $min_len, $max_len);
$str = validate_string_callback($var, $callback);

No, I'm suggesting something like:

if (
    ! validate_int($var, $min, $max)
    || ! validate_bool($var, $allowed_bool_types)
    || ! validate_string($var, $min_len, $max_len)
    || ! validate_string_encoding($var, $encoding)
    || ! validate_string_chars($var, $allowed_chars)
    || ! validate_string_regex($var, $regex)
    || ! validate_string_degit($var, $min_len, $max_len)
    || ! $callback($var) // Note: no need to wrap this callback, it's just a boolean-returning function
)
{
    // ERROR
}

My opinion is "it's better to separate this kind of required format
check from logic" to make logic simpler and more maintainable.

Think of this format check as "forcing types". Even if you force a DATE
type, you might still have to check for logical errors in the logic (model),
e.g. a reservation date for a service can only be set within a certain
range.

If you're in the same boat as me, you don't have to return anything;
just catch the exception. The above code would be:

<?php
// validate_*() are the hypothetical functions under discussion; here each
// is assumed to throw InputValidationException on failure instead of
// returning false.
$validate_my_var = function ($var) use (
    $callback, $min, $max, $allowed_bool_types,
    $min_len, $max_len, $encoding, $allowed_chars, $regex
) {
    validate_int($var, $min, $max);
    validate_bool($var, $allowed_bool_types);
    validate_string($var, $min_len, $max_len);
    validate_string_encoding($var, $encoding);
    validate_string_chars($var, $allowed_chars);
    validate_string_regex($var, $regex);
    validate_string_degit($var, $min_len, $max_len);
    $callback($var);
};

// Then simply call it for a variable. I would also use an array of
// definitions, as you described in your previous mail.
$validation_def = [
    'my_var'      => $validate_my_var,
    'some_var'    => $validate_some_var,
    'another_var' => $validate_another_var,
];

function validate_array(array $input, array $definition) {
    foreach ($definition as $key => $func) {
        // Cannot handle an "optional" rule well with this code: a missing
        // key is passed as null and left for $func to reject.
        $func($input[$key] ?? null);
    }
}

try {
    validate_array($_POST, $validation_def);
} catch (InputValidationException $e) {
    // cleanup and die
}
?>

IMO, it looks a lot like the proposed code, with a lot more typing.
However, I'm OK with this kind of implementation too,
i.e. providing basic validate_*() functions.

The problem of the half-implemented filter module validator remains, though.

It was a deliberately narrow purpose as an example, but IMHO that's
perfectly readable, and the focus of the module would be on creating a good
set of validation rules, rather than worrying about how they're applied.

[...] Rule reuse and centralizing validation rules are easy.

The language provides us plenty of ways to reuse code. For instance:

function validate_my_field($var) {
    return
        validate_int($var, $min, $max)
        && validate_bool($var, $allowed_bool_types)
        // etc

OK. I understand that you prefer a simple validation-functions solution.

PHP is a general-purpose programming language, so most features can be
implemented with basic constructs. Code reuse works with simple
functions. However, what about definition reuse? Such definitions cannot
easily be reused to generate JavaScript validation code, for example. If the
example code made the definitions reusable, it would look more like the proposed code.
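To illustrate the definition-reuse point, here is a minimal sketch assuming a hypothetical plain-array rule format (pattern, maxlength, required): the same array drives both a server-side check and the HTML5 attributes the browser uses for client-side validation. check_field() and html_attributes() are illustrative names, not part of the RFC.

```php
<?php
// Hypothetical rule format: one plain array per field, reused for the
// server-side check and for the HTML5 attributes sent to the browser.
$definition = [
    'zip'  => ['pattern' => '[0-9]{7}', 'maxlength' => 7, 'required' => true],
    'name' => ['maxlength' => 45, 'required' => false],
];

// Server side: enforce exactly the rules the browser was given.
function check_field(?string $value, array $rule): bool
{
    if ($value === null || $value === '') {
        return empty($rule['required']);
    }
    if (isset($rule['maxlength']) && strlen($value) > $rule['maxlength']) {
        return false; // byte length, a simplification
    }
    if (isset($rule['pattern'])) {
        // HTML5 patterns must match the whole value, so anchor them here too.
        $regex = '/\A(?:' . str_replace('/', '\/', $rule['pattern']) . ')\z/';
        return preg_match($regex, $value) === 1;
    }
    return true;
}

// Client side: render the same rules as HTML5 <input> attributes.
function html_attributes(array $rule): string
{
    $attrs = [];
    foreach (['pattern', 'maxlength'] as $name) {
        if (isset($rule[$name])) {
            $attrs[] = sprintf('%s="%s"', $name, htmlspecialchars((string) $rule[$name], ENT_QUOTES));
        }
    }
    if (!empty($rule['required'])) {
        $attrs[] = 'required';
    }
    return implode(' ', $attrs);
}

echo '<input name="zip" ', html_attributes($definition['zip']), ">\n";
// prints: <input name="zip" pattern="[0-9]{7}" maxlength="7" required>
```

The HTML pattern attribute uses JavaScript regex syntax and maxlength counts characters rather than bytes, so a real implementation would have to keep both dialects in sync; this sketch ignores those differences.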

We have a working filter and a half-implemented validator in the filter
module. Leaving it as it is now is not good, IMHO. Why not finish the
validator implementation and provide a ready-to-use tool? The resulting code
would be at least somewhat similar to a simple validation-functions
implementation anyway.

What the filter module currently lacks as a validator:

Whitelisting concept (Implemented)
Multiple rules for a variable (Implemented)
String rules (Implemented)
Optional rule (To be implemented. Refactoring is needed)

My PR implements 3 of the 4 mandatory features above for filter module
validation.

I still fail to see what's wrong with improving/finishing the filter module
implementation, or what's wrong with my improvement proposal.
Suggestions are always appreciated!

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

No, I'm suggesting something like:

if (
    ! validate_int($var, $min, $max)
    || ! validate_bool($var, $allowed_bool_types)
    || ! validate_string($var, $min_len, $max_len)
    || ! validate_string_encoding($var, $encoding)
    || ! validate_string_chars($var, $allowed_chars)
    || ! validate_string_regex($var, $regex)
    || ! validate_string_degit($var, $min_len, $max_len)
    || ! $callback($var) // Note: no need to wrap this callback, it's just a boolean-returning function

And I am looking for some way of packaging that into something I can
read and write dynamically for each $var ...

$var->set_validation_rules($rules); And $rules is going to be an array
of items which can then be used for related parallel activities such as
populating the browser validation.

So the above script is replaced by $var->is_valid(); or if you prefer it
throws an exception when you try and set the variable with an invalid
input (or one that does not match a 'strict' rule).

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

8 years ago by Yasuo Ohgaki

Hi Lester,

No, I'm suggesting something like:

if (
    ! validate_int($var, $min, $max)
    || ! validate_bool($var, $allowed_bool_types)
    || ! validate_string($var, $min_len, $max_len)
    || ! validate_string_encoding($var, $encoding)
    || ! validate_string_chars($var, $allowed_chars)
    || ! validate_string_regex($var, $regex)
    || ! validate_string_degit($var, $min_len, $max_len)
    || ! $callback($var) // Note: no need to wrap this callback, it's just a boolean-returning function

And I am looking for some way of packaging that into something I can
read and write dynamically for each $var ...

This could be done by convention rather than configuration.
You need some rules for variable names; for example, if a variable is
named ID, it must always be a numeric string.

A convention is a developer-defined rule, so how to do it is left to the
developer.
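A minimal sketch of convention over configuration, assuming hypothetical naming conventions: names ending in id are positive integers, names ending in _date are YYYYMMDD, and everything else is a short string without control characters. rule_for() and validate_by_convention() are illustrative only.

```php
<?php
// Hypothetical conventions: the input name alone selects the rule.
function rule_for(string $name): string
{
    if (preg_match('/(?:\A|_)id\z/i', $name)) {
        return '/\A[1-9][0-9]{0,9}\z/';      // "id", "user_id": positive decimal integer
    }
    if (preg_match('/_date\z/', $name)) {
        return '/\A[0-9]{8}\z/';             // "start_date": YYYYMMDD
    }
    return '/\A[^\x00-\x1F\x7F]{0,100}\z/';  // anything else: no control chars, <= 100 bytes
}

// Reject the whole request as soon as any input breaks its convention.
function validate_by_convention(array $input): void
{
    foreach ($input as $name => $value) {
        if (!is_string($value) || preg_match(rule_for((string) $name), $value) !== 1) {
            throw new UnexpectedValueException("Invalid input: $name");
        }
    }
}
```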

$var->set_validation_rules($rules); And $rules is going to be an array
of items which can then be used for related parallel activities such as
populating the browser validation.

So the above script is replaced by $var->is_valid(); or if you prefer it
throws an exception when you try and set the variable with an invalid
input (or one that does not match a 'strict' rule).

I think convention rather than configuration works. However, not all checks
should or can be done by the model, because a model handles only the data
related to it and leaves other variables behind. Those leftovers could be a
cause of vulnerabilities.

IIRC, Magento had a vulnerability that allowed malicious access due to
internal redirects. This kind of problem can be mitigated by strict
input validation at the time inputs are accepted.

Anyway, your approach would work with autoboxing
(https://wiki.php.net/rfc/autoboxing) and this proposal.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Christoph M. Becker

No, I'm suggesting something like:

if (
    ! validate_int($var, $min, $max)
    || ! validate_bool($var, $allowed_bool_types)
    || ! validate_string($var, $min_len, $max_len)
    || ! validate_string_encoding($var, $encoding)
    || ! validate_string_chars($var, $allowed_chars)
    || ! validate_string_regex($var, $regex)
    || ! validate_string_degit($var, $min_len, $max_len)
    || ! $callback($var) // Note: no need to wrap this callback, it's just a boolean-returning function

And I am looking for some way of packaging that into something I can
read and write dynamically for each $var ...

$var->set_validation_rules($rules); And $rules is going to be an array
of items which can then be used for related parallel activities such as
populating the browser validation.

So the above script is replaced by $var->is_valid(); or if you prefer it
throws an exception when you try and set the variable with an invalid
input (or one that does not match a 'strict' rule).

Anyway, your approach would work with autoboxing
(https://wiki.php.net/rfc/autoboxing) and this proposal.

And it can even work without autoboxing; just wrap the scalars in
objects manually.
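Wrapping scalars manually might look like the following sketch. The class name ValidatedInput, the rule keys (pattern, maxlength, strict) and the behaviour of set() are assumptions; only the method names set_validation_rules() and is_valid() come from Lester's message. In strict mode, assigning a value that breaks the rules throws and keeps the previous value.

```php
<?php
// Hypothetical manual "boxing" of a scalar together with its rules.
final class ValidatedInput
{
    private $value;
    private $rules = [];

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    // Rules are a plain array, so they can also be exported, e.g. to
    // generate browser-side validation from the same definition.
    public function set_validation_rules(array $rules): void
    {
        $this->rules = $rules;
    }

    public function rules(): array
    {
        return $this->rules;
    }

    public function is_valid(): bool
    {
        if (isset($this->rules['maxlength']) && strlen($this->value) > $this->rules['maxlength']) {
            return false;
        }
        if (isset($this->rules['pattern']) && preg_match($this->rules['pattern'], $this->value) !== 1) {
            return false;
        }
        return true;
    }

    // With 'strict' => true, an assignment that breaks the rules throws
    // and the previous value is kept.
    public function set(string $value): void
    {
        $previous = $this->value;
        $this->value = $value;
        if (!empty($this->rules['strict']) && !$this->is_valid()) {
            $this->value = $previous;
            throw new UnexpectedValueException('Value rejected by strict rule');
        }
    }

    public function value(): string
    {
        return $this->value;
    }
}
```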

--
Christoph M. Becker

8 years ago by Lester Caine

And it can even work without autoboxing; just wrap the scalars in
objects manually.

And we come full circle. YES, everybody can add their own userland
wrappers to do this, but if code is built into the core to provide a
standard to work with, then we don't have everybody reinventing the
wheel. And there is no need to 'Add validation functions to filter
module' simply because that code already exists in the right place ...
wrapping the base variable.

Some of the elements of proper variable validation have been squeezed in
via 'strict mode' and other soft type rules, when a simple
expansion of a basic element would cover just about everything everybody
is demanding. We don't need a strictly typed language, but if strict
rules can be added to the core validation functions, then we have much
better flexibility to use them or not, and none of the debate about where
'strict mode=1' enables it. Just as a read_only rule gets rid of the
need for yet another whole family of classes.

--
Lester Caine - G8HFL

8 years ago by Lester Caine

Why shouldn't we have more secure validation?

No argument about that ... only that ALL validation requires rules. If
you have rules for preventing 'injection attacks' they only need to be
applied to data that could allow that injection to be carried forward.
If I expect a valid email address, and the string supplied is not a
valid email address, then I kill anything that is provided instead.

The legacy code I have had validation problems with has
basically just been poor design, simply mirroring the post data to a
new URL when they want to use some third-party service. Heavy-handed
filtering of injection paths also kills the data that the silly clone
mirroring can't be bothered to filter properly. Convincing others that
the correct approach IS to filter data properly is an uphill struggle
when they can't be bothered to learn the interface to the service they
are bouncing over to. "It's too difficult to maintain as the APIs will
keep changing". But if PHP has a set of base rules that can be applied
in parallel with the same rules in browser space, then one can simplify the
processing elements that can then be mirrored cleanly, or halted if the
material needed to create the mirror is no longer valid.

Handling tampered data in business logic will reduce both
readability and maintainability and, more importantly, make code
less secure, because programmers tend to focus on the logic
in the model, not on input data validation.

The fact that a packet of data validated in the browser, and now being
processed on the server, is subject to tampering is what calls for the extra
validation you are talking about. How do you distinguish between what
was valid but has now been contaminated, without also checking that the
expected strings ARE still valid?

--
Lester Caine - G8HFL

8 years ago by Marco Pivetta

Hey Yasuo,

Besides what was reported above by Dan, my reasoning for voting "no" is that
this API can be implemented in userland, regardless of whether that is trivial or not.

There is no reason good enough to justify yet another added endpoint
that can even be implemented with simple function composition.

In addition to that, the lack of a strongly typed data structure for the
validation DSL makes this proposed functionality very error-prone and
obnoxious to use and maintain for future additional use-case scenarios that
may come up.

The performance impact of userland implementations can be mitigated via codegen
there (similar to what Nikic's FastRoute library does): still less complicated than
relying on the core API, maintaining it in C code, and having it locked
onto the installed PHP version.

Cheers,

Marco

Hi Dan,

Thank you for sharing idea!

On Mon, Aug 15, 2016 at 10:25 AM, Dan Ackroyd danack@basereality.com wrote:

One more usual request.
Please describe the reason(s) why you object to the proposal.

I'm not entirely sure why you ask for reasons when people vote no. The
reasons are almost always the same as the reasons given before the
voting starts.

Without feedback, there is no clue which way to go or what to improve.
I hadn't thought of some of your ideas, and I think your feedback is great.

But for posterity:

i) Validation error messages need to specify what is wrong.....which
is bespoke to the application. Which is a reason why validation code
belongs in userland.

When exceptions are enabled, the offending key name is written in the exception
message.

When exceptions are disabled, your statement is true. This could be improved.
Good feedback.

ii) Validation error message need to be in the correct language for an
application. It is not a good approach for people to be trying to
match strings emitted by internal code and trying to convert them to
the correct language.

It seems there is a misunderstanding.
These new functions are intended for "secure coding input validation" that
should never fail. A failure means something unexpected in the input data
that cannot/shouldn't let the program keep running. Why would you need to
parse the message?

All the needed info (filter name, key and value) is in the exception message
and the exception object, BTW.

This one is good feedback, too.
I appreciate suggestions for better error messages.
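Dan's points i) and ii) suggest what such an exception needs to carry. A sketch, assuming a hypothetical InputValidationException with structured getters (not the RFC's actual class): the application builds its own translated message from the key and filter name instead of parsing the English text. The raw value is kept on the object but deliberately left out of the message, since it may be attacker-controlled.

```php
<?php
// Hypothetical exception with structured data next to the message.
class InputValidationException extends UnexpectedValueException
{
    private $key;
    private $filter;
    private $value;

    public function __construct(string $key, string $filter, $value)
    {
        // The raw value may be attacker-controlled, so keep it out of the message.
        parent::__construct(sprintf('Input validation failed: key "%s", filter "%s"', $key, $filter));
        $this->key = $key;
        $this->filter = $filter;
        $this->value = $value;
    }

    public function getKey(): string { return $this->key; }
    public function getFilter(): string { return $this->filter; }
    public function getValue() { return $this->value; }
}

// Application side: message catalogs keyed by filter name, one per language.
$catalogs = [
    'en' => ['int' => '%s must be an integer'],
    'de' => ['int' => '%s muss eine ganze Zahl sein'],
];

function describe(InputValidationException $e, array $catalog): string
{
    $template = $catalog[$e->getFilter()] ?? 'Invalid value for %s';
    return sprintf($template, $e->getKey());
}
```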

iii) The argument that it needs to be fast could be applied to
anything and everything, and so is bogus. The RFC doesn't even show
that userland implementations are slow enough to be a concern.

I thought I didn't need an example of a userland implementation, so this
is good feedback too.

A typical OO implementation uses a number of setters to define validation rules.
In addition, it validates that each validation rule is acceptable, e.g. it will
at least check the input data type. The setters and the validation of the rules
themselves obviously make execution slower.

One may optimize validation rules into a plain array (like I do).
In that case, performance could be better than with the previously mentioned
validators, which do all of this in the production environment.

I also thought performance was not very important because
there is no existing PHP feature to compare against. We all know PHP function
call overhead is relatively large, and the proposed, almost entirely C,
implementation would be faster than userland.

iv) The RFC makes an assumption that programs should exit when validation
fails.

"Input data validation should accept only valid and possible inputs.
If not, reject it and terminate program."

and the code example:

catch (FilterValidateException $e) {
    var_dump($e->getMessage());
    die('Invalid input detected!'); // Should terminate execution when input validation fails
}

This assumption is bogus.

Any program that accepts data from users should provide useful error
messages when the data is wrong with something as simple as a string
being too long.

There is a misunderstanding on this.
As I wrote explicitly in the RFC, input validation and user input
mistakes must be handled differently.

The "input validation error" (think of it as an assertion or requirement
failure) that this RFC deals with is a should-never-happen condition (think
of it as a contract that should never fail).

The point of input validation is to accept only inputs that the
program expects and can work with correctly. Accepting unexpected
data that the program cannot work with correctly is pointless.

Don't misunderstand me. I'm not saying "You should reject user input
mistakes".
"User input mistakes" and "input validation errors" are totally different
kinds of error.

v) I don't like the current filter functions, and recommend people
avoid using them. Adding to them with an even harder to use API is the
wrong way to go.

I didn't recommend it either, because it could not easily be used for input
validation, and escaping or sanitization could be done with dedicated
APIs.

A new module was one of my ideas too. However, after investigation I
realized that much of the filter module code could be reused. That's the
reason why I added to the filter module. I had also named the new functions
with a "validate_" prefix rather than "filter_" to emphasize that they are
for validation, but I renamed them to "filter_*" to comply with CODING_STANDARDS.

This feedback is great because I'm worried about the same thing.
Please give this kind of feedback during the discussion so that I can
address the issues.

Thank you for the comments.
They're very helpful for improvements!

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Stanislav Malyshev

Hi!

It seems there is a misunderstanding.
These new functions are intended for "secure coding input validation" that
should never fail. A failure means something unexpected in the input data
that cannot/shouldn't let the program keep running. Why would you need to
parse the message?

I think the problem here is as follows: assume you accept user input. You
want it to conform to some set of rules. If it does not, you may want to
inform the user that the input is wrong, in an informative way. Now, if
you say these functions "should never fail", it implies that before
them, there would be other functions filtering user input (because user
input could always violate whatever rules you'd have) - and then the
question is, would you really want two sets of validators? You'd
probably want one.
Now, when you have one, you probably want it to validate the data and
return some information that would be useful for informing the user what
has gone wrong. That seems to be the issue here.
I do think having strong input validation is a good thing. However, we'd
also need to have them in a way that would make them useful in above
scenario - otherwise people would avoid them because they fail "too
hard" and the app does not retain enough control over the outcome.

There is a misunderstanding on this.
As I wrote explicitly in the RFC, input validation and user input
mistakes must be handled differently.

The "input validation error" (think of it as an assertion or requirement
failure) that this RFC deals with is a should-never-happen condition (think
of it as a contract that should never fail).

This is what I'm not sure I understand - when this approach would be
used? I.e. if I get data from the user, I surely can not claim I can
impose any conditions on the data that would never fail. Is it assumed
I'd pre-filter the data before passing it to this filter?

The point of input validation is to accept only inputs that the
program expects and can work with correctly. Accepting unexpected
data that the program cannot work with correctly is pointless.

Well, that depends on what you mean by "accepting". The program should
exhibit sane behavior (i.e., useful error message, not whitescreen or
something like that) on bad input. That behavior can be different -
i.e., if you are given wrong password, you shouldn't be too helpful and
say "this password is wrong, the right password is this: ...." (you'd
laugh but there was a real application doing this, no, I have no idea
what the developers were thinking :) but at least you could say
"authentication details are wrong".

Don't misunderstand me. I'm not saying "You should reject user input
mistakes".
"User input mistakes" and "input validation errors" are totally different
kinds of error.

Here, again, I am not sure I understand the difference.

Stas Malyshev
smalyshev@gmail.com

8 years ago by Lester Caine

There is a misunderstanding on this.

As I wrote explicitly in the RFC, input validation and user input
mistakes must be handled differently.

The "input validation error" (think of it as an assertion or requirement
failure) that this RFC deals with is a should-never-happen condition (think
of it as a contract that should never fail).

This is what I'm not sure I understand - when this approach would be
used? I.e. if I get data from the user, I surely can not claim I can
impose any conditions on the data that would never fail. Is it assumed
I'd pre-filter the data before passing it to this filter?

Keeping things simple ...
I like your nice flowchart ... BUT

Input logic is a LOT more complex than that. I need to be able to use
the rule set that you are hiding in the filter to CREATE the page that
your little man is looking at. Those rules create the browser-side
validation everybody seems to think is pointless, but which is essential in
modern web apps. Those rules may well flag that if some 'variable'
already exists, browser-side actions can amend the workflow and load the
selected data. At the very least, input validation once the input array
reaches the server may require different rules based on some of the
responses, and 'business logic' requires 'sanitized' variables to carry
out that process. Sanitization may vary depending on the workflow.

Basically, the simplistic view that everything can be reduced to a fixed
single chain is not what happens in reality. The output array to amend
the stored data WILL have a different set of variables depending on the
route through, so any filter needs to be able to be built from the set
of rules that the variables define. And in my own storage process, the
set of variables being stored are individual records in a table whose
wrapping transaction must complete or roll back and add new failure
flags to the set of variables before deciding what to return to the user's
screen. You cannot assume your output process will complete without
errors, and those errors will amend the rest of the chain.

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki

Hi Stas,

It seems there is a misunderstanding.
These new functions are intended for "secure coding input validation" that
should never fail. A failure means something unexpected in the input data
that cannot/shouldn't let the program keep running. Why would you need to
parse the message?

I think the problem here is as follows: assume you accept user input. You
want it to conform to some set of rules. If it does not, you may want to
inform the user that the input is wrong, in an informative way. Now, if
you say these functions "should never fail", it implies that before
them, there would be other functions filtering user input (because user
input could always violate whatever rules you'd have) - and then the
question is, would you really want two sets of validators? You'd
probably want one.
Now, when you have one, you probably want it to validate the data and
return some information that would be useful for informing the user what
has gone wrong. That seems to be the issue here.
I do think having strong input validation is a good thing. However, we'd
also need to have them in a way that would make them useful in above
scenario - otherwise people would avoid them because they fail "too
hard" and the app does not retain enough control over the outcome.

I think this discussion relates to the following questions.
I'll try to explain there.

There is a misunderstanding on this.
As I wrote explicitly in the RFC, input validation and user input
mistakes must be handled differently.

The "input validation error" (think of it as an assertion or requirement
failure) that this RFC deals with is a should-never-happen condition (think
of it as a contract that should never fail).

This is what I'm not sure I understand - when this approach would be
used? I.e. if I get data from the user, I surely can not claim I can
impose any conditions on the data that would never fail. Is it assumed
I'd pre-filter the data before passing it to this filter?

Which rules can be imposed on inputs, and how, depends on
what kind of data should be sent from outside the software, including
from human users.

Let's say your app validates a user-written/chosen "Date" on the client side
with JavaScript. Then the browser must send whatever "Date" format you impose
on the client. It may be "YYYYMMDD", for example.

Then the programmer should not accept any "Date" format other than "YYYYMMDD",
because any other format is invalid. Accepting formats other than "YYYYMMDD"
does only harm and increases the risk of program malfunction, i.e. all kinds
of injections such as JavaScript, SQL, null char, newline, etc.
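A minimal sketch of the strict "YYYYMMDD" check described above. validate_yyyymmdd() is a hypothetical helper: it accepts exactly eight ASCII digits with nothing before or after them, then checks with checkdate() that the result is a real calendar date.

```php
<?php
// Hypothetical strict validator for a "YYYYMMDD" field. \A and \z anchor
// the whole string, so even a trailing newline is rejected (a plain $
// would allow one).
function validate_yyyymmdd($input): string
{
    if (!is_string($input)
        || preg_match('/\A([0-9]{4})([0-9]{2})([0-9]{2})\z/', $input, $m) !== 1) {
        throw new UnexpectedValueException('Date must be exactly YYYYMMDD');
    }
    // checkdate(month, day, year) rejects impossible dates such as 20160230.
    if (!checkdate((int) $m[2], (int) $m[3], (int) $m[1])) {
        throw new UnexpectedValueException('Date is not a valid calendar date');
    }
    return $input;
}
```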

The basic idea of secure coding input validation is to remove all unnecessary
security risks at "Input Validation".

Even when the "Date" field is a plain <input> in which the user can type any chars,
NUL, CR/LF, TAB or any other CNTRL chars should not be in there. No user
will type 100 chars into a "Date" field unless they are trying to tamper with the
application.
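A minimal PHP sketch of the "plain <input>" rule described above (valid encoding, no CNTRL chars, bounded length), assuming the page encoding is UTF-8. The function name and the 100-char default are illustrative assumptions, not part of ext/filter or of the RFC's proposed API; it relies only on preg_match():

```php
<?php
// Hypothetical helper (not the RFC's API): accepts only valid UTF-8,
// no control characters (\p{Cc} covers C0, DEL and C1), and at most
// $maxChars characters.
function is_plain_text_field(string $value, int $maxChars = 100): bool
{
    // With the /u modifier, preg_match() returns false for invalid UTF-8,
    // so the strict === 1 comparison also rejects broken encodings.
    $pattern = '/\A[^\p{Cc}]{0,' . $maxChars . '}\z/u';
    return preg_match($pattern, $value) === 1;
}
```

A field with a known strict format (such as exactly "YYYYMMDD") would use a format-specific pattern instead of this generic check.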

"Input validation" should reject all of them and does not have to inform users
(attackers) that "there is invalid input". If you need to tell legitimate users
"There is invalid input", then it should be treated by "Business logic", not by
"Input validation".

The point of input validation is to accept only inputs that the
program expects and can handle correctly. Accepting unexpected
data that the program cannot handle correctly is pointless.

Well, that depends on what you mean by "accepting". The program should
exhibit sane behavior (i.e., useful error message, not whitescreen or
something like that) on bad input. That behavior can be different -
i.e., if you are given a wrong password, you shouldn't be too helpful and
say "this password is wrong, the right password is this: ...." (you'd
laugh but there was a real application doing this, no, I have no idea
what the developers were thinking :) but at least you could say
"authentication details are wrong".

User authentication could do something similar to the "Date" field for "User name"
and "Password".

"User name" and "Password" shouldn't have CNTRL chars or invalid char
encoding. Even when the fields are plain <input>s, there shouldn't be
500-char-long inputs for them.

Anything else for "User name" and "Password" should be handled by
"Business logic". The logic part should display friendly, proper error
messages such as

User name is too long (100 chars max).
Password is too long (100 chars max).
User name and/or password is wrong; authentication failed.

Don't misunderstand me. I'm not saying "You should reject user input
mistakes".
"User input mistakes" and "input validation errors" are totally different
errors.

Here, again, I am not sure I understand the difference.

I propose dividing input error checks into "Input validation"
and "Business logic" for simplicity and maintainability.

"Input validation" should be done not only for human-entered inputs, but
also for inputs generated automatically by the system.

Generally speaking, developers should not accept a request that has:

Invalid browser headers:

Invalid REFERER contains Illegal/CNTRL chars and/or too many chars.
Invalid ACCEPT-CHARSET contains Illegal/CNTRL chars and/or too many chars.
Invalid ACCEPT-ENCODING contains Illegal/CNTRL chars and/or too many chars.
Invalid ACCEPT-LANGUAGE contains Illegal/CNTRL chars and/or too many chars.
and so on.

Invalid POST/GET request:

Lacks a field required by your program, e.g. you always set a CSRF token
for POST, but it's missing.
Multi-page form inputs lack, or have invalid, data that should have
been validated previously. Note: where/how to deal with these invalid
inputs is a design choice.
Data written by the program is invalid, e.g.
//php.net/show_bug.php?id=[string contains CNTRL chars and/or 100
chars or more]
$_POST/$_GET has more than 20 elements. Note: most apps/code would
not have this many elements.

Invalid COOKIE:

$_COOKIE has more than 20 elements. Note: normal apps would not
have this many cookies.
Lacks a field required by your program.
Invalid chars. e.g. CNTRL chars.
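The request-level checks in the lists above (too many elements, a missing required field such as a CSRF token) could be sketched like this. The function name, the `csrf_token` key and the limit of 20 are hypothetical, taken from the examples in this message; note that count() only counts top-level elements:

```php
<?php
// Hypothetical request-shape check; the limit of 20 and the required keys
// come from the examples in this thread, not from any PHP API.
function request_shape_is_sane(array $params, array $requiredKeys, int $maxElements = 20): bool
{
    // Too many top-level elements: most forms never send this many fields.
    // (Nested arrays count as a single element here.)
    if (count($params) > $maxElements) {
        return false;
    }
    foreach ($requiredKeys as $key) {
        // A required field (e.g. a CSRF token) must be present and be a string.
        if (!isset($params[$key]) || !is_string($params[$key])) {
            return false;
        }
    }
    return true;
}
```

It would be called as e.g. `request_shape_is_sane($_POST, ['csrf_token'])`, and similarly for `$_COOKIE`. Whether a failure aborts the request or is reported to business logic is the design choice debated in this thread.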

All of these have a history of abuse by attackers, and programs should not
accept them. Please note that secure coding also requires secure output.
Input validation and output sanitization should be treated as separate
tasks, e.g. escape all variables in the "Output" code when you output
something to other software. Never assume "This var was validated at
input, so it is safe without escaping."
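To illustrate that input validation does not replace output escaping, here is a sketch (the function name is hypothetical) that escapes at output time with htmlspecialchars(), assuming a UTF-8 HTML page:

```php
<?php
// Escape at output time, even if $name already passed input validation.
function render_greeting(string $name): string
{
    // ENT_QUOTES escapes both ' and "; the charset must match the page encoding.
    return '<p>Hello, ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';
}
```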

How to validate inputs is the developer's choice, e.g. they may not use the
"CONNECTION" HTTP header at all and not care about it, but all the secure
coding guides that I know of recommend/require validating "all inputs".

Validating all inputs, including those irrelevant to "Business logic",
inside business logic makes programs complicated and hard to maintain.
Broken char encoding, too long/short values and CNTRL chars in <form>
inputs are better handled by "Input validation", because otherwise the
same checks might be repeated for different <form>s.

There are many possibilities for software design. This RFC is designed
to encourage certain validations. However, it does not force developers
to do any particular validation; it provides the tools that are needed
for validation.

I would not encourage users to disable exceptions from
filter_require_var()/filter_require_var_array(), but as a last-minute
change I've made raising an exception optional. This allows developers
to use the new validator for wider purposes.

Regards,

P.S. I'll extend the vote period because there is ongoing discussion.

BTW, ISO 27000/ISMS requires/recommends the proposed input validation.
The latest ISO 27000 mentions it as "adopt secure programming". Older
ISO 27000 explained how to validate inputs. The new ISO 27000 removed the
detailed explanation of input validation methods because secure programming
is widely adopted and standardized.

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Stanislav Malyshev

Hi!

Let's say your app validates the user written/chosen "Date" on the client side with
JavaScript. Then the browser must send the "Date" in whatever format you impose
on the client. It may be "YYYYMMDD", for example.

I'm not sure what Javascript has to do with it. Many apps don't have any
client-side and have little to do with Javascript. Assuming that whole
world is browser applications running Javascript (controlled by you)
would be a big mistake.

Then the programmer should not accept any "Date" format other than "YYYYMMDD",
because any other format is invalid. Accepting formats other than "YYYYMMDD"
only does harm and increases the risk of program malfunction, e.g. all kinds
of injections like JavaScript, SQL, Null char, Newline, etc.

What do you mean by "accept" here? I think you are under the impression (please
correct me if I'm wrong) that there are only two ways for an application to
work - either treating all inputs equally, or bailing out immediately
when incorrect input is detected. However, this is not the case; there
are many other ways for an application to handle the situation of invalid
input - while knowing it is invalid - and the exact manner of this handling
is application-dependent.

"Input validation" should reject all of them and does not have to inform users
(attackers) that "there is invalid input". If you need to tell legitimate users

I think we disagree here. I think not doing this makes my work as a
developer much much harder.

"There is invalid input", then it should be treated by "Business logic", not by
"Input validation".

Wait, input validation happens before business logic has a chance to
run, so if input validation bails, how can business logic treat anything?

"User name" and "Password" shouldn't have CNTRL chars or invalid char
encoding. Even when the fields are plain <input>s, there shouldn't be
500-char-long inputs for them.

So your proposal seems to be having two input checking procedures
instead of one. I don't think people would find it very useful to have
two separate input checking procedures.

--
Stas Malyshev
smalyshev@gmail.com

8 years ago by Lester Caine

"Input validation" should reject all of them and does not have to inform users
(attackers) that "there is invalid input". If you need to tell legitimate users

I think we disagree here. I think not doing this makes my work as a
developer much much harder.

I'm with you on this Stanislav ... we need to know what failed in order
to decide what to do about it. While simply crashing out was acceptable
15 years ago, nowadays knowing what attackers are after can be important?

( and the javascript thing is more a case of upgrading PHP examples to
use html5 validation by default )

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

8 years ago by Christoph M. Becker

( and the javascript thing is more a case of upgrading PHP examples to
use html5 validation by default )

And thereby suggesting that HTML5 or JavaScript validation reduces the
need to do proper input validation on the PHP side? No, please.

--
Christoph M. Becker

8 years ago by Lester Caine

( and the javascript thing is more a case of upgrading PHP examples to
use html5 validation by default )

And thereby suggesting that HTML5 or JavaScript validation reduces the
need to do proper input validation on the PHP side? No, please.

SIMPLY to help new users understand how the whole process works. NOT
helping users with the whole process is what is wrong. And if you look
at my recent messages you will see that this is just a small part of
handling inputs. NOT doing something because you don't like it is not
helping newcomers. Best practice should prevail rather than 'it's not
PHP so we ignore it'.

--
Lester Caine - G8HFL

8 years ago by Christoph M. Becker

( and the javascript thing is more a case of upgrading PHP examples to
use html5 validation by default )

And thereby suggesting that HTML5 or JavaScript validation reduces the
need to do proper input validation on the PHP side? No, please.

SIMPLY to help new users understand how the whole process works. NOT
helping users with the whole process is what is wrong. And if you look
at my recent messages you will see that this is just a small part of
handling inputs. NOT doing something because you don't like it is not
helping newcomers. Best practice should prevail rather than 'it's not
PHP so we ignore it'.

Okay, I get your point, and I'm not opposed to adding this information.

--
Christoph M. Becker

8 years ago by Yasuo Ohgaki

Hi Stas,

Let's say your app validates the user written/chosen "Date" on the client side with
JavaScript. Then the browser must send the "Date" in whatever format you impose
on the client. It may be "YYYYMMDD", for example.

I'm not sure what Javascript has to do with it. Many apps don't have any
client-side and have little to do with Javascript. Assuming that whole
world is browser applications running Javascript (controlled by you)
would be a big mistake.

I assume you write your own JavaScript code to impose a certain format for "date",
"phone", "zip", etc. It's not my JavaScript code but your JavaScript code
that defines what the browser sends to your PHP web apps.

Then the programmer should not accept any "Date" format other than "YYYYMMDD",
because any other format is invalid. Accepting formats other than "YYYYMMDD"
only does harm and increases the risk of program malfunction, e.g. all kinds
of injections like JavaScript, SQL, Null char, Newline, etc.

What do you mean by "accept" here? I think you are under the impression (please

"Accept" means allowing the program to process the input data (i.e. continue execution).

correct me if I'm wrong) that there are only two ways for an application to
work - either treating all inputs equally, or bailing out immediately
when incorrect input is detected. However, this is not the case; there
are many other ways for an application to handle the situation of invalid
input - while knowing it is invalid - and the exact manner of this handling
is application-dependent.

If your JavaScript date picker uses the "YYYYMMDD" format
for a date, anything other than the "YYYYMMDD" format is
attacker-tampered input.

"Valid input" can be considered to mean the inputs expected from legitimate
users. Anything other than "valid input" should not be accepted, because
it comes from non-legitimate users, i.e. attackers.

Broken encoding
CNTRL chars
Bad format ( YYYYMMDD is the format for this case )
Too long or short ( Exactly 8 chars is the length for this case )
and so on

are examples of invalid inputs.

"Input validation" should reject all of them and does not have to inform users
(attackers) that "there is invalid input". If you need to tell legitimate users

I think we disagree here. I think not doing this makes my work as a
developer much much harder.

It may increase your work, but you'll get less risk in return.
It's all about avoiding/mitigating possible risks at additional cost.

I know you've fixed many vulnerabilities in PHP. What's the best way to
avoid broken char encoding attacks in some libraries? Validating the char
encoding of strings is the best way, as nobody can guarantee correct
behavior with broken char encodings in a system.

There is a lot of code that misbehaves with broken encodings.
Software changes continuously. Even if one had code that was 100% free
of broken-encoding attacks at a certain point, it could become vulnerable
after software upgrades.

"There is invalid input", then it should be treated by "Business logic", not by
"Input validation".

Wait, input validation happens before business logic has a chance to
run, so if input validation bails, how can business logic treat anything?

Input validation only rejects invalid input.

If you use a plain <input> for "date", then you should accept any valid
UTF-8 without CNTRL chars up to 100 chars or so, not only "YYYYMMDD".
(Assuming UTF-8 is the encoding)

"User name" and "Password" shouldn't have CNTRL chars or invalid char
encoding. Even when the fields are plain <input>s, there shouldn't be
500-char-long inputs for them.

So your proposal seems to be having two input checking procedures
instead of one. I don't think people would find it very useful to have
two separate input checking procedures.

If you blindly follow the best practice "Control/validate all inputs", then
the previously mentioned inputs, like browser request headers

Invalid REFERER contains Illegal/CNTRL chars and/or too many chars.
Invalid ACCEPT-CHARSET contains Illegal/CNTRL chars and/or too many chars.
Invalid ACCEPT-ENCODING contains Illegal/CNTRL chars and/or too many chars.
Invalid ACCEPT-LANGUAGE contains Illegal/CNTRL chars and/or too many chars.
and so on.

must be validated. I don't think it's a business logic job since it's
unrelated to most business logic.

Attackers try to tamper via any input that the software accepts.
Therefore, developers need to reduce possible attack paths as much
as possible, i.e. close doors and don't open doors needlessly. The code
may not use these inputs at all, but nobody can be sure they will never
be used when a group of people is developing the software.

Input validation is better done as early as possible. The exception is
canonicalization. Otherwise, SSRF could be possible via the Controller in
an MVC architecture, for instance.

You may ignore unused inputs and/or accept such inputs, but there are
applications that require rock-solid security.

This input validation is a mitigation for "Oops!" mistakes, and people make
"Oops!" mistakes on occasion, as we do. It's a very strong mitigation for
"Oops!". That's why input validation is listed as the #1 item.

Software design is up to developers. There is a lot of software that does
not follow best practices. Nobody is forced to use the validator the way I
explain it. It's okay with me if only users who need it use it. As I
mentioned, ISO 27000/ISMS requires this kind of validation, so quite a few
users may need this.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Stanislav Malyshev

Hi!

I assume you write your own JavaScript code to impose a certain format for "date",
"phone", "zip", etc. It's not my JavaScript code but your JavaScript code
that defines what the browser sends to your PHP web apps.

There are a lot of use cases that don't have Javascript (or browser, for
that matter) frontends. Not that it should matter, as browser is not
under your code's control, it's under user's control, and you never know
on the PHP side if such thing as browser exists at all.

"Accept" means allowing the program to process the input data (i.e. continue execution).

Then we disagree here. I strongly object to the notion that the program
should stop execution at any unexpected data. This is only marginally
better than crashing, and is not very helpful behavior. Modern
application is expected to handle data in a more intelligent manner.

If your JavaScript date picker uses the "YYYYMMDD" format
for a date, anything other than the "YYYYMMDD" format is
attacker-tampered input.

You keep returning to Javascript. What I am asking you to consider is
that we're not talking about Javascript, we are talking about PHP, and
PHP has no relation to Javascript date picker. Some apps use Javascript
date pickers, true, but there is a whole world of applications that do
nothing of the sort. Javascript does not add much to the picture here.

"Valid input" can be considered to mean the inputs expected from legitimate
users. Anything other than "valid input" should not be accepted, because
it comes from non-legitimate users, i.e. attackers.

Broken encoding
CNTRL chars
Bad format ( YYYYMMDD is the format for this case )
Too long or short ( Exactly 8 chars is the length for this case )
and so on

are examples of invalid inputs.

I think this has a smell of blacklisting, which is almost always the wrong
approach to dealing with data filtering/validation. Blacklists almost
always lose to whitelists. If you have a whitelist filter that validates
the data, you don't need to worry about chars, lengths and such
separately. However, there's nothing here that requires the whitelist
filter to bring down the app on failure. It should tell the business
logic "this string you gave me is not a valid date" and business logic
should deal with it. There's nothing special here in encodings, control
chars, etc. that I can see that needs any special handling.

It may increase your work, but you'll get less risk in return.

I don't see how. I can write intelligent code that produces helpful
messages and is just as robust - in fact, more robust, see above about
whitelisting - as blackbox code that explodes on bad inputs.

Input validation only rejects invalid input.

If you use a plain <input> for "date", then you should accept any valid
UTF-8 without CNTRL chars up to 100 chars or so, not only "YYYYMMDD".
(Assuming UTF-8 is the encoding)

But why? If I just check for YYYYMMDD I automatically get all invalid
UTF-8 etc. rejected, without even thinking about it.
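Stas's point can be illustrated with a strict whitelist: a pattern that only admits exactly eight ASCII digits cannot match broken UTF-8, CNTRL chars or over-long input, so those need no separate check. A sketch with a hypothetical function name, using preg_match() and checkdate():

```php
<?php
// Hypothetical whitelist check for an exact YYYYMMDD date.
function is_yyyymmdd(string $value): bool
{
    // Exactly 8 ASCII digits; \z (unlike $) also rejects a trailing newline.
    // Control chars, invalid UTF-8 bytes and over-long input cannot match.
    if (preg_match('/\A[0-9]{8}\z/', $value) !== 1) {
        return false;
    }
    $year  = (int) substr($value, 0, 4);
    $month = (int) substr($value, 4, 2);
    $day   = (int) substr($value, 6, 2);
    // Reject calendar-impossible dates such as 20160230.
    return checkdate($month, $day, $year);
}
```

On failure the function just returns false, so the caller (business logic) decides whether to show a helpful message or abort.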

Software design is up to developers. There is a lot of software that does
not follow best practices. Nobody is forced to use the validator the way I
explain it. It's okay with me if only users who need it use it. As I
mentioned, ISO 27000/ISMS requires this kind of validation, so quite a few
users may need this.

I'm not sure what ISO says there, but I'm pretty sure ISO does not
specify exactly which code you should use to validate your inputs. The
objections are not about validating inputs as a concept, nobody would
object to that, it's to specific model of doing this that you propose -
namely, doing two separate validation passes, doing blacklisting and
bailing out on validation failure. At least I would not do input
validation in this manner.

--
Stas Malyshev
smalyshev@gmail.com

8 years ago by Yasuo Ohgaki

Hi Stas,

On Thu, Aug 18, 2016 at 10:34 AM, Stanislav Malyshev
smalyshev@gmail.com wrote:

I assume you write your own JavaScript code to impose a certain format for "date",
"phone", "zip", etc. It's not my JavaScript code but your JavaScript code
that defines what the browser sends to your PHP web apps.

There are a lot of use cases that don't have Javascript (or browser, for
that matter) frontends. Not that it should matter, as browser is not
under your code's control, it's under user's control, and you never know
on the PHP side if such thing as browser exists at all.

Even when there is no JavaScript or HTML5 form, input validation
can be done. It's a matter of defining "valid inputs" for <input type="text" name="var" />. If the page encoding is UTF-8, web browsers
must submit form data in UTF-8. (Unless another encoding is
explicitly specified, or for some very, very old browsers that can send
only SJIS, etc.) A 1MB input for <input type="text" name="var" /> is
possible. However, apps that should accept 1MB of data from <input type="text" name="var" /> should be rare, if they exist at all. 1KB would be far too large
for almost all apps. Even 100 bytes may be too large, and stopping
normal program execution would be acceptable.

"Accept" means allowing the program to process the input data (i.e. continue execution).

Then we disagree here. I strongly object to the notion that the program
should stop execution at any unexpected data. This is only marginally
better than crashing, and is not very helpful behavior. Modern
application is expected to handle data in a more intelligent manner.

We recently added a number of
php_error_docref(E_ERROR, "Cannot process too large data");
in PHP core to avoid possible memory corruption attacks.

Why not for apps written in PHP?

Broken char encoding shouldn't come from legitimate users. Text
containing CNTRL chars from <input type="text" name="var" /> shouldn't
come from legitimate users. 1MB of data from <input type="text" name="var" /> shouldn't come from legitimate users. A numeric database
record ID that is set by the app shouldn't contain anything other than
digits. And so on.

If this kind of data is sent to an app, a message like "Invalid input
was detected. This incident has been reported to the administrator for
investigation." may be displayed before terminating the page.

If your JavaScript date picker uses the "YYYYMMDD" format
for a date, anything other than the "YYYYMMDD" format is
attacker-tampered input.

You keep returning to Javascript. What I am asking you to consider is
that we're not talking about Javascript, we are talking about PHP, and
PHP has no relation to Javascript date picker. Some apps use Javascript
date pickers, true, but there is a whole world of applications that do
nothing of the sort. Javascript does not add much to the picture here.

I agree that there is data that cannot be validated by an input
validator; file input is one example. However, there are many inputs that
can be validated, and on which normal execution can be terminated, as
previously mentioned.

"Valid input" can be considered to mean the inputs expected from legitimate
users. Anything other than "valid input" should not be accepted, because
it comes from non-legitimate users, i.e. attackers.

Broken encoding
CNTRL chars
Bad format ( YYYYMMDD is the format for this case )
Too long or short ( Exactly 8 chars is the length for this case )
and so on

are examples of invalid inputs.

I think this has a smell of blacklisting, which is almost always the wrong
approach to dealing with data filtering/validation. Blacklists almost
always lose to whitelists. If you have a whitelist filter that validates

I completely agree!!
Programmers must think in terms of whitelists, not blacklists.

That's why I put the whitelist approach in bold in
https://wiki.php.net/rfc/add_validate_functions_to_filter#secure_coding_basics

Broken char encoding (Accept only valid encoding)
NUL, etc control chars in string. (Accept only chars allowed)
Too long or too short string. e.g. JS validated values and values set
by server programs like <select>/<input type=radio>/etc, 100 chars for
username, 1000 chars for password, empty ID for a database record,
etc. (Accept only strings within range)

the data, you don't need to worry about chars, lengths and such
separately. However, there's nothing here that requires the whitelist
filter to bring down the app on failure. It should tell the business
logic "this string you gave me is not a valid date" and business logic
should deal with it. There's nothing special here in encodings, control
chars, etc. that I can see that needs any special handling.

It may increase your work, but you'll get less risk in return.

I don't see how. I can write intelligent code that produces helpful
messages and is just as robust - in fact, more robust, see above about
whitelisting - as blackbox code that explodes on bad inputs.

I agree that blacklisting is error-prone; it can be defeated easily. My RFC
description may not emphasize this enough, but I totally
agree that blacklists should be avoided whenever possible.

The difference between us is:

How to deal with bad inputs.

You seem to want to treat them as normal input.
I would reject bad inputs that shouldn't come from legitimate users.

You can do it your way, since I made the exception optional. The missing
part is getting useful error info when something is wrong with the inputs. I
can add a function that retrieves error info. (I removed it in favor
of the exception object, so it can easily be added again.)

Input validation only rejects invalid input.

If you use a plain <input> for "date", then you should accept any valid
UTF-8 without CNTRL chars up to 100 chars or so, not only "YYYYMMDD".
(Assuming UTF-8 is the encoding)

But why? If I just check for YYYYMMDD I automatically get all invalid
UTF-8 etc. rejected, without even thinking about it.

When a plain <input> is used, users may type in any valid UTF-8 char by mistake.
For example, this wouldn't happen for a date field, but autocomplete may
fill my name "大垣靖男" into a name field that is supposed to contain only
Latin letters.

Software design is up to developers. There is a lot of software that does
not follow best practices. Nobody is forced to use the validator the way I
explain it. It's okay with me if only users who need it use it. As I
mentioned, ISO 27000/ISMS requires this kind of validation, so quite a few
users may need this.

I'm not sure what ISO says there, but I'm pretty sure ISO does not
specify exactly which code you should use to validate your inputs. The
objections are not about validating inputs as a concept, nobody would
object to that, it's to specific model of doing this that you propose -
namely, doing two separate validation passes, doing blacklisting and
bailing out on validation failure. At least I would not do input
validation in this manner.

No, it does not. It explains how/what to check in inputs.

If developers try to validate "all inputs", validating within the MVC model
is neither efficient nor reasonable. It does not make sense to validate
browser request headers in the db model, for example. Ideally, input
validation is best done as early as possible to maximize the
mitigation effect.

Defense in depth (multiple layers of defense) is a common technique in
security. Implementation is the developer's choice; the ideal way does not have
to be implemented.

I'll revise the patch so that it is more useful for user input error
checks. Then the new features can be used for wider purposes.

I need to return useful error messages, but the return value is already used.
The options are:

Add a reference parameter that will hold error info.
Add a function that retrieves the last error info.

If anyone has a better idea, it would be appreciated.
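The two options listed above might look roughly like this in userland PHP. Both the function and the class names are hypothetical and only illustrate the API shapes, not the RFC's actual filter functions; the "last error" style mirrors the existing json_last_error_msg(). PHP 7.4+ is assumed for the typed static property:

```php
<?php
// Option 1 (hypothetical): a reference parameter receives the error message.
function validate_digits_ref(string $value, int $maxLen, ?string &$error = null): bool
{
    $error = null;
    if (strlen($value) > $maxLen) {
        $error = "longer than $maxLen bytes";
        return false;
    }
    if (preg_match('/\A[0-9]+\z/', $value) !== 1) {
        $error = 'not a non-empty string of digits';
        return false;
    }
    return true;
}

// Option 2 (hypothetical): a "last error" getter, like json_last_error_msg().
final class DigitsValidator
{
    private static ?string $lastError = null;

    public static function validate(string $value, int $maxLen): bool
    {
        // Reuse option 1 so both shapes report identical messages.
        $ok = validate_digits_ref($value, $maxLen, $err);
        self::$lastError = $err;
        return $ok;
    }

    public static function lastError(): ?string
    {
        return self::$lastError;
    }
}
```

The reference parameter keeps error state local to each call; the last-error getter leaves the signature unchanged but relies on shared state that the next call overwrites.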

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Stanislav Malyshev

Hi!

Even when there is no JavaScript or HTML5 form, input validation
can be done. It's a matter of defining "valid inputs" for <input type="text" name="var" />. If the page encoding is UTF-8, web browsers
must submit form data in UTF-8. (Unless another encoding is

I think you're still missing my point. The point is that it is
absolutely irrelevant what browser might or might not do, since PHP does
not have any means to know if browsers even exist. PHP doesn't talk to
browser, it talks to HTTP channel (provided we're in webserver
scenario), what's on the other end is unknown and irrelevant. So there's
no point discussing browsers.

We recently added a number of
php_error_docref(E_ERROR, "Cannot process too large data");
in PHP core to avoid possible memory corruption attacks.

We added it because we didn't have a choice. PHP does not have generic
error mechanism that allows to fail an arbitrary function and still
continue execution. It's because PHP is highly complex C code and C is
not the most friendly language out there. Your app is not in C, so it
can do it differently.

If you talk about such situations, fine, but it's not input validation -
it's a limitation of the environment (since PHP can't support arbitrary
length strings). If your application has such limitations - fine, but it
would be application-defined and will not apply for most cases of input
validation.

Broken char encoding shouldn't come from legitimate users. Text
containing CNTRL chars from <input type="text" name="var" /> shouldn't
come from legitimate users. 1MB of data from <input type="text" name="var" /> shouldn't come from legitimate users. A numeric database
record ID that is set by the app shouldn't contain anything other than
digits. And so on.

I think you are mixing abnormal situations due to physical limitations
of software (like memory limits, etc.) with business logic. Numeric
format validation and size limits are clearly business logic. Encoding
may not be, depending on what the input is and what it is used for.

Broken char encoding (Accept only valid encoding)
NUL, etc control chars in string. (Accept only chars allowed)
Too long or too short string. e.g. JS validated values and values set
by server programs like <select>/<input type=radio>/etc, 100 chars for
username, 1000 chars for password, empty ID for a database record,
etc. (Accept only strings within range)

These are all fine filters/validators, and may be very useful in many
situations. What I still don't understand is the insistence on the application
dropping everything and exiting when one of them fails. We already have
sanitization/filtering infrastructure, and we can add new filters and flags.

What I don't understand is why we need a parallel infrastructure that
seems to differ only by the unhelpful feature of crashing each time
it sees something unexpected. Am I missing something?

How to deal with bad inputs.

You seem to want to treat them as normal input.

No, you didn't understand. I would like to treat it as erroneous input,
but instead of stopping the application immediately, return an error status
to the business logic and let it sort things out.

When a plain <input> is used, users may type in any valid UTF-8 char by mistake.
For example, this wouldn't happen for a date field, but autocomplete may
fill my name "大垣靖男" into a name field that is supposed to contain only
Latin letters.

If the software is properly internationalized (like my email client)
there's absolutely nothing wrong with this string. If it is not, it
should check that the text matches its expectations - that's part of
business logic.

If developers try to validate "all inputs", validating within the MVC model
is neither efficient nor reasonable. It does not make sense to validate
browser request headers in the db model, for example. Ideally, input
validation is best done as early as possible to maximize the
mitigation effect.

If you use browser headers, you validate them. If you don't use them, no
point validating them, of course, since they are not your inputs.

--
Stas Malyshev
smalyshev@gmail.com

8 years ago by Lester Caine

Broken char encoding shouldn't come from legitimate users. Text
containing CNTRL chars from <input type="text" name="var" /> shouldn't
come from legitimate users. 1MB of data from <input type="text" name="var" /> shouldn't come from legitimate users. A numeric database
record ID that is set by the app shouldn't contain anything other than
digits. And so on.

I think you are mixing abnormal situations due to physical limitations
of software (like memory limits, etc.) with business logic. Numeric
format validation and size limits are clearly business logic. Encoding
may not be, depending on what the input is and what it is used for.

Currently if the post data contained a large block of text how is that
handled in the $_POST array? If we have specified a validator that says
['note'] has a 1k limit, then only the first 1024 characters will be
usable so anything else can be scrapped. Yes I know that we have a
chicken and egg in that $_POST['note'] has to be created before we can
augment it with other information, and currently that happens by copying
$_POST['note'] to a well defined $note further down the chain, but how
difficult would it be for a set of annotations to be picked up as part
of the process of creating $_POST['note'] in the first place?

Even strict typing does not help here since all we have is 'string'
where even something as simple as 'short_string' for a 256 byte limit
string would help, but adding even a simple set of limits to the base
variables addresses the majority of what is being discussed? Even if you
leave the finer validation rules such as 'valid email' to later
'business' logic? But is it really that difficult to go from
'short_string' to 'email' as a validation rule?

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

8 years ago by Yasuo Ohgaki

Hi Lester,

Broken char encoding shouldn't come from legitimate users. Text
containing CNTRL chars from <input type="text" name="var" /> shouldn't
come from legitimate users. 1MB of data from <input type="text" name="var" /> shouldn't come from legitimate users. A numeric database
record ID that is set by the app shouldn't contain anything other than
digits. And so on.

I think you are mixing abnormal situations due to physical limitations
of software (like memory limits, etc.) with business logic. Numeric
format validation and size limits are clearly business logic. Encoding
may not be, depending on what the input is and what it is used for.

Currently if the post data contained a large block of text how is that
handled in the $_POST array? If we have specified a validator that says
['note'] has a 1k limit, then only the first 1024 characters will be
usable so anything else can be scrapped. Yes I know that we have a
chicken and egg in that $_POST['note'] has to be created before we can
augment it with other information, and currently that happens by copying
$_POST['note'] to a well defined $note further down the chain, but how
difficult would it be for a set of annotations to be picked up as part
of the process of creating $_POST['note'] in the first place?

If $_POST['note'] is limited to 1KB by business logic and there is no
client-side restriction, only a note saying "You can enter up to
1KB of text", then I'll treat text of up to 10KB or more as "valid input".
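A minimal PHP sketch of that two-tier distinction, assuming a hypothetical 10KB hard cap enforced by input validation (the request is aborted by an exception) and the 1KB business-logic limit (an error message is returned to the user). Both function names and both limits are illustrative assumptions, not part of the RFC.

```php
<?php
// Input validation (hypothetical hard cap): a legitimate client of this form
// never sends more than this, so abort the request by throwing.
function validate_note_input(string $note, int $hardCapBytes = 10240): string
{
    if (strlen($note) > $hardCapBytes) {
        throw new UnexpectedValueException('note exceeds the hard cap');
    }
    return $note;
}

// Business logic (hypothetical): the documented 1KB limit produces a
// user-facing error message instead of aborting.
function check_note_business_rule(string $note, int $limitBytes = 1024): ?string
{
    return strlen($note) > $limitBytes ? 'You can enter up to 1KB of text.' : null;
}

$note  = validate_note_input(str_repeat('a', 2000)); // passes input validation
$error = check_note_business_rule($note);            // but fails the business rule
```

Here 2000 bytes is "valid input" that the business logic still rejects politely, while anything over the hard cap never reaches the business logic at all.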

Even strict typing does not help here since all we have is 'string'
where even something as simple as 'short_string' for a 256 byte limit
string would help, but adding even a simple set of limits to the base
variables addresses the majority of what is being discussed? Even if you
leave the finer validation rules such as 'valid email' to later
'business' logic? But is it really that difficult to go from
'short_string' to 'email' as a validation rule?

If you validate "email" as an email address on the client side, then you
can have a server-side validation rule that rejects anything the
client-side rule would not allow. If you don't have client-side validation
(a rule), then you should treat it as a normal string in the input validation.
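As an illustrative sketch, assuming a hypothetical per-field flag that records whether the form validates the field client-side, the server-side rule can mirror the client: FILTER_VALIDATE_EMAIL when the client enforces an email format, and only a generic string check otherwise. The 1024-byte cap is an assumption.

```php
<?php
// Sketch: the server-side rule mirrors what the client enforces.
// $clientValidatesEmail is a hypothetical per-field setting.
function validate_email_field(string $value, bool $clientValidatesEmail): bool
{
    if ($clientValidatesEmail) {
        // A legitimate client only submits values that passed its email check.
        return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
    }
    // No client-side rule: accept any reasonably short, valid UTF-8 string.
    // An empty pattern with the /u modifier matches only valid UTF-8.
    return strlen($value) <= 1024 && preg_match('//u', $value) === 1;
}
```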

The input validation we are discussing is "Input/output rules between
client and server". It decides what's valid/invalid.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

The input validation we are discussing is "Input/output rules between
client and server". It decides what's valid/invalid.

I think I'm getting two things confused and am mixing your array
filtering RFC up with this one. There is so much speculative stuff being
discussed rather than trying to nail down key elements?

I am looking at the whole process, so I have client side validation which
is built from a set of rules added to the smarty templates. This still
has a couple of gaps where manual creation of javascript is still
needed, but that relates more to getting the validation working with
Bootstrap 3. This gives me a clean set of post data, and if one could
ignore the morons then working with the $_POST array would be a doddle,
but because we live in the real world, it's the BUILDING of the $_POST
array when one can't trust the provider that we want to filter, and in
an ideal world the rules would be used for each variable as they are
added to the array, rather than after creating the array. One could
almost envisage a check that the post data packet IS too big for the set
of variables being returned and crash out, but simply throwing away
suspect data as each variable is built, with logic that raises an
exception on the first failure and only passes the fields that are
valid, ensures the $_POST array matches the client's data array.

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki

Hi Lester,

The input validation we are discussing is "Input/output rules between
client and server". It decides what's valid/invalid.

I think I'm getting two things confused and am mixing your array
filtering RFC up with this one. There is so much speculative stuff being
discussed rather than trying to nail down key elements?

I am looking at the whole process, so I have client side validation which
is built from a set of rules added to the smarty templates. This still
has a couple of gaps where manual creation of javascript is still
needed, but that relates more to getting the validation working with
Bootstrap 3. This gives me a clean set of post data, and if one could
ignore the morons then working with the $_POST array would be a doddle,
but because we live in the real world, it's the BUILDING of the $_POST
array when one can't trust the provider that we want to filter, and in
an ideal world the rules would be used for each variable as they are
added to the array, rather than after creating the array. One could
almost envisage a check that the post data packet IS too big for the set
of variables being returned and crash out, but simply throwing away
suspect data as each variable is built, with logic that raises an
exception on the first failure and only passes the fields that are
valid, ensures the $_POST array matches the client's data array.

I might have misunderstood you.
It seems you would like to validate inputs by convention rather than
configuration, e.g. use variable names that specify what they should be,
for instance i_age is an integer where "i_" stands for integer. Or you would
like to build validation rules on the fly, e.g. if there is "age" in the
input array, automatically validate it as "integer", "minimum=0",
"maximum=130".

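If the above is what you would like to achieve, you can do it by building
the validation rule array on the fly. Something like

// get_default_rule_for_this_request() and get_validation_rule() are assumed
// to be app-defined helpers; filter_check_definition() and
// filter_require_var_array() are proposed by this RFC.
$validation_rules = get_default_rule_for_this_request();
foreach ($_POST as $key => $value) {
    // Request inputs must not override the default rules.
    if (!empty($validation_rules[$key])) {
        throw new Exception('You cannot override default rule of ' . $key);
    }
    $validation_rules[$key] = get_validation_rule($key);
}
assert(filter_check_definition($validation_rules));
$mypost = filter_require_var_array($_POST, $validation_rules);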

Is this what you want?
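For comparison, a hypothetical userland sketch of the naming-convention approach using only the existing filter_var(): "age" maps to an integer in 0..130 and an "i_" prefix to any integer, as in the examples above. Any other name, a non-string value, or a failing value aborts with an exception. The function names are illustrative only.

```php
<?php
// Hypothetical naming conventions: "age" is an int in 0..130, "i_*" is any int.
function rule_for_key(string $key): ?array
{
    if ($key === 'age') {
        return ['filter' => FILTER_VALIDATE_INT,
                'options' => ['min_range' => 0, 'max_range' => 130]];
    }
    if (strncmp($key, 'i_', 2) === 0) {
        return ['filter' => FILTER_VALIDATE_INT, 'options' => []];
    }
    return null; // no convention matches this name
}

// Validate every submitted value by the rule derived from its name.
// Unknown names, non-string values and failing values all abort.
function validate_by_convention(array $input): array
{
    $validated = [];
    foreach ($input as $key => $value) {
        $rule = rule_for_key((string) $key);
        if ($rule === null || !is_string($value)) {
            throw new UnexpectedValueException("Unexpected input '$key'");
        }
        $result = filter_var($value, $rule['filter'], ['options' => $rule['options']]);
        if ($result === false) {
            throw new UnexpectedValueException("Invalid value for '$key'");
        }
        $validated[$key] = $result;
    }
    return $validated;
}
```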

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki

Hi Stas,

Even when there are no JavaScript or HTML5 forms, input validation
can be done. It's a matter of the definition of "valid inputs" for <input type="text" name="var" />. If the page encoding is UTF-8, web browsers
must send form data in UTF-8 encoding. (Unless other encoding is
I think you're still missing my point. The point is that it is
absolutely irrelevant what browser might or might not do, since PHP does
not have any means to know if browsers even exist. PHP doesn't talk to
browser, it talks to HTTP channel (provided we're in webserver
scenario), what's on the other end is unknown and irrelevant. So there's
no point discussing browsers.

It's possible to design web pages/services for "unknown clients", but
those are exceptional cases.

Exceptions do not negate best practices. If there are cases that
should be handled exceptionally, it should be applied to that case
only, not in general.

Almost all systems have intended clients. If the protocol is HTTP/HTTPS,
developers may reject strange data that cannot be right for
HTTP/HTTPS. Even layers in front of PHP do this, i.e. HTTP servers
will reject malformed and/or prohibited requests and terminate
execution. A Web Application Firewall does fancier things and
terminates the connection. (It does not even let the request reach the web server.)
So it wouldn't matter at all if web apps also check their requirements
and terminate requests that do not fulfill them.

Those who like WAFs (Web Application Firewalls) may use a WAF to
check more of their web apps' inputs, i.e. WAF filters are designed to
check inputs for attack signatures that web apps do not
validate/check, in general. IMHO, using a WAF is more burdensome and
costly than the input validation that I'm proposing.

We recently added a number of
php_error_docref(E_ERROR, "Cannot process too large data");
calls in PHP core to avoid possible memory destruction attacks.

We added it because we didn't have choice. PHP does not have generic
error mechanism that allows to fail an arbitrary function and still
continue execution. It's because PHP is highly complex C code and C is
not the most friendly language out there. Your app is not in C, so it
can do it differently.

If you talk about such situations, fine, but it's not input validation -
it's limitation of the environment (since PHP can't support arbitrary
length string). If your application has such limitations - fine, but it
would be application-defined and will not apply for most cases of input
validation.

Whether it is input or output validation is irrelevant. "Programs
terminate for insane input/output", like no available memory (PHP),
broken/insane HTTP/HTTPS requests (HTTP server), impossible/invalid
inputs to web apps (WAF).

My point is "program (or even connection) terminates" everywhere when
there is invalid data.

Web application developers have right to define "valid" inputs. ("have
right" does not mean "can do anything") PHP script termination for
invalid input is just one of terminations. It's nothing special.

Broken char encoding shouldn't come from legitimate users. Text
containing CNTRL chars from <input type="text" name="var" /> shouldn't
come from legitimate users. 1MB of data from <input type="text" name="var" /> shouldn't come from legitimate users. A numeric database
record ID that is set by the app shouldn't contain anything other than
digits. And so on.

I think you are mixing abnormal situations due to physical limitations
of software (like memory limits, etc.) with business logic. Numeric
format validation and size limits are clearly business logic. Encoding
may not be, depending on what the input is and what it is used for.

I would impose certain limits in "the input validation", but if the
program must return a nice response for any request, then it must be done in
business logic. I agree with that. It's your rule after all.

- Broken char encoding (accept only valid encoding)
- NUL and other control chars in strings (accept only allowed chars)
- Too long or too short strings, e.g. JS-validated values and values set
  by server programs like <select>/<input type=radio>/etc., 100 chars for
  a username, 1000 chars for a password, an empty ID for a database record,
  etc. (accept only strings within range)
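A rough PHP sketch of the three checks listed above for a single-line text field. It uses only PCRE so it does not depend on mbstring; the function name and the per-field limits are assumptions for illustration.

```php
<?php
// Sketch of the listed checks for a single-line text field.
function is_valid_text_input(string $s, int $minChars, int $maxChars): bool
{
    // 1. Encoding: with the /u modifier, preg_match() returns false on
    //    malformed UTF-8, so only valid UTF-8 gets past this point.
    if (preg_match('//u', $s) !== 1) {
        return false;
    }
    // 2. Control characters: reject NUL and the other C0 controls, plus DEL.
    if (preg_match('/[\x00-\x1F\x7F]/u', $s) === 1) {
        return false;
    }
    // 3. Length: count characters (code points), not bytes.
    $chars = preg_match_all('/./su', $s);
    return $chars >= $minChars && $chars <= $maxChars;
}
```

A 100-character username would then be checked with is_valid_text_input($username, 1, 100). For values the server itself generates (<select> or radio options), a stricter membership test such as in_array($value, $allowed, true) is the natural counterpart.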

These are all fine filters/validators, and may be very useful in many
situations. What I still don't understand is the insistence on the application
dropping everything and exiting when one of them fails. We already have
sanitization/filtering infrastructure, and we can add new filters and flags;
what I don't understand is why we need a parallel infrastructure which
seems to differ only by the unhelpful feature of crashing each time
it sees something unexpected. Am I missing something?

I think your premise is "Show nice error message for any errors,
proceed as normal case". (Handle invalid/insane data just like mistakes)

My premise is "Shouldn't show nice messages to attacker, terminate as
abnormal case". (Treat them as attack or serious system bug)

It's a design choice. Either way is possible.

How to deal with bad inputs.

You seem to want to treat them as normal input.

No, you didn't understand. I would like to treat it as erroneous input,
but not stop the application immediately; instead, return an error status
to the business logic and let it sort things out.

Now we are close to it!
Our premises differ, so our opinions/views differ.

My premise is "Client and server have certain rules. Client inputs that do
not follow the rules (requirements) should be treated as abnormal cases and
shouldn't be handled by business logic". Please note that

Valid input != logically correct or no mistakes

A rule could be "an integer may be any valid integer", but a developer
may/can impose that an int value must be between 0 and 120, for
instance. Age 300 can't be true for a human, but if any integer is
allowed, it is valid input.
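The difference between the two rules can be expressed with the existing FILTER_VALIDATE_INT and its min_range/max_range options; the function names below are illustrative.

```php
<?php
// "Any integer" rule: FILTER_VALIDATE_INT returns the int, or false on failure.
function valid_any_int(string $v): bool
{
    return filter_var($v, FILTER_VALIDATE_INT) !== false;
}

// Developer-imposed rule: an integer between 0 and 120 inclusive.
function valid_age(string $v): bool
{
    $opts = ['options' => ['min_range' => 0, 'max_range' => 120]];
    return filter_var($v, FILTER_VALIDATE_INT, $opts) !== false;
}
```

Under the first rule "300" is valid input that business logic may still reject as implausible; under the second, the same value already fails input validation.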

When a plain <input> is used, users may type any valid UTF-8 character by
mistake. For example, this wouldn't happen for a date field, but autocomplete
may fill my name "大垣靖男" into a name field that is supposed to contain
only alphabetic characters.

If the software is properly internationalized (like my email client)
there's absolutely nothing wrong with this string. If it is not, it
should check that the text matches its expectations - that's part of
business logic.

Which errors should be handled by business logic depends on the
rules/requirements that developers can impose on the client. Since it
depends on developer-defined rules/requirements, let's talk about what
kind of rules/requirements can be defined.

If developers try to validate "all inputs", validation in the MVC model is
neither efficient nor reasonable. It does not make sense to validate
browser request headers in the DB model, for example. Ideally, input
validation should be done as early as possible to maximize the
mitigation effect.

If you use browser headers, you validate them. If you don't use them, no
point validating them, of course, since they are not your inputs.

It's ok to design that way.

To maximize the mitigation effect of input validation, developers are advised
to validate "all inputs" regardless of their usage in business logic or
output code. An input may be used in the future, or may already be used by
some code you don't realize.

Let's talk about what could be validated, because things that cannot be
validated in input code do not belong to "the input validation"
anyway.

We know there are many inputs that could be validated by input code, don't we?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Stanislav Malyshev

Hi!

It's possible to design web pages/services for "unknown clients", but
those are exceptional cases.

No, it's not. Every service exposed to the internet is for unknown clients.

Almost all systems have intended clients. If the protocol is HTTP/HTTPS,
developers may reject strange data that cannot be right for
HTTP/HTTPS. Even layers in front of PHP do this, i.e. HTTP servers
will reject malformed and/or prohibited requests and terminate
execution. A Web Application Firewall does fancier things and

True, because we can have no PHP business logic that can work on HTTPS
level. We can have PHP business logic that works on PHP level though.

validate/check, in general. IMHO, using a WAF is more burdensome and
costly than the input validation that I'm proposing.

I'm not sure where WAF comes in - these are two completely different uses.

Web application developers have right to define "valid" inputs. ("have
right" does not mean "can do anything") PHP script termination for
invalid input is just one of terminations. It's nothing special.

You can perfectly well have your app do anything you like - including
termination - on filter failure. I don't see how it necessitates making
new set of filters though?

I think your premise is "Show nice error message for any errors,
proceed as normal case". (Handle invalid/insane data just like mistakes)

My premise is "Shouldn't show nice messages to attacker, terminate as
abnormal case". (Treat them as attack or serious system bug)

It's a design choice. Either way is possible.

Sure, and it's possible with current filters too. If the check fails,
and you want to terminate, what prevents you from doing:

if (filter_var($var, FILTER_VALIDATE_INT) === false) { exit(); }

Am I missing some important point here?

Stas Malyshev
smalyshev@gmail.com

8 years ago by Yasuo Ohgaki

Hi Stas,

It's possible to design web pages/services for "unknown clients", but
those are exceptional cases.

No, it's not. Every service exposed to the internet is for unknown clients.

Right and wrong.

Every service exposed to the internet is for unknown clients, but we
can impose certain input/output rules defined by web application
developers and/or standards.

Almost all systems have intended clients. If the protocol is HTTP/HTTPS,
developers may reject strange data that cannot be right for
HTTP/HTTPS. Even layers in front of PHP do this, i.e. HTTP servers
will reject malformed and/or prohibited requests and terminate
execution. A Web Application Firewall does fancier things and

True, because we can have no PHP business logic that can work on HTTPS
level. We can have PHP business logic that works on PHP level though.

validate/check, in general. IMHO, using a WAF is more burdensome and
costly than the input validation that I'm proposing.

I'm not sure where WAF comes in - these are two completely different uses.

Not really.

The more an application validates its inputs, the fewer WAF rules are
required for the web app. There is a close relationship between web apps and WAFs.

A WAF can do more than a web application, like mitigating protocol/web
server/language vulnerabilities. That part differs, but most web app
protection rules can be implemented in the web app. Whitelist parameter
validation is hard in a WAF but easy in a web app, because developers
know exactly what each parameter should be. e.g. an "id" parameter could be
an integer or a string like a userid; WAF admins have to refer to the code to
be precise, while app developers already know.

Web application developers have right to define "valid" inputs. ("have
right" does not mean "can do anything") PHP script termination for
invalid input is just one of terminations. It's nothing special.

You can perfectly well have your app do anything you like - including
termination - on filter failure. I don't see how it necessitates making
new set of filters though?

I think your premise is "Show nice error message for any errors,
proceed as normal case". (Handle invalid/insane data just like mistakes)

My premise is "Shouldn't show nice messages to attacker, terminate as
abnormal case". (Treat them as attack or serious system bug)

It's a design choice. Either way is possible.

Sure, and it's possible with current filters too. If the check fails,
and you want to terminate, what prevents you from doing:

if (filter_var($var, FILTER_VALIDATE_INT) === false) { exit(); }

Am I missing some important point here?

There is no string validation filter while string is the most dangerous input.

It does not allow multiple rules for an array element with filter_*_array().

Current validation filters are designed to filter and convert. e.g.
FILTER_VALIDATE_BOOLEAN converts empty to FALSE.
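A short sketch of the conversion behaviour mentioned above, using only existing filters: the returned values are typed and normalized, and for booleans a legitimate FALSE can only be told apart from a failure with FILTER_NULL_ON_FAILURE.

```php
<?php
// Validate filters also normalize and convert, so the result is not the raw string.
$int = filter_var(' 42 ', FILTER_VALIDATE_INT);   // surrounding whitespace trimmed: int 42
$flt = filter_var('1e3', FILTER_VALIDATE_FLOAT);  // float 1000.0
// For booleans, false is both a valid result ("", "0", "no", "off", "false")
// and the failure value, unless FILTER_NULL_ON_FAILURE is used.
$empty = filter_var('', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);      // false
$junk  = filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE); // null
```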

Do you think it's nicer to have many lines of

// Assuming log_validation_error_and_exit($key_name, $value)
// is implemented properly.
if (filter_var($_POST['int'], FILTER_VALIDATE_INT) === false) {
    log_validation_error_and_exit('int', $_POST['int']);
}
// A valid "no"/"false" also yields false, so detect failure via null.
if (filter_var($_POST['bool'], FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE) === null) {
    log_validation_error_and_exit('bool', $_POST['bool']);
}
if (filter_var($_POST['float'], FILTER_VALIDATE_FLOAT) === false) {
    log_validation_error_and_exit('float', $_POST['float']);
}
if (filter_var($_POST['string'], FILTER_VALIDATE_REGEXP,
        array("options" => array("regexp" => "/^M(.*)/"))) === false) {
    log_validation_error_and_exit('string', $_POST['string']);
}

rather than this?

// Assuming exception handler is set properly.
$post_def = array(
    'int'    => FILTER_VALIDATE_INT,
    'bool'   => FILTER_VALIDATE_BOOLEAN,
    'float'  => FILTER_VALIDATE_FLOAT,
    'string' => array(FILTER_VALIDATE_REGEXP,
                      array("options" => array("regexp" => "/^M(.*)/"))),
);
filter_require_var_array($_POST, $post_def); // proposed by this RFC

Try to implement "strict validation rules" with current filter,
especially with filter_*_array(). They do not work well as I mentioned
in the RFC.

add_validate_functions_to_filter#why_not_compare_filter_var_array_result

Why not compare filter_var_array() result?

The following code may seem to work, but it does not.

$ret = filter_var_array($arr, $validation_spec);
if ($ret != $arr) {
    die('Input does not validate');
}

One should never compare floats for equality. (A float string is converted to
the float type. Think of comparing a huge string value with its
float-converted value.)
They are filter (conversion) functions. e.g. URLs are converted to lowercase.
They allow empty input by default and add NULL elements.
int/float/bool validation filters trim and convert types. (They cannot
match by "==" comparison.)

For these reasons, comparing the original and returned (filtered) values is
not suitable for strict input validation.
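A small sketch of two of these effects with the existing filter_var_array(): the value comes back converted, and a key missing from the input is added as null unless the add_empty argument is false.

```php
<?php
// Comparing filter_var_array() output with the raw input is unreliable:
// values come back converted, and missing keys are added as null by default.
$raw  = ['age' => '42'];
$spec = ['age' => FILTER_VALIDATE_INT, 'name' => FILTER_DEFAULT];

$ret = filter_var_array($raw, $spec);
// $ret is ['age' => 42, 'name' => null]: 'age' became an int, and 'name'
// was added although the client never sent it.
var_dump($ret === $raw); // bool(false)

// Passing false as the third argument (add_empty) stops the null padding,
// but the type conversion remains.
$noPad = filter_var_array($raw, $spec, false);
```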

It's obvious to me that the current filter module needs improvements so
that better validation code can be written.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

The input validation only rejects invalid input.

If you use a plain <input> for "date", then you should accept any valid
UTF-8 without CNTRL chars up to 100 chars or so, not only "YYYYMMDD".
(Assuming UTF-8 is the encoding)

But why? If I just check for YYYYMMDD I automatically get all invalid
UTF-8 etc. rejected, without even thinking about it.
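As a sketch of that point, a strict YYYYMMDD check built from preg_match() and checkdate() (an illustrative choice, not code from either message) implicitly rejects malformed UTF-8, control characters and overlong input, since only eight ASCII digits forming a real calendar date can pass.

```php
<?php
// Strict 'YYYYMMDD' check; the function name is hypothetical.
function is_yyyymmdd(string $s): bool
{
    // \A and \z anchor the whole string (no trailing newline allowed);
    // without the /u modifier, \d matches only ASCII 0-9.
    if (preg_match('/\A(\d{4})(\d{2})(\d{2})\z/', $s, $m) !== 1) {
        return false;
    }
    // checkdate(month, day, year) rejects dates such as 2016-02-30.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}
```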

Yasuo - If there is a bug in the client side process what ever that is
which causes something which YOU think is an invalid input then you
would consider everything is broken? Just where do you draw the line
between invalid input and incorrect input. If the YYYYMMDD has a couple
of duff UTF8 characters appended, you crash out rather than simply
flagging the error? How do you distinguish between an attacker and a
naive user who simply does not know you can't use cut and paste to copy
something over because the OS will also copy all the hidden html along
with it?

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki

Hi Marco,

Besides what was reported above by Dan, my reasoning for voting "no" is that
this API can be implemented in userland, regardless of whether it's trivial or not.

There is no reason good enough for justifying yet another added endpoint
that can even be implemented with simple function composition.
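To illustrate the "simple function composition" argument, a hypothetical userland sketch loosely modelled on the proposed filter_require_var_array(). It uses only the existing filter_var(); its definition format mirrors filter_var_array()'s and is an assumption, not the RFC's.

```php
<?php
// Hypothetical userland sketch: every key in $spec must be present as a
// string and pass its filter; extra keys are rejected. Failures throw.
function require_var_array(array $input, array $spec): array
{
    $extra = array_diff_key($input, $spec);
    if ($extra !== []) {
        throw new UnexpectedValueException('Unexpected input: ' . implode(', ', array_keys($extra)));
    }
    $out = [];
    foreach ($spec as $key => $def) {
        // A definition is a filter ID or ['filter' => ID, 'options' => [...]].
        $filter  = is_array($def) ? $def['filter'] : $def;
        $options = ['options' => is_array($def) ? ($def['options'] ?? []) : [],
                    // null (not false) signals failure, so a valid "no" stays false.
                    'flags'   => FILTER_NULL_ON_FAILURE];
        if (!isset($input[$key]) || !is_string($input[$key])) {
            throw new UnexpectedValueException("Missing or non-string input '$key'");
        }
        $value = filter_var($input[$key], $filter, $options);
        if ($value === null) {
            throw new UnexpectedValueException("Invalid input '$key'");
        }
        $out[$key] = $value;
    }
    return $out;
}
```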

OK. Thank you. You prefer full userland implementation.

In addition to that, the lack of a strongly typed data structure for the
validation DSL makes this proposed functionality very error-prone and
obnoxious to use and maintain for future additional use-case scenarios that
may come up.

This is a good argument.
The filter module already uses definition arrays. The RFC adds a check function for
them, too. The new check function does not take care of semantics; it may be
improved by having a class for defining validation rules.

If we have to add more complex, yet robust, input validation definitions as
arrays, all we have to do is add a version number, a totally different
array structure, or an object for it.

Do you want me to drop the filter_check_definition() proposal?
It's easy to write PHP code that does the same, and document it in the manual.

It's only there because I expected comments like the one you've made:
"There is no check function for rule definition arrays. It's dangerous"
or something like this.

Performance impact in userland implementations can be mitigated via codegen
there (similar to what Nikic's FastRoute lib does): still less complicated than
relying on the core API, maintaining it in C code, and having it locked onto
the installed PHP version.

How far PHP should go in implementing mandatory features for
web applications is debatable.

Some may prefer PHP to be like Python or other languages that
do not have web application support in the core at all.

Having a router in core is too much for me, too. However, I prefer PHP to
have the basic features that are mandatory for writing simple web forms. For
example,

<?php
require_once('my_exception_error_handler.php');
require_once('my_input_spec_def_for_this_file.php');

// Validate general requirements that cannot be covered by filter_require*()
validate_inputs();
// These come from this RFC. Validate inputs.
// What to validate is a design decision, but validating them all is the
// best way.
filter_require_var_array($_GET, $get_spec);
filter_require_var_array($_POST, $post_spec);
filter_require_var_array($_COOKIE, $cookie_spec);
filter_require_var_array($_SERVER, $server_spec);

session_start(['use_csrf_protection'=>1]); // There is an RFC for this.

function check_user_input_error($today) {
    $err_msg = array();
    // Input validation already guarantees 'YYYYMMDD'; compare with today.
    $ts = strtotime($today);
    if ($ts === false || date('Ymd', $ts) !== date('Ymd')) {
        $err_msg[] = 'You have entered invalid date. '. $today;
    }
    return $err_msg;
}

$err_msg = array();
if (!empty($_POST['submit'])) {
    $err_msg = check_user_input_error($_POST['today']);
    if (empty($err_msg)) {
        // Save CSRF protected data into some DB
    }
}

// We do need shorter/simpler/consistent escape functions somehow.
?>

<html><head></head>
<body>
<?php if (!empty($_POST)): ?>
<!-- Display client info and date -->
You're using <?= html($_SERVER['HTTP_USER_AGENT']) ?>
<?php if (!empty($err_msg)): ?>
<?= html(implode(' ', $err_msg)) ?>
<?php else: ?>
Yes, today is <?= html($_POST['today']) ?>
<?php endif; ?>
<?php else: ?>
<form action="<?= html($_SERVER['REQUEST_URI']) ?>" method="post">
Enter today's date: <input type="text" name="today" />
<input type="submit" name="submit" />
</form>
<?php endif; ?>
</body>
</html>
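The template calls an html() escaping helper that PHP does not provide (the code itself notes that simpler escape functions are still needed). A minimal hypothetical version, assuming UTF-8 output, could wrap htmlspecialchars():

```php
<?php
// Hypothetical escape helper for HTML text and attribute values.
function html(string $s): string
{
    // ENT_QUOTES escapes both quote styles (safe inside attribute values);
    // ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning ''.
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}
```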

Simple web forms should be writable using PHP core features
only. IMHO.

It's impossible to teach beginners how to write code for input validations.
As a result, the most important security feature, input validation, is
omitted in beginner courses/examples/etc.

It's great for beginners to understand what's going on in web apps and
what developers should do. It's useful for small web services that
require the best performance possible as well.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

<?php
require_once('my_exception_error_handler.php');
Actually load framework ... and that is the first can of worms?

require_once('my_input_spec_def_for_this_file.php');
THIS is my sticking point ...
What it needs to load is the rules for all of fields that were in the
page build. For simple forms that is fine, but it does not scale. For
more complex form it has to scan the input array and build a list of
keys ... why not carry out checks while doing that scan? Makes building
an array of rules pointless, just use the individual rules ... except we
do not know where to get them.

// Validate general requirements that cannot be covered by filter_require*()
validate_inputs();
// These come from this RFC. Validate inputs.
// What to validate is a design decision, but validating them all is the
// best way.
filter_require_var_array($_GET, $get_spec);
filter_require_var_array($_POST, $post_spec);
filter_require_var_array($_COOKIE, $cookie_spec);
filter_require_var_array($_SERVER, $server_spec);
I have less of a problem with the STATIC stuff covered here EXCEPT that
would be handled as part of loading the framework and include checking
what state of handling the form you ARE at. In my case the session has
elements of handling multiple page forms.

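session_start(['use_csrf_protection'=>1]); // There is an RFC for this.

function check_user_input_error($today) {
    $err_msg = array();
    // Input validation already guarantees 'YYYYMMDD'; compare with today.
    $ts = strtotime($today);
    if ($ts === false || date('Ymd', $ts) !== date('Ymd')) {
        $err_msg[] = 'You have entered invalid date. '. $today;
    }
    return $err_msg;
}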

I will not waste time going through your long discussion on handling
dates; it's the bane of my life! Especially when you get given 12/8/2016
and have no idea if it is August or December! So forms use date pickers
and browser validation. And the "'You have entered invalid date. '.
$today;" forms part of the set of rules used to build the form. Validate
that christening date is later than birth date has a suitable error
message as part of that rule set. Now cross checking those rules at this
stage is probably necessary but only if some hacker is trying to be
clever? But in my case business rules can be cross checked in the
database ... apps other than PHP may also be accessing it ( think
pigging phone apps :( ) ... so the same rules are needed in the database
business logic. And that is why populating the PHP rules data from the
database schema makes sense.

Simple web forms should be writable using PHP core features
only. IMHO.

Just how many versions of 'login' or 'register' pages exist and all have
checks for valid username, password and email. And all use different
styles of managing the rules.

It's impossible to teach beginners how to write code for input validations.
As a result, the most important security feature, input validation, is
omitted in beginner courses/examples/etc.

It's great for beginners to understand what's going on in web apps and
what developers should do. It's useful for small web services that
require the best performance possible as well.

We are on the same book, just coming at this from different ends. It's
that 'my_input_spec_def_for_this_file.php' which is the problem starting
with just where you get the set of rules from and not having a simple
beginner friendly method of adding those rules to a variable. There are
hundreds of legacy examples of how to do a login form, and the vast
majority use styles of programming that start a user down the wrong path
from day one ... so can we agree on something simple and easily
expandable? filter_require_var_array($_POST, $post_spec); could then use
those rules to build the $post_spec array, but why not just use the
rules direct?

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki

Hi Lester,

<?php
require_once('my_exception_error_handler.php');
Actually load framework ... and that is the first can of worms?

require_once('my_input_spec_def_for_this_file.php');
THIS is my sticking point ...
What it needs to load is the rules for all of fields that were in the
page build. For simple forms that is fine, but it does not scale. For
more complex form it has to scan the input array and build a list of
keys ... why not carry out checks while doing that scan? Makes building
an array of rules pointless, just use the individual rules ... except we
do not know where to get them.

Almost every request has a fixed set of inputs that is defined by the
application specification.

Your example is a special case. You can choose whatever method fits
your needs and requirements.

// Validate general requirements that cannot be covered by filter_require*()
validate_inputs();
// These come from this RFC. Validate inputs.
// What to validate is a design decision, but validating them all is the
// best way.
filter_require_var_array($_GET, $get_spec);
filter_require_var_array($_POST, $post_spec);
filter_require_var_array($_COOKIE, $cookie_spec);
filter_require_var_array($_SERVER, $server_spec);
I have less of a problem with the STATIC stuff covered here, EXCEPT that
it would be handled as part of loading the framework and would include checking
what state of handling the form you ARE at. In my case the session holds
elements for handling multi-page forms.
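The filter_require_var_array() calls quoted above are what the RFC proposes; they are not part of current PHP. As a rough idea of the intended behaviour, here is a minimal userland sketch built on the existing filter_var(). It assumes each spec entry has the same shape filter_var_array() accepts ('filter', optional 'options' and 'flags'), and that any missing, unexpected or invalid input should abort the request. The function name require_var_array() is hypothetical.

```php
<?php
// Hypothetical userland stand-in for the RFC's proposed filter_require_var_array().
// Assumption: each $spec entry is an array with 'filter' and optional 'options'/'flags',
// the same shape filter_var_array() accepts.
function require_var_array(array $input, array $spec): array
{
    // Allow-list: any key the spec does not define is rejected outright.
    $unexpected = array_diff_key($input, $spec);
    if ($unexpected !== []) {
        throw new UnexpectedValueException('Unexpected input: ' . implode(', ', array_keys($unexpected)));
    }

    $clean = [];
    foreach ($spec as $key => $def) {
        if (!array_key_exists($key, $input)) {
            throw new UnexpectedValueException("Missing input: $key");
        }
        $value = filter_var($input[$key], $def['filter'] ?? FILTER_DEFAULT, [
            'options' => $def['options'] ?? [],
            // NULL_ON_FAILURE lets a valid boolean false be told apart from a failure.
            'flags'   => ($def['flags'] ?? 0) | FILTER_NULL_ON_FAILURE,
        ]);
        if ($value === null) {
            throw new UnexpectedValueException("Invalid input: $key");
        }
        $clean[$key] = $value;
    }
    return $clean;
}
```

Throwing on the first problem matches the "stop processing on invalid input" model discussed in this thread; code after the call only ever sees validated values.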

session_start(['use_csrf_protection'=>1]); // There is an RFC for this.

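function check_user_input_error($today) {
    $err_msg = [];
    // The contract is "$today is today's date in 'YYYYMMDD' format", so compare
    // with date('Ymd') directly (strtotime() returns a timestamp, not 'YYYYMMDD').
    if ($today !== date('Ymd')) {
        $err_msg[] = 'You have entered invalid date. ' . $today;
    }
    return $err_msg;
}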
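The check above only compares against today's date. For the more general part of the contract ("$today must have 'YYYYMMDD' format"), a strict format check could look like the sketch below. It uses the standard DateTimeImmutable::createFromFormat() and rejects impossible dates via a round trip; the function name is hypothetical.

```php
<?php
// Minimal sketch: strict validation of the 'YYYYMMDD' contract.
function is_valid_ymd(string $s): bool
{
    // Exactly eight digits, nothing else (no trailing newline, no separators).
    if (preg_match('/\A\d{8}\z/', $s) !== 1) {
        return false;
    }
    // '!' resets fields not in the format. createFromFormat() rolls over impossible
    // dates (e.g. Feb 31 becomes Mar 2), so the round-trip comparison rejects them.
    $d = DateTimeImmutable::createFromFormat('!Ymd', $s);
    return $d !== false && $d->format('Ymd') === $s;
}
```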

I will not waste time going through your long discussion on handling
dates; it's the bane of my life! Especially when you are given 12/8/2016
and have no idea whether it is August or December! So forms use date pickers

I'm lost. The contract between client and server is "$today must have
'YYYYMMDD' format".

and browser validation. And the "'You have entered invalid date. '.
$today;" forms part of the set of rules used to build the form. Validating
that a christening date is later than the birth date has a suitable error
message as part of that rule set. Now, cross-checking those rules at this
stage is probably necessary, but only if some hacker is trying to be
clever? But in my case business rules can be cross-checked in the

The design I'm presenting does not handle errors that have to display
error messages to users.

database ... apps other than PHP may also be accessing it ( think
pigging phone apps :( ) ... so the same rules are needed in the database
business logic. And that is why populating the PHP rules data from the
database schema makes sense.

It's the same input validation method as Rails. It works most of the time,
but not always.

Mass Assignment is one example.
(It's the same vulnerability as register_globals=On. The
difference is whether the attack target is a program variable or a database field.
There is a mitigation called strong parameters that prevents unwanted
database field updates.)
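The strong-parameters mitigation mentioned above boils down to an allow-list applied before assignment. A minimal PHP sketch of the same idea follows; permit() is a hypothetical helper, and like Rails' permit it silently drops unlisted keys, whereas the RFC's approach would reject the request instead.

```php
<?php
// Sketch of the "strong parameters" idea: only allow-listed fields
// from the request may reach the record update.
function permit(array $input, array $allowed): array
{
    return array_intersect_key($input, array_flip($allowed));
}

// An attacker adds 'is_admin' to the POST body; it is dropped before assignment.
$post   = ['name' => 'alice', 'email' => 'a@example.com', 'is_admin' => '1'];
$fields = permit($post, ['name', 'email']);
```

Dropping keeps the form working when harmless extra fields appear. Rejecting treats any unexpected field as an attack, which is stricter but also surfaces client bugs early.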

[CVE-2016-2098] Possible remote code execution vulnerability in Action Pack
https://groups.google.com/forum/#!topic/rubyonrails-security/ly-IH-fxr_Q
(It's an "ID" validation oops! "ID" was used without validation, which resulted
in arbitrary code execution.)

These fatal vulnerabilities could have been prevented if Rails had adopted the input
validation method described in the RFC.

Validation in the model has issues:

- The security model is based on RDB table definitions. Input data that
  is not stored in the RDB tends to be used without validation.
- Validation is performed when data is stored into the RDB, which
  could be too late to be useful.
- Since there is no validation in the controller, apps tend to allow
  redirection to other systems without parameter validation (SSRF).
- Since the security model is strongly tied to the RDB, validation rules tend
  to be weak, e.g. defining only "require" (= NOT NULL) and not caring about
  the contents.
- Models without an RDB tend to be weaker or have issues, because validation
  rules for RDB tables may not be suitable for the task implemented by
  the model.

Even though it has issues, validation in the model works. However, it
does not work well if you try to validate "all inputs" from outside. For
instance, validating browser request headers and/or environment
variables in ActiveRecord does not make sense.
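Request headers are an example of input that never reaches an RDB model, so they have to be validated at the entry point. The sketch below checks a User-Agent header with the existing FILTER_VALIDATE_REGEXP. The rule itself (1 to 512 printable ASCII characters) and the function name are illustrative assumptions, not a recommendation from the RFC.

```php
<?php
// Sketch: validate a request header where there is no RDB-backed model.
// Assumed rule for illustration: 1..512 printable ASCII characters.
function valid_user_agent(array $server): ?string
{
    $ua = filter_var($server['HTTP_USER_AGENT'] ?? '', FILTER_VALIDATE_REGEXP, [
        'options' => ['regexp' => '/\A[\x20-\x7E]{1,512}\z/'],
        'flags'   => FILTER_NULL_ON_FAILURE,
    ]);
    // null means the header was missing or did not match the rule.
    return is_string($ua) ? $ua : null;
}

// In a real request this would be called as valid_user_agent($_SERVER).
```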

The Rails security model is optimized not for security but for ease of use
(DRY: Don't Repeat Yourself). It works well for many apps, so it's
OK to choose this security model. I'm presenting a security-optimized
input validation design. Security trades off against cost and ease
of use, so it is no wonder that adopting the more secure way feels
like more of a burden.

Simple web forms should be writable with PHP core features
only, IMHO.

Just how many versions of 'login' or 'register' pages exist, all with
checks for a valid username, password and email? And all use different
styles of managing the rules.

It's impossible to teach beginners how to write code for input validation.
As a result, the most important security feature, input validation, is
omitted from beginner courses/examples/etc.

It's great for beginners to understand what's going on in web apps and
what developers should do. It's also useful for small web services that
require the best possible performance.

We are on the same book, just coming at this from different ends. It's
that 'my_input_spec_def_for_this_file.php' which is the problem starting
with just where you get the set of rules from and not having a simple
beginner friendly method of adding those rules to a variable. There are
hundreds of legacy examples of how to do a login form, and the vast
majority use styles of programming that start a user down the wrong path
from day one ... so can we agree on something simple and easily
expandable? filter_require_var_array($_POST, $post_spec); could then use
those rules to build the $post_spec array, but why not just use the
rules direct?

"Direct" means like this?

$mybool = filter_require_var($_POST['mybool'], FILTER_VALIDATE_BOOLEAN);

It's possible to write code with filter_var()/filter_require_var(), but it
requires many lines that look mostly the same.
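For comparison, both styles discussed here can already be written with the existing filter functions. The sketch below sets one near-identical filter_var() call per field (the "direct" style) against the same rules declared once as data for filter_var_array(). The field names and ranges are illustrative assumptions.

```php
<?php
$post = ['mybool' => 'yes', 'age' => '30'];

// "Direct" style: one filter_var() call per field; repetitive as forms grow.
$mybool = filter_var($post['mybool'], FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
$age    = filter_var($post['age'], FILTER_VALIDATE_INT,
                     ['options' => ['min_range' => 0, 'max_range' => 150]]);

// Spec style: the same rules declared once as data and applied in one call.
$spec = [
    'mybool' => ['filter' => FILTER_VALIDATE_BOOLEAN, 'flags' => FILTER_NULL_ON_FAILURE],
    'age'    => ['filter' => FILTER_VALIDATE_INT,
                 'options' => ['min_range' => 0, 'max_range' => 150]],
];
$clean = filter_var_array($post, $spec);
```

The spec array is also what a rules store (database schema, form definition, etc.) could generate, which is the "where do the rules come from" question raised in this thread.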

Anyway, it seems you have a strong opinion about your security model. It's
okay to adopt a design that fits your needs and requirements. For example,
username & password authentication is weaker than two-factor
authentication, but it's still acceptable. DbC (Design by Contract) is an excellent way to
build secure and fast apps, but adopting it is the developers' choice.

Nobody forces you to use the "better way". In fact, there are many "better ways".
I wouldn't force you. You had better not force your "better way" on others either,
because it may not be better for them. If requirements/objectives differ, the
"better way" changes.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

We are on the same book, just coming at this from different ends. It's
that 'my_input_spec_def_for_this_file.php' which is the problem starting
with just where you get the set of rules from and not having a simple
beginner friendly method of adding those rules to a variable. There are
hundreds of legacy examples of how to do a login form, and the vast
majority use styles of programming that start a user down the wrong path
from day one ... so can we agree on something simple and easily
expandable? filter_require_var_array($_POST, $post_spec); could then use
those rules to build the $post_spec array, but why not just use the
rules direct?

"Direct" means like this?

$mybool = filter_require_var($_POST['mybool'], FILTER_VALIDATE_BOOLEAN);

It's possible to write code with filter_var()/filter_require_var(), but it
requires many lines that look mostly the same.

Anyway, it seems you have a strong opinion about your security model. It's
okay to adopt a design that fits your needs and requirements. For example,
username & password authentication is weaker than two-factor
authentication, but it's still acceptable. DbC (Design by Contract) is an excellent way to
build secure and fast apps, but adopting it is the developers' choice.

Actually I have no objection to your array checker. What I am trying to
point out is that the array your filter works from is much
better populated from a 'by variable' store of the full set of rules
needed to complete ALL validation checks.

MY understanding is that the $_REQUEST array is built from the content
of the post message? This involves building a set of variables to work
with? I am looking at a dump of a current $_REQUEST, and rather than
extracting the keys for the array, processing them to build your filter rule
array, and applying that against the $_REQUEST array, I would prefer
that the sort of filtering you are talking about happened prior to
creating the variables IN the array ... and one can potentially attach the
full validation rules to the variables in the process.

What we differ on is just where unusable data can be ignored. If
you want to kill processing of the $_REQUEST array as soon as you detect
corruption in the data, why not do that as soon as the first variable CREATED is
corrupt? No need to load all the rest of the post/get data? On the other
hand, one can create all variables and only populate the ones that do
have valid data. This is perhaps where the exception model is simply an
alternative to an error handling model. In my model I can stop processing
the $_REQUEST array if I establish there is an existing record, so there
is no need to process any of the rest of the post data. Or at least I
could if the basic variables had the ability to carry out validation
while BUILDING the $_REQUEST array. In exception mode the first
exception kills the process. In error mode we simply decide how to
handle the problem, which may then involve checking what attack mechanism
is being used ... or simply saying invalid data and reloading the form. If
it was just a problem with one field then that field can be ignored
while the rest are mirrored back to the form.
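The two handling models described above can be sketched in userland over an already-built input array. Validating while the engine builds $_REQUEST itself is not something PHP code can do, so this only approximates the idea. The per-field rule format (a filter id plus a user-facing message) and all function names are illustrative assumptions.

```php
<?php
// Hypothetical per-field rule: a filter_var() filter id plus the message shown to the user.
function check_field(array $input, string $field, array $rule)
{
    $flags = ($rule['flags'] ?? 0) | FILTER_NULL_ON_FAILURE;
    return filter_var($input[$field] ?? '', $rule['filter'], $flags); // null means invalid
}

// Exception mode: the first invalid field aborts processing.
function validate_or_throw(array $input, array $rules): array
{
    $clean = [];
    foreach ($rules as $field => $rule) {
        $value = check_field($input, $field, $rule);
        if ($value === null) {
            throw new InvalidArgumentException($rule['message']);
        }
        $clean[$field] = $value;
    }
    return $clean;
}

// Error mode: keep valid fields to mirror back to the form, collect messages for the rest.
function validate_collect(array $input, array $rules): array
{
    $clean = [];
    $errors = [];
    foreach ($rules as $field => $rule) {
        $value = check_field($input, $field, $rule);
        if ($value === null) {
            $errors[$field] = $rule['message'];
        } else {
            $clean[$field] = $value;
        }
    }
    return [$clean, $errors];
}

$rules = [
    'email' => ['filter' => FILTER_VALIDATE_EMAIL, 'message' => 'Please enter a valid email address.'],
    'age'   => ['filter' => FILTER_VALIDATE_INT,   'message' => 'Age must be a whole number.'],
];
[$clean, $errors] = validate_collect(['email' => 'a@example.com', 'age' => 'ten'], $rules);
```

Exception mode suits inputs that only an attacker or a broken client would get wrong. Error mode suits user-typed fields, where re-displaying the form with the valid values kept is friendlier.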

--
Lester Caine - G8HFL

8 years ago by Pierre Joye

Hi Dan,

I understand the RFC process.

On Wed, Aug 17, 2016 at 12:23 PM, Dan Ackroyd <danack@basereality.com> wrote:

Additionally, you seem to completely have ignored this:

Dan Ackroyd wrote:

And I strongly object to the idea of stopping and starting voting on
RFCs. Please leave the vote open and if it fails take some time to think
about the feedback.

It would benefit everyone if you stopped responding immediately and
instead took time to actually think about what people have been
saying. This RFC isn't going to be in PHP 7.1, so it is fine to wait 3
months to present a new version of the RFC.

It seems I marked it as "already read" by mistake.
Thank you for the reminder.
I understand that you prefer a userland implementation.

I'm planning to propose "Filter module deprecation" if this RFC
is declined, because the current validation filter is not good enough to
do the job and makes the situation worse rather than better... If the deprecation
RFC is declined as well, then I might try to improve this RFC again.

I can already say I will vote no on that one.

I use filter a lot and it fits my needs quite well. I would prefer a better
API, like method->get... But what we have already allows me to do quick & easy
filtering.

8 years ago by Yasuo Ohgaki

Hi Pierre,

I'm planning to propose "Filter module deprecation" if this RFC
is declined, because the current validation filter is not good enough to
do the job and makes the situation worse rather than better... If the deprecation
RFC is declined as well, then I might try to improve this RFC again.

I can already say I will vote no on that one.

I use filter a lot and it fits my needs quite well. I would prefer a better
API, like method->get... But what we have already allows me to do quick & easy
filtering.

Thank you for the comment.
The current filter is good enough if the app design allows converting some
inputs into safe inputs. I agree. I would like to keep the filter module, so
I'm proposing strict validator features.
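The distinction drawn here, between converting input into safe input and strictly validating it, shows up directly in the existing filter constants. A sanitizing filter quietly repairs the value, while a validating filter rejects it. The input value is just an example.

```php
<?php
$raw = '12abc';

// Sanitizing: strips everything except digits, '+' and '-', yielding the string '12'.
$sanitized = filter_var($raw, FILTER_SANITIZE_NUMBER_INT);

// Strict validation: the whole value must be an integer, so this fails (null).
$validated = filter_var($raw, FILTER_VALIDATE_INT, FILTER_NULL_ON_FAILURE);
```

The sanitized result looks harmless, but it hides the fact that the client sent something outside the contract. That is the information strict validation is meant to preserve.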

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

[RFC][VOTE] Add validation functions to filter module

Here, again, I am not sure I understand the difference.

Am I missing some important point here?

https://wiki.php.net/rfc/add_validate_functions_to_filter#why_not_compare_filter_var_array_result

For these reasons, comparing the original value with the returned (filtered) value is not suitable for strict input validation.
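One concrete illustration of the problem (not necessarily one of the RFC's listed reasons): validating filters can normalize input they accept. FILTER_VALIDATE_INT trims surrounding whitespace and returns an int, so neither strict nor loose comparison with the original value tells you whether the client sent exactly what the contract requires.

```php
<?php
$raw = ' 42 ';
$filtered = filter_var($raw, FILTER_VALIDATE_INT); // int(42): surrounding whitespace is trimmed

$strictSame = ($filtered === $raw); // false: int vs. string, even though the input was accepted
$looseSame  = ($filtered == $raw);  // true: the whitespace the filter removed goes unnoticed
```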
