Newsgroups: php.internals
Date: Wed, 17 Aug 2016 08:09:12 +0900
Subject: Re: [PHP-DEV] Re: [RFC][VOTE] Add validation functions to filter module
To: Stanislav Malyshev
Cc: Marco Pivetta, Dan Ackroyd, PHP Internals List
In-Reply-To: <7795ca21-bd70-fe65-9519-af95fdfee33f@gmail.com>
From: 
yohgaki@ohgaki.net (Yasuo Ohgaki)

Hi Stas,

On Mon, Aug 15, 2016 at 2:17 PM, Stanislav Malyshev wrote:
>> It seems there is misunderstanding.
>> These new functions are intended for "secure coding input validation" that
>> should never fail. It means something unexpected in input data that
>> cannot/shouldn't keep program running. Why do you need to parse
>> message?
>
> I think the problem here is as follows: assume you accept use input. You
> want it to conform to some set of rules. If it does not, you may want to
> inform the user that the input is wrong, in an informative way. Now, if
> you say these functions "should never fail", it implies that before
> them, there would be other functions filtering user input (because user
> input could always violate whatever rules you'd have) - and then the
> question is, would you really want *two* sets of validators? You'd
> probably want one.
> Now, when you have one, you probably want it to validate the data and
> return some information that would be useful for informing the user what
> has gone wrong. That seems to be the issue here.
> I do think having strong input validation is a good thing. However, we'd
> also need to have them in a way that would make them useful in above
> scenario - otherwise people would avoid them because they fail "too
> hard" and the app does not retain enough control over the outcome.

I think this discussion relates to the following questions. I'll try to
explain below.

>> There is misunderstanding on this.
>> As I wrote explicitly in the RFC, input validation and user input
>> mistakes must be handled differently.
>>
>> "The input validation (or think it as assertion or requirement) error"
>> that this RFC is dealing, is should never happen conditions (or think
>> it as contract should never fail).
>
> This is what I'm not sure I understand - when this approach would be
> used? I.e.
> if I get data from the user, I surely can not claim I can
> impose any conditions on the data that would never fail. Is it assumed
> I'd pre-filter the data before passing it to this filter?

What rules can be imposed on inputs, and how, depends on what kind of
data is supposed to be sent from outside the software, including by
human users.

Let's say your app validates the user-written/chosen "Date" on the client
side with JavaScript. Then the browser must send whatever "Date" format
you impose on the client. It may be "YYYYMMDD", for example. The
programmer should then not accept any "Date" format other than
"YYYYMMDD", because any other format is invalid. Accepting a format
other than "YYYYMMDD" does only harm and increases the risk of the
program malfunctioning, i.e. all kinds of injections such as JavaScript,
SQL, null char, newline, etc.

The basic idea of secure coding input validation is to remove all
unnecessary security risks at "Input validation". Even when the "Date"
field is a plain text input in which the user can write any chars, null
chars, CR/LF, TAB or any other CNTRL chars should not be in there. No
user types 100 chars into a "Date" field unless they are trying to
tamper with the application.

"Input validation" should reject all of these and does not have to
inform users (attackers) that "there is invalid input". If you need to
tell legitimate users "There is invalid input", that should be handled
by "Business logic", not by "Input validation".

>> The point of having the input validation is accept only inputs that
>> program expects and can work correctly. Accepting unexpected
>> data that program cannot work correctly is pointless.
>
> Well, that depends on what you mean by "accepting". The program should
> exhibit sane behavior (i.e., useful error message, not whitescreen or
> something like that) on bad input. That behavior can be different -
> i.e., if you are given wrong password, you shouldn't be too helpful and
> say "this password is wrong, the right password is this: ...."
> (you'd laugh but there *was* a real application doing this, no, I have no idea
> what the developers were thinking :) but at least you could say
> "authentication details are wrong".

User authentication can treat "User name" and "Password" much like the
"Date" field. "User name" and "Password" shouldn't contain CNTRL chars
or invalid char encoding. Even when the fields are plain text inputs,
there shouldn't be 500-char-long inputs for them.

Anything else for "User name" and "Password" should be handled by
"Business logic". The logic part should display nice and proper error
messages like

- User name is too long; the limit is 100 chars.
- Password is too long; the limit is 100 chars.
- User name and/or Password is wrong and failed to authenticate.

>> Don't misunderstood me. I'm not saying "You should reject user input
>> mistakes".
>> "User input mistakes" and "input validation error" is totally different
>> error.
>
> Here, again, I am not sure I understand the difference.

The reason I propose dividing input error checks into "Input
validation" and "Business logic" is simplicity and maintainability.

"Input validation" should be done not only for human-entered inputs but
also for inputs generated automatically by the system. Generally
speaking, developers should not accept requests like the following.

Invalid browser headers:
- Invalid REFERER: contains illegal/CNTRL chars and/or too many chars.
- Invalid ACCEPT-CHARSET: contains illegal/CNTRL chars and/or too many chars.
- Invalid ACCEPT-ENCODING: contains illegal/CNTRL chars and/or too many chars.
- Invalid ACCEPT-LANGUAGE: contains illegal/CNTRL chars and/or too many chars.
- and so on.

Invalid POST/GET request:
- Lacks a field required by your program. e.g. you always set a CSRF
  token for POST, but it's missing.
- Multi-page form inputs that lack data, or have invalid data, that
  should have been validated previously. Note: where/how to deal with
  such invalid inputs is a design choice.
- Program written data is invalid. e.g.
  //php.net/show_bug.php?id=[string contains CNTRL chars and/or 100 chars or more]
- $_POST/$_GET has more than 20 elements. Note: most apps/code would not
  have this many elements.

Invalid COOKIE:
- $_COOKIE has more than 20 elements. Note: normal apps would not have
  this many cookies.
- Lacks a field required by your program.
- Invalid chars. e.g. CNTRL chars.

All of these have a history of abuse by attackers, and programs should
not accept them.

Please note that secure coding also requires secure output. Input
validation and output sanitization should be treated as separate tasks,
e.g. escape all variables in the "Output" code when you output something
to other software. Never assume "This var is validated at input, so it
is safe without escaping."

How to validate inputs is the developer's choice, e.g. they may not use
the "CONNECTION" HTTP header at all and not care about it, but all the
secure coding guides that I know of recommend/require validating "all
inputs".

Validating all inputs that are irrelevant to "Business logic" makes
programs complicated and hard to maintain. Broken char encoding, too
long/short values, and CNTRL chars for
inputs are better handled by "Input validation", because otherwise the
same checks might be done in different places repeatedly.

There are many possibilities for software design. This RFC is designed
to encourage certain validation. However, it does not force developers
to do any particular validation; it provides the tools needed for
validation.

I would not encourage users to disable the exception from
filter_require_var()/filter_require_var_array(), but as a last-minute
change I've made raising the exception optional. This allows developers
to use the new validator for wider purposes.

Regards,

P.S. I'll extend the vote period because there is ongoing discussion.

BTW, ISO 27000/ISMS requires/recommends the proposed input validation.
The latest ISO 27000 mentions it as "adopt secure programming". Older
ISO 27000 explained how to validate inputs. The new ISO 27000 removed
the detailed explanation of input validation methods because secure
programming is widely adopted and standardized.

--
Yasuo Ohgaki
yohgaki@ohgaki.net
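
The split discussed in this message, between "Input validation" that
rejects impossible input outright and "Business logic" that reports
mistakes to the user, could be sketched in PHP roughly as follows. This
is only a minimal sketch under assumptions: the function names, the
100-byte limit and the exception type are hypothetical choices made for
illustration, and it deliberately does not call the RFC's
filter_require_var(), whose signature is not given here.

```php
<?php
// Stage 1: "Input validation". Rejects input that no legitimate client
// could have sent. It throws instead of building a message for the user.
// NOTE: names and limits are hypothetical, chosen only for illustration.
function require_plain_text(string $value, int $maxBytes = 100): string
{
    if (strlen($value) > $maxBytes) {
        throw new UnexpectedValueException('input too long');
    }
    // With the /u modifier, preg_match() returns false for broken UTF-8,
    // 1 if a control char (NUL, TAB, CR, LF, DEL, ...) is found, else 0.
    if (preg_match('/[\x00-\x1F\x7F]/u', $value) !== 0) {
        throw new UnexpectedValueException('broken encoding or control char');
    }
    return $value;
}

// Stage 1 for the "Date" example: the client is assumed to always send
// exactly eight digits (YYYYMMDD); anything else is treated as tampering.
function require_date_field(string $value): string
{
    // \A and \z anchor the whole string, so a trailing newline is rejected.
    if (preg_match('/\A[0-9]{8}\z/', $value) !== 1) {
        throw new UnexpectedValueException('malformed date field');
    }
    return $value;
}

// Stage 2: "Business logic". A well-formed value can still be a user
// mistake (e.g. February 31st); this returns a message instead of throwing.
function check_date_for_user(string $yyyymmdd): ?string
{
    $year  = (int) substr($yyyymmdd, 0, 4);
    $month = (int) substr($yyyymmdd, 4, 2);
    $day   = (int) substr($yyyymmdd, 6, 2);

    return checkdate($month, $day, $year)
        ? null
        : 'Please enter a valid calendar date.';
}
```

In this sketch a request carrying "2016-08-17" or a NUL byte in the date
field ends in an exception at stage 1, while "20160231" passes stage 1
and produces a readable message at stage 2.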