Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:95284 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 71858 invoked from network); 18 Aug 2016 06:10:19 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 18 Aug 2016 06:10:19 -0000 Authentication-Results: pb1.pair.com header.from=yohgaki@ohgaki.net; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=yohgaki@ohgaki.net; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain ohgaki.net designates 180.42.98.130 as permitted sender) X-PHP-List-Original-Sender: yohgaki@ohgaki.net X-Host-Fingerprint: 180.42.98.130 ns1.es-i.jp Received: from [180.42.98.130] ([180.42.98.130:37317] helo=es-i.jp) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 84/D6-23968-74155B75 for ; Thu, 18 Aug 2016 02:10:18 -0400 Received: (qmail 97470 invoked by uid 89); 18 Aug 2016 06:10:12 -0000 Received: from unknown (HELO mail-qt0-f178.google.com) (yohgaki@ohgaki.net@209.85.216.178) by 0 with ESMTPA; 18 Aug 2016 06:10:12 -0000 Received: by mail-qt0-f178.google.com with SMTP id u25so3938465qtb.1 for ; Wed, 17 Aug 2016 23:10:11 -0700 (PDT) X-Gm-Message-State: AEkoouvqLYOgxENJu1x6fuvTmSUd2CklUhC7uoxu73Mnna/jLENgNEcPUoUJjkMtPZTMbVSh7+6nFuxdDwxAmQ== X-Received: by 10.237.53.206 with SMTP id d14mr528341qte.83.1471500603915; Wed, 17 Aug 2016 23:10:03 -0700 (PDT) MIME-Version: 1.0 Received: by 10.140.85.242 with HTTP; Wed, 17 Aug 2016 23:09:22 -0700 (PDT) In-Reply-To: <40279244-a1ba-2680-8a14-89708bcd1852@gmail.com> References:

<7795ca21-bd70-fe65-9519-af95fdfee33f@gmail.com>

<40279244-a1ba-2680-8a14-89708bcd1852@gmail.com> Date: Thu, 18 Aug 2016 15:09:22 +0900 X-Gmail-Original-Message-ID: Message-ID: To: Stanislav Malyshev Cc: Marco Pivetta , Dan Ackroyd , PHP Internals List Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [PHP-DEV] Re: [RFC][VOTE] Add validation functions to filter module From: yohgaki@ohgaki.net (Yasuo Ohgaki) Hi Stas, On Thu, Aug 18, 2016 at 10:34 AM, Stanislav Malyshev wrote: >> I think you wrote your JavaScript code to impose certain format for "dat= e", >> "phone", "zip", etc. It's not my JavaScript code, but your JavaScript co= de >> that defines output of browser to your PHP web apps. > > There are a lot of use cases that don't have Javascript (or browser, for > that matter) frontends. Not that it should matter, as browser is not > under your code's control, it's under user's control, and you never know > on the PHP side if such thing as browser exists at all. Even when there is no JavaScript nor HTML5 forms, input validations can be done. It's matter of definition of "valid inputs" for . If page encoding is UTF-8, web browsers must return response by UTF-8 encoding. (Unless other encoding is explicitly specified. Or some very very old browsers that can return only SJIS, etc) 1MB input for is possible. However, apps should accept 1MB data from should be rare if not none. 1KB would be far too large for almost all apps. 100 bytes may be too large and acceptable to stop normal program execution. > >> Accept means that allow program to process input data. (continue executi= on) > > Then we disagree here. I strongly object to the notion that the program > should stop execution at any unexpected data. This is only marginally > better than crashing, and is not very helpful behavior. Modern > application is expected to handle data in a more intelligent manner. We recently added number of php_error_docref(E_ERROR, "Cannot process too large data"); in PHP core to avoid possible memory destruction attacks. Why not for apps written by PHP? Broken char encoding shouldn't came from legitimate users. Text contains CNTRL chars from shouldn't come from legitimate users. 1MB data from shouldn't come from legitimate users. Numeric database record ID that is set by app shouldn't contain anything other than digits. And so on. If this kind of data is sent to apps, something like this message may be displayed "Invalid inputs are detected. This incident is reported to administrator to investigate the cause." and finish the page. > >> If your JavaScript date picker uses "YYYYMMDD" format (date like >> 20160817) for a date, anything other than "YYYYMMDD" format is >> attacker tampered inputs. > > You keep returning to Javascript. What I am asking you to consider is > that we're not talking about Javascript, we are talking about PHP, and > PHP has no relation to Javascript date picker. Some apps use Javascript > date pickers, true, but there is a whole world of applications that do > nothing of the sort. Javascript does not add much to the picture here. I agree that there is data that cannot be validated by input validator. File input is the one. However, there are many inputs that can be validated and can terminate normal execution like previously mentioned. > >> It may be considered "valid input" means expected inputs from _legitimat= e_ >> users. Anything other than "valid input" should not be accepted because >> they come from non legitimate users. i.e. attackers. >> >> - Broken encoding >> - CNTRL chars >> - Bad format ( YYYYMMDD is the format for this case ) >> - Too long or short ( Exactly 8 chars is the length for this case ) >> - and so on >> >> are examples of invalid inputs. > > I think this has a smell of blacklisting, which is always almost wrong > approach to dealing with data filtering/validation. Blacklists almost > always lose to whitelists. If you have a whitelist filter that validates I completely agree!! Programmer must think of whitelist, not blacklist. That's the reason why I used bold for whitelist way in https://wiki.php.net/rfc/add_validate_functions_to_filter#secure_coding_bas= ics Broken char encoding (Accept only valid encoding) NUL, etc control chars in string. (Accept only chars allowed) Too long or too short string. e.g. JS validated values and values set by server programs like /etc, 100 chars for username, 1000 chars for password, empty ID for a database record, etc. (Accept only strings within range) > the data, you don't need to worry about chars, lengths and such > separately. However, there's nothing here that requires the whitelist > filter would bring down the app on failure. It should tell the business > logic "this string you gave me is not a valid date" and business logic > should deal with it. There's nothing special here in encodings, control > chars, etc. that I can see that needs any special handling. > >> It may increase your work, but you'll get less risks in return. > > I don't see how. I can write intelligent code that produces helpful > messages and be the same - in fact, more, see above about whitelisting - > robust than blackbox code that explodes on bad inputs. I agree that blacklist is erroneous. It can be broken easily. My RFC description may not be good enough to emphasis on this, but I totally agree that blacklist should be avoided whenever it is possible. The difference between us is: How to deal with bad inputs. - You seem you would like to treat as normal input. - I would reject bad inputs that shouldn't came from legitimate users. You can do your way, since I made exception could be optional. Missing part is getting useful error info when something wrong in inputs. I can add function that retrieves error info. (I've removed it in favor of exception object. So it can be added again easily) > >> The input validation only reject invalid input. >> >> If you use plain for "date", then you should consider any valid >> UTF-8 without CNTRL chars up to 100 char or so, not "YYYYMMDD". >> (Assuming UTF-8 is the encoding) > > But why? If I just check for YYYYMMDD I automatically get all invalid > UTF-8 etc. rejected, without even thinking about it. When plain is used, users may type in any valid UTF-8 char by mista= ke. For example, this wouldn't happen for date field, but autocomplete may fill my name "=E5=A4=A7=E5=9E=A3=E9=9D=96=E7=94=B7" to name field that supp= osed to contain alphabets only. > >> Software design is upto developers. There are many softwares that do >> not follow best practices. Nobody enforce to use the validator as I >> explains. It's okay to me this is used by only users who need. As I >> mentioned, ISO 27000/ISMS requires this kind of validations, not few >> users may need this. > > I'm not sure what ISO says there, but I'm pretty sure ISO does not > specify which exactly code you should use to validate your inputs. The > objections are not about validating inputs as a concept, nobody would > object to that, it's to specific model of doing this that you propose - > namely, doing two separate validation passes, doing blacklisting and > bailing out on validation failure. At least I would not do input > validation in this manner. No it does not. It explains how/what to check input. If developers try to validate "all inputs", validation in MVC model is not efficient nor reasonable. It does not make sense to validate browser request headers in db model, for example. Ideally, input validation is better to be done as fast as possible to maximize the mitigation effect. Defense in depth (multiple layers of defense) is common technique in security. Implementation is developers choice. Ideal way does not have to be implemented. I'll revise patch so that it would be more useful for user input error checks. Then, new features will be able to use for wider purposes. I need to return useful error messages. Return value is already used. Options are - Add reference parameter that will hold error info. - Add function that retrieves the last error info. If anyone have better idea, it will be appreciated. Regards, -- Yasuo Ohgaki yohgaki@ohgaki.net