Newsgroups: php.internals
Date: Sat, 20 Aug 2016 16:16:04 +0900
To: Stanislav Malyshev
Cc: Marco Pivetta, Dan Ackroyd, PHP Internals List
Subject: Re: [PHP-DEV] Re: [RFC][VOTE] Add
 validation functions to filter module
From: yohgaki@ohgaki.net (Yasuo Ohgaki)

Hi Stas,

On Thu, Aug 18, 2016 at 3:54 PM, Stanislav Malyshev wrote:
>> Even when there is no JavaScript nor HTML5 forms, input validations
>> can be done. It's matter of definition of "valid inputs" for <input
>> type="text" name="var" />. If page encoding is UTF-8, web browsers
>> must return response by UTF-8 encoding. (Unless other encoding is
>
> I think you're still missing my point. The point is that it is
> absolutely irrelevant what browser might or might not do, since PHP does
> not have any means to know if browsers even exist. PHP doesn't talk to
> browser, it talks to HTTP channel (provided we're in webserver
> scenario), what's on the other end is unknown and irrelevant. So there's
> no point discussing browsers.

It's possible to design web pages/services for "unknown clients", but
those are exceptional cases. Exceptions do not negate best practices.
If a case needs exceptional handling, that handling should apply to
that case only, not in general.

Almost all systems have intended clients. If the protocol is
HTTP/HTTPS, developers may reject strange data that cannot be right for
HTTP/HTTPS. Even layers at a higher level than PHP do this, e.g. HTTP
servers reject malformed and/or prohibited requests and terminate
execution. A Web Application Firewall (WAF) does fancier things and
terminates the connection. (It does not even allow the request to reach
the web server.) If web apps check their own requirements and terminate
requests that do not fulfill them, it wouldn't matter at all.

Those who like WAFs may use one to check more of a web app's inputs.
In general, WAF filters are designed to check inputs against attack
signatures, which web apps do not validate/check themselves. IMHO,
using a WAF is more burdensome and costly than the input validation
that I'm proposing.
>
>> We recently added number of
>> php_error_docref(E_ERROR, "Cannot process too large data");
>> in PHP core to avoid possible memory destruction attacks.
>
> We added it because we didn't have choice. PHP does not have generic
> error mechanism that allows to fail an arbitrary function and still
> continue execution. It's because PHP is highly complex C code and C is
> not the most friendly language out there. Your app is not in C, so it
> can do it differently.
>
> If you talk about such situations, fine, but it's not input validation -
> it's limitation of the environment (since PHP can't support arbitrary
> length string). If your application has such limitations - fine, but it
> would be application-defined and will not apply for most cases of input
> validation.

Whether it is input or output validation is irrelevant. "Programs
terminate on insane input/output": no available memory (PHP), a
broken/insane HTTP/HTTPS request (HTTP server), impossible/invalid
inputs to web apps (WAF).

My point is that a program (or even a connection) terminates everywhere
when there is invalid data. Web application developers have the right
to define "valid" inputs. ("Have the right" does not mean "can do
anything".)

PHP script termination on invalid input is just one kind of
termination. It's nothing special.

>
>> Broken char encoding shouldn't came from legitimate users. Text
>> contains CNTRL chars from shouldn't
>> come from legitimate users. 1MB data from
>> name="var" /> shouldn't come from legitimate users. Numeric database
>> record ID that is set by app shouldn't contain anything other than
>> digits. And so on.
>
> I think you are mixing abnormal situations due to physical limitations
> of software (like memory limits, etc.) with business logic. Numeric
> format validation and size limits are clearly business logic. Encoding
> may be not, depending on what the input is and used for.

I would impose certain limits in "the input validation", but if a
program must return a nice response for any request, then the check
must be in business logic. I agree with that. It's your rule after all.

>
>> Broken char encoding (Accept only valid encoding)
>> NUL, etc control chars in string. (Accept only chars allowed)
>> Too long or too short string. e.g. JS validated values and values set
>> by server programs like /etc, 100 chars for
>> username, 1000 chars for password, empty ID for a database record,
>> etc. (Accept only strings within range)
>
> These all fine filters/validators, and may be very useful in many
> situations. What I still don't understand is insistence of application
> dropping everything and exiting when one of them fails. We already have
> sanitization/filtering infrastructure, we can add new filters and flags
> - what I don't understand, why we need parallel infrastructure which
> seems to be only different by an unhelpful feature of crashing each time
> it sees something unexpected. Am I missing something?

I think your premise is "Show a nice error message for any error and
proceed as in the normal case". (Handle invalid/insane data just like
user mistakes.)

My premise is "Don't show nice messages to an attacker; terminate as an
abnormal case". (Treat such data as an attack or a serious system bug.)

It's a design choice. Either way is possible.

>
>> How to deal with bad inputs.
>> - You seem you would like to treat as normal input.
>
> No, you didn't understand. I would like to treat is as erroneous input,
> but not stop the application immediately, but return error status to the
> business logic and let it sort things out.

Now we are getting close! Our premises differ, so our opinions/views
differ.

My premise is "Client and server have certain rules. Client inputs that
do not follow the rules (requirements) should be treated as abnormal
cases and shouldn't be handled by business logic".
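
To make the premise above concrete, here is a minimal sketch. It is
written in Python only to keep it short and self-contained, and it is
not the API proposed in the RFC. The names InvalidInputError,
validate_text and handle_request_strict, the status strings, and the
100-character limit are all assumptions made for illustration. The
validator enforces the three quoted rules (valid encoding, no control
chars, length within range); the handler aborts with a generic status
instead of passing the error to business logic.

```python
import unicodedata


class InvalidInputError(Exception):
    """Hypothetical error type: the input broke a client/server rule."""


def validate_text(raw: bytes, min_len: int = 1, max_len: int = 100) -> str:
    """Return the decoded string, or raise InvalidInputError."""
    try:
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise InvalidInputError("broken character encoding") from exc
    # Unicode category "Cc" covers NUL and the other control characters
    # (including tab and newline, which a single-line field never needs).
    if any(unicodedata.category(ch) == "Cc" for ch in text):
        raise InvalidInputError("control character in string")
    if not min_len <= len(text) <= max_len:
        raise InvalidInputError("string length out of range")
    return text


def handle_request_strict(raw: bytes) -> str:
    """Abort on invalid input; no detailed message goes to the client."""
    try:
        username = validate_text(raw)
    except InvalidInputError:
        return "400 Bad Request"
    return "200 OK: hello " + username
```

Under the other premise, the same validate_text() could be called from
business logic and its exception turned into a friendly message. The
checks are identical; only the place and the reaction differ.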

Please note that
 - Valid input != logically correct or free of mistakes

A rule could be "an integer may be any valid integer", but a developer
may impose that an int value must be between 0 and 120, for instance.
Age 300 can't be true for a human age, but if any integer is allowed,
it is valid input.

>
>> When plain is used, users may type in any valid UTF-8 char by mistake.
>> For example, this wouldn't happen for date field, but autocomplete may
>> fill my name "大垣靖男" to name field that supposed to contain alphabets
>> only.
>
> If the software is properly internationalized (like my email client)
> there's absolutely nothing wrong with this string. If it is not, it
> should check that the text matches its expectations - that's part of
> business logic.

Which error checks should be handled by business logic depends on the
rules/requirements that developers can impose on the client. Since it
depends on developer-defined rules/requirements, let's talk about what
kind of rules/requirements can be defined.

>
>> If developers try to validate "all inputs", validation in MVC model is
>> not efficient nor reasonable. It does not make sense to validate
>> browser request headers in db model, for example. Ideally, input
>> validation is better to be done as fast as possible to maximize the
>> mitigation effect.
>
> If you use browser headers, you validate them. If you don't use them, no
> point validating them, of course, since they are not your inputs.

It's OK to design that way. To maximize the mitigation effect of input
validation, developers are advised to validate "all inputs" regardless
of whether business logic or output code uses them. An input may be
used in the future, or may already be used by some code you are not
aware of.

Let's talk about what could be validated, because things that cannot be
validated by input code do not belong to "the input validation" anyway.
We know there are many inputs that could be validated by input code,
don't we?
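
As an illustration of the "valid input != logically correct" note
earlier in this mail, here is a minimal sketch, again in Python and not
the RFC's API. The function names and the 32-bit upper bound are
assumptions. The first function is the input-code check (format and
agreed range only); the second is a business-logic check.

```python
def validate_int(raw: str, low: int, high: int) -> int:
    """Input validation: ASCII digits only, within the agreed range."""
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("not an integer")
    value = int(raw)
    if not low <= value <= high:
        raise ValueError("integer out of range")
    return value


def is_plausible_human_age(age: int) -> bool:
    """Business logic: does the value make sense as a human age?"""
    return 0 <= age <= 120


# Rule "any non-negative 32-bit integer": "300" is valid input ...
age = validate_int("300", 0, 2**31 - 1)
# ... but business logic still sees it as a mistake to report nicely.
age_is_plausible = is_plausible_human_age(age)
```

If the developer instead imposes 0 to 120 as the client/server rule,
validate_int("300", 0, 120) raises, and the value never reaches
business logic.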

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net