Newsgroups: php.internals
From: yohgaki@ohgaki.net (Yasuo Ohgaki)
To: Tony Marston
Cc: "internals@lists.php.net"
Date: Thu, 7 Sep 2017 20:49:22 +0900
Subject: Re: [PHP-DEV] A validator module for PHP7

Hi Tony,

On Thu, Sep 7, 2017 at 5:40 PM, Tony Marston wrote:

> "Dan Ackroyd" wrote in message
> news:CA+kxMuSL1kEW60S7DFJb06+r2Q3rC1ueeWU1jAP78FY65aJoDg@mail.gmail.com...
>
>> On 6 September 2017 at 13:31, Rowan Collins wrote:
>>
>>> I'm going to assume that the code you posted was something of a straw
>>> man, and you're not actually advocating people copy 20 lines of code for
>>> every variable they want to validate.
>>
>> You assume wrong. No it's not, and yes I am.
>>
>> I can point a junior developer at the function and they can understand it.
>>
>> If I ask that junior developer to add an extra rule that doesn't
>> currently exist, they can without having to dive into a full library
>> of validation code.
>>
>> If I need to modify the validation based on extra input (e.g whether
>> the user has already made several purchases, or whether they're a
>> brand new signup), it's trivial to add that to the function.
>>
>> This is one of the times where code re-use through copying and pasting
>> is far superior to trying to make stuff "simple" by going through an
>> array based 'specification'. It turns out that that doesn't save much
>> time to begin with, and then becomes hard to manage when your
>> requirements get more complication.
>
> As a person who has been developing database applications for several
> decades and with PHP since 2003 I'd like to chip in with my 2 cent's worth.
> Firstly I agree with Dan's statement:
>
> This type of library should be done in PHP, not in C.
>
> Secondly, there is absolutely no way that you can construct a standard
> library which can execute all the possible validation rules that may exist.
> In my not inconsiderable experience there are two types of validation:
> 1) Primary validation, where each field is validated against the column
> specifications in the database to ensure that the value can be written to
> that column without causing an error. For example this checks that a number
> is a number, a date is a date, a required field is not null, etc.
> 2) Secondary validation, where additional validation/business rules are
> applied such as comparing the values from several fields. For example, to
> check that START_DATE is not later than END_DATE.
>
> Primary validation is easy to automate. I have a separate class for each
> database table, and each class contains an array of field specifications.
> This is never written by hand as it is produced by my Data Dictionary which
> imports data from the database schema then exports that data in the form of
> table class files and table structure files. When data is sent to a table
> class for inserting or updating in the database I have written a standard
> validation procedure which takes two arrays - an array of field=value pairs
> and an array of field=specifications - and then checks that each field
> conforms to its specifications. This validation procedure is built into the
> framework and executed automatically before any data is written to the
> database, so requires absolutely no intervention by the developer.
>
> Secondary validation cannot be automated, so it requires additional code
> to be inserted into the relevant validation method. There are several of
> these which are defined in my abstract table class and which are executed
> automatically at a predetermined point in the processing cycle. These
> methods are defined in the abstract class but are empty.
> If specific code is required then the empty class can be copied from the
> abstract class to the concrete class where it can be filled with the
> necessary code.
>
> If there are any developers out there who are still writing code to
> perform primary validation then you may learn something from my
> implementation.
>
> If there are any developers out there who think that secondary validation
> can be automated I can only say "dream on".

Please let me explain the rationale behind input validation at the
outermost trust boundary.

There are 3 reasons why I would like to propose the validator.
All 3 require validation at the outermost trust boundary.

1. Security reasons

Input validation should be done in a Fail Fast manner.

2. Design by Contract (DbC or Contract Programming)

For DbC to work, validation at the outermost boundary is mandatory.
With DbC, all inputs are validated inside functions/methods to make sure
the program executes correctly.

However, almost all checks (in fact, all checks done by DbC support) are
disabled in production. How do we make sure the program still works
correctly? All input data must be validated at the outermost boundary
when DbC is disabled. Otherwise, DbC may not work. (DbC is supposed to
achieve both secure and efficient code execution.)

3. Native PHP Types

Although my validate module is designed not to do unwanted conversions,
it converts basic types to PHP native types by default. (This can be
disabled.) With this conversion at the outermost trust boundary, native
PHP types work smoothly.

My current primary goal is 1, but 2 and 3 are important as well.
2 is especially important. Providing DbC without a proper basic
validation feature does not make much sense, and could be a disaster.

Users may validate input with their own validation library, but I am
pessimistic about this. Users wouldn't do proper validation because the
available validation libraries and rules are too loose. There are too few
validators that do true validation meeting the requirements for 1 and 2.
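
To make reasons 1 and 3 above a little more concrete, here is a minimal
sketch in plain PHP of fail-fast validation at the outermost boundary
with conversion to native types. It is only an illustration and not the
API of the proposed validate module: the function name validate_input()
and the layout of the spec array are hypothetical.

```php
<?php
/**
 * Hypothetical helper (NOT the proposed module's API).
 * Checks raw request input against a small spec, converts accepted
 * values to native PHP types, and throws on the first violation.
 */
function validate_input(array $input, array $spec): array
{
    // Fail fast on keys the spec does not know about.
    $unknown = array_diff_key($input, $spec);
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            'Unexpected input: ' . implode(', ', array_keys($unknown))
        );
    }

    $clean = [];
    foreach ($spec as $name => $rule) {
        if (!array_key_exists($name, $input) || !is_string($input[$name])) {
            throw new InvalidArgumentException("Missing or non-string input: $name");
        }
        $value = $input[$name];

        if ($rule['type'] === 'int') {
            // filter_var() returns the converted int, or false on failure.
            $int = filter_var($value, FILTER_VALIDATE_INT, [
                'options' => ['min_range' => $rule['min'], 'max_range' => $rule['max']],
            ]);
            if ($int === false) {
                throw new InvalidArgumentException("Invalid integer: $name");
            }
            $clean[$name] = $int;
        } elseif ($rule['type'] === 'string') {
            $bytes = strlen($value); // length in bytes
            // preg_match() with the /u modifier fails on invalid UTF-8.
            $isUtf8 = preg_match('//u', $value) === 1;
            if ($bytes < $rule['min'] || $bytes > $rule['max'] || !$isUtf8) {
                throw new InvalidArgumentException("Invalid string: $name");
            }
            $clean[$name] = $value;
        } else {
            throw new LogicException("Unknown rule type for: $name");
        }
    }
    return $clean;
}
```

With a spec such as ['age' => ['type' => 'int', 'min' => 0, 'max' => 150]],
the string "42" would come back as the integer 42, and anything outside
the spec would stop the request before the application logic runs.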

IMHO, even if there are good enough validators, PHP should provide a
usable validator for core features. (DbC is not implemented yet, though.)

I hope you understand my intentions and accept the feature in core.
A feature needed by core should be in core, IMO.

> 1) Primary validation, where each field is validated against the column
> specifications in the database to ensure that the value can be written to
> that column without causing an error. For example this checks that a number
> is a number, a date is a date, a required field is not null, etc.
> 2) Secondary validation, where additional validation/business rules are
> applied such as comparing the values from several fields. For example, to
> check that START_DATE is not later than END_DATE.

Validation rules for input, logic and database may differ.

Suppose you validate "user comment" data.

Input: 0 - 10240 bytes
 - Input might have to allow a larger size than the logic does,
   e.g. when client side validation is lacking.
Logic: 10 - 1024 bytes
 - Logic may require a smaller range for data to count as correct.
Database: 0 - 102400 bytes
 - Database may allow a much larger size for future extension.

In an ideal situation all of these would be the same, but in the real
world they are not.

I don't aim to consolidate all validations, but I would like to avoid
unnecessary incompatibilities so that different validations can cooperate
where possible.

I'm very interested in PDO level validation because SQLite3 could be
very dangerous. (i.e. type affinity allows storing strings in
int/float/date/etc. columns.) It may be useful if PDO could simply use
the "validate" module's rules or API.

BTW, input validation should only validate format (allowed chars, length,
range, encoding) if we follow the single responsibility principle.
Logical correctness is up to the logic, i.e. the Model in MVC.

Anyway, the goal is to provide a usable basic validator for core features
and security. Some trade-offs may be acceptable to get there.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net
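
As a rough sketch of the "user comment" example above, the three layers
can each keep their own byte limits while sharing one check. The constant
COMMENT_LIMITS and the function comment_fits() are hypothetical names used
only for illustration; the numbers are the ones from the example.

```php
<?php
// Hypothetical per-layer limits (in bytes) from the "user comment" example.
const COMMENT_LIMITS = [
    'input'    => ['min' => 0,  'max' => 10240],
    'logic'    => ['min' => 10, 'max' => 1024],
    'database' => ['min' => 0,  'max' => 102400],
];

/**
 * Returns true if the comment's byte length is within the limits
 * of the given layer ('input', 'logic' or 'database').
 */
function comment_fits(string $comment, string $layer): bool
{
    $limit = COMMENT_LIMITS[$layer];
    $bytes = strlen($comment); // strlen() counts bytes, not characters
    return $bytes >= $limit['min'] && $bytes <= $limit['max'];
}
```

A 2000 byte comment passes the input and database checks but not the
logic check, which is the kind of difference between layers described
in the example.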