Newsgroups: php.internals
Date: Thu, 27 Apr 2017 11:50:41 +0200
To: internals@lists.php.net
Subject: [RFC] concept: further improvement of filter extension, "generalising" filter definitons while adding new callback filter type
From: et.code@ethome.sk ("Martin \"eto\" Misuth")

**This is a concept for an RFC to improve the filter extension further**

The aim of this RFC is to give the filter extension a small nudge.

# Introduction:

## Status quo

### Status quo: input "variables" source control

The author of this RFC is one of those "three weird people on the internet" who, for his own PHP projects, almost completely disables "automagic" request variable registration in php.ini by setting:

    variables_order = "CS"
    request_order = ""

or even:

    variables_order = "S"
    request_order = ""

depending on the deployment type.

That way, most "magic" variables are not even created. This roughly corresponds to a "deny by default" principle.

For example, because a REQUEST source is not even implemented in the filter extension yet, it is always crystal clear which "input source" each variable is fetched from. No programmer on the team can mix up or swap two variables' sources unless consciously trying to.

Some processing time is also "shaved off", because the interpreter does no (useless) string processing for variables that make no sense for a given request handler (the PHP script behind a given URL). Each request handler is expected to be fully aware of the variables it requires for further processing.

### Status quo: input variable filtering

All variables entering the PHP application are filtered through filter_*() calls, making heavy use of this extension.

For the majority of input values the default filters are sufficient and are used extensively, but some variables need specialised filters. For those, the currently provided FILTER_CALLBACK filter is suboptimal: the callable occupies the whole 'options' field, which "breaks the generalisation" of the filter API and makes it impossible to pass custom options to the callback.

### Status quo: 'options' confusion

Because the functions' parameter is named $options, and the $options array/object in turn has a key/property named 'options', users are somewhat confused about how to "construct" filters.
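For reference, the sketch below shows how the existing API is used today. Flags and filter-specific settings live in a nested 'options' array inside $options, while FILTER_CALLBACK takes its callable in that very same 'options' slot, so per-variable settings can only be smuggled in through a closure. The helper `make_clamp()` and its bounds are hypothetical, chosen only for illustration.

```php
<?php
// Today's filter_var(): the third argument ($options) wraps a nested 'options' key.
$age = filter_var('42', FILTER_VALIDATE_INT, [
    'flags'   => FILTER_NULL_ON_FAILURE,
    'options' => ['min_range' => 0, 'max_range' => 150],
]);
var_dump($age); // int(42)

// FILTER_CALLBACK: the callable *is* the 'options' value, so there is no
// slot left for per-variable settings such as bounds.
$upper = filter_var('abc', FILTER_CALLBACK, ['options' => 'strtoupper']);
var_dump($upper); // string(3) "ABC"

// Workaround (hypothetical helper): capture the settings in a closure.
function make_clamp(int $min, int $max): Closure
{
    return function ($value) use ($min, $max) {
        // Cast to int, then clamp into [$min, $max].
        return max($min, min($max, (int) $value));
    };
}

$clamped = filter_var('99', FILTER_CALLBACK, ['options' => make_clamp(-1, 64)]);
var_dump($clamped); // int(64)
```

Every differently configured callback filter therefore needs its own closure, which is the limitation the proposal addresses.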
### Status quo: array/object duality

In PHP, both arrays and object properties are essentially built around the same core structure, the HashTable. From a simplistic viewpoint, an object can be seen as a "glorified" array.

This is an advantage. It allows one to pass an object as a parameter to functions that accept arrays.

Although PHP is equipped with "interface" and "trait", both are orthogonal to this and quite useless when an object is used "as an array": an "interface" has no machinery to express public properties, and a "trait" is not a standalone entity on its own.

However, none of this is a problem if we consider an object a special case of an array. When an object is used as an array in PHP, one can "intuitively" assume a structural (property-based) type system.

This is a feature. One just needs to pass an object, without having to muck around with interfaces and whatnot. If the object has the required public properties set, it is processed as such; if not, the result is the same as if an array with those keys missing, or with keys holding null values, had been provided. Magic properties and other "specials" are usually not processed.

By defining a class, one can easily enforce that the required fields exist, even if they are null.

Many array-consuming APIs can consume an object of any class. Thanks to this, these APIs end up being quite general.

Unfortunately, the current filter extension doesn't allow objects to be used instead of arrays in all contexts.

# Proposed improvements:

1. Introduction of the filter 'definition' concept and a parameter cleanup.
2. Introduction of a new 'callback_extended' filter (while keeping compatibility with old code).
3. Ability to consume both arrays and objects in 'definition' parameters.

## 1. Introduction of filter 'definition' concept and cleanup.

The ambiguous parameter $type is renamed to $input_source.

Parameter $filter is renamed to $filter_or_definition.

Parameter $options is renamed to $definition.
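The array/object duality described under "Status quo" can already be exploited from userland, within the limits of the current extension, which does not accept an object in place of the $options array. The sketch below uses a hypothetical helper, `definition_to_array()` (not part of ext/filter), that recursively turns an object-based definition into the array shape today's filter_var() expects. It is only a rough userland counterpart of what HASH_OF() gives the C code: it reads public properties via get_object_vars() and leaves callables untouched, both assumptions of this sketch.

```php
<?php
// Hypothetical helper, not part of ext/filter: recursively converts an
// object-based filter definition into the array shape that today's
// filter_var() expects for its $options argument.
function definition_to_array($definition)
{
    // Leave callables (closures, invokable objects, [$obj, 'method'] pairs) intact.
    if (is_callable($definition)) {
        return $definition;
    }
    // Called outside the class, get_object_vars() returns public properties only.
    if (is_object($definition)) {
        $definition = get_object_vars($definition);
    }
    if (is_array($definition)) {
        foreach ($definition as $key => $value) {
            $definition[$key] = definition_to_array($value);
        }
    }
    return $definition;
}

// A definition expressed as objects instead of nested arrays.
$defn = (object) [
    'flags'   => FILTER_NULL_ON_FAILURE,
    'options' => (object) ['min_range' => 0, 'max_range' => 150, 'default' => 18],
];

$opts = definition_to_array($defn);
var_dump(filter_var('42', FILTER_VALIDATE_INT, $opts));  // int(42)
var_dump(filter_var('999', FILTER_VALIDATE_INT, $opts)); // int(18): out of range, 'default' used
```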
For each function call where an 'int $filter' value is currently passed, new logic for processing $filter_or_definition is employed:

Parameter $filter_or_definition can be of type (int), (array) or (object).

The filter "usability" validation algorithm is as follows:

1. Check whether $filter_or_definition is an (int).
   - If yes, it is expected to be a filter_id.
   - If $definition is passed, it is processed the same way $options was before.
2. Check whether $filter_or_definition is an (array|object).
   - If yes, check it for an (int) 'filter' property/key.
   - If that exists, the filter_id is extracted from the $filter_or_definition->filter property (or key), and the filter "definition" is considered "usable".
   - Internally, $definition is made to point to $filter_or_definition, and the $definition from the function call's argument list is ignored.
3. For anything else, the function call fails.

Further processing continues as it does currently, extracting flags and so on. The modification of the C function code is minor.

Because none of these parameters is named $options anymore, the confusion is lessened.

The function signatures thus become:

    filter_has_var(int $input_source, string $variable_name)
    filter_input(int $input_source, string $variable_name, $filter_or_definition, $definition = null)
    filter_var($variable, $filter_or_definition, $definition = null)
    filter_input_array(int $input_source, $definition = null, $add_empty = true)
    filter_var_array(array $data, $definition = null, $add_empty = true)
    filter_list()

The filter_input() call now has the following possible invocations:

    filter_input(INPUT_GET, 'MY_VAR', FILTER_VALIDATE_BOOLEAN);

    filter_input(INPUT_GET, 'MY_VAR', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);

    filter_input(INPUT_GET, 'MY_VAR', [
        'filter'  => FILTER_VALIDATE_BOOLEAN,
        'flags'   => FILTER_NULL_ON_FAILURE,
        'options' => ['default' => false],
    ]);

    $defn = new stdClass();
    $defn->filter = FILTER_VALIDATE_BOOLEAN;
    $defn->flags = FILTER_NULL_ON_FAILURE;
    $defn->options = new stdClass();
    $defn->options->default = false;
    filter_input(INPUT_GET, 'MY_VAR', $defn);

## 2. Introduction of new 'callback_extended' filter

A new filter, FILTER_CALLBACK_EXTENDED ("callback_extended"), is introduced. It expects a 'definition' of the following shape:

    $defn = [
        'filter'   => FILTER_CALLBACK_EXTENDED, // (int)
        'flags'    => FILTER_NULL_ON_FAILURE,   // (int)
        'callback' => $callable_ex,             // (callable)
        'options'  => [
            'default' => 42,
            'min'     => -1,
            'max'     => 64,
        ],
    ];

This filter gets a new id, the one following FILTER_CALLBACK (FILTER_CALLBACK_EXTENDED = FILTER_CALLBACK + 1).

Instead of "abusing" the 'options' field to store the callable, it inspects the 'definition' itself, looking for a new 'callback' field that sits "outside" of the 'options' subcomponent.

The 'options' field is passed as-is, as the second parameter, to the $callable_ex callable. The callable's invocation thus looks like this:

    $filtered_value = $callable_ex($value, $options);

This design **immensely(!)** simplifies the development of configurable, per-input-variable-type callback filters.

It also allows the user to tie everything related to variable filtering, validation and sanitization to the single, unified API provided by the filter extension.

In essence, in the case of FILTER_CALLBACK_EXTENDED, the filter identity is a unique value composed of two sub-values: the filter_id (FILTER_CALLBACK_EXTENDED) and the $callback callable's signature.

Besides allowing "huge form" processors that use a nested $definition array, as in the case of the filter_input_array() API, it allows other, much more flexible uses.
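To make the proposed rules concrete, here is a userland emulation sketch. The names `filter_var_ex()` and `FILTER_CALLBACK_EXTENDED_SKETCH` are hypothetical stand-ins (the real constant would be defined by the extension). The function resolves $filter_or_definition following steps 1 to 3 of section 1, dispatches the 'callback_extended' case by calling `$callback($value, $options)`, and delegates everything else to the existing filter_var(). How flags such as FILTER_NULL_ON_FAILURE should apply to callback_extended results is not specified above, so this sketch simply ignores them for that filter.

```php
<?php
// Hypothetical stand-in for the proposed FILTER_CALLBACK_EXTENDED constant.
const FILTER_CALLBACK_EXTENDED_SKETCH = FILTER_CALLBACK + 1;

// Hypothetical userland emulation of the proposed filter_var() semantics.
function filter_var_ex($variable, $filter_or_definition, $definition = null)
{
    if (is_int($filter_or_definition)) {
        // Step 1: a plain int is a filter id; $definition is used like the old $options.
        $filter = $filter_or_definition;
    } elseif (is_array($filter_or_definition) || is_object($filter_or_definition)) {
        // Step 2: the definition must carry an (int) 'filter' key/property and
        // replaces whatever was passed as $definition.
        $definition = $filter_or_definition;
        $filter = is_object($definition)
            ? ($definition->filter ?? null)
            : ($definition['filter'] ?? null);
        if (!is_int($filter)) {
            throw new InvalidArgumentException("definition has no (int) 'filter'");
        }
    } else {
        // Step 3: anything else fails.
        throw new InvalidArgumentException('expected int, array or object');
    }

    // Objects and arrays are treated alike (public properties only here).
    if (is_object($definition)) {
        $definition = get_object_vars($definition);
    }

    if ($filter === FILTER_CALLBACK_EXTENDED_SKETCH) {
        if (!is_array($definition) || !is_callable($definition['callback'] ?? null)) {
            throw new InvalidArgumentException("callback_extended needs a 'callback'");
        }
        // The callable sits next to 'options'; 'options' is passed through as-is.
        return $definition['callback']($variable, $definition['options'] ?? null);
    }

    if (is_array($definition)) {
        unset($definition['filter']); // the id is passed explicitly below
    }
    // Delegate to today's filter_var(); nested 'options' objects are not converted here.
    return filter_var($variable, $filter, $definition ?? 0);
}

// A configurable range filter: min/max/default come from 'options'.
$range = function ($value, $options) {
    $n = filter_var($value, FILTER_VALIDATE_INT);
    if ($n === false || $n < $options['min'] || $n > $options['max']) {
        return $options['default'];
    }
    return $n;
};

$defn = (object) [
    'filter'   => FILTER_CALLBACK_EXTENDED_SKETCH,
    'callback' => $range,
    'options'  => ['default' => 42, 'min' => -1, 'max' => 64],
];

var_dump(filter_var_ex('10', $defn));                    // int(10)
var_dump(filter_var_ex('100', $defn));                   // int(42): above 'max'
var_dump(filter_var_ex('yes', FILTER_VALIDATE_BOOLEAN)); // bool(true)
```

The same `$range` closure can be reused with different bounds simply by changing 'options', without creating a new closure per configuration.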
For example, when objects are used to store filter 'definitions', highly expressive, "composable" and reusable "filter libraries" can be constructed:

    $def_v1 = (object) [
        'filter'   => FILTER_CALLBACK_EXTENDED,
        'callback' => $filter_v1_handler,
        'options'  => (object) ['x' => 1, 'y' => 2],
    ];

    $def_v2 = (object) [
        'filter'   => FILTER_CALLBACK_EXTENDED,
        'callback' => $filter_v2_handler,
    ];

    $def_s1 = (object) [...];
    $def_s2 = (object) [...];

    $usr_validating_filters = [UVFLT_1 => $def_v1, UVFLT_2 => $def_v2];
    $usr_sanitizing_filters = [USFLT_1 => $def_s1, USFLT_2 => $def_s2];

    filter_input(INPUT_GET, 'MY_VAR', $usr_validating_filters[UVFLT_1]);

By moving the callback's callable outside of the 'options' component, proper semantic separation is achieved and a sensible hierarchy of the filter 'definition' is maintained, while at the same time the callback gains much-needed invocation customisation.

The actual implementation is relatively straightforward, requiring only one new internal function while reusing much of the filter extension machinery already present (with slight modifications).

## 3. Ability to consume both arrays and objects in 'definition' parameters.

The extension code was reread, and what could be called 'definition' processing was modified to allow both array and object consumption by means of the HASH_OF() macro.

# Conclusion

The experimental implementation seems quite usable, passing all current ext/filter/tests (with small modifications due to the changed semantics).

More experiments are to be done, especially stress testing memory handling for leaks and corruption. So far, debug+maintainer-zts builds have not revealed any problems.

Logic, usability and compatibility were prioritised over performance. Still, some small performance gains might actually be observed, as parameter parsing was converted to FAST_ZPP, especially for a high cadence of successive filter_has_var() calls. However, no effort was made on this front.

Compared to the advantages gained, the code changes are relatively minor.
An attempt was made to maintain backwards compatibility when using FILTER_CALLBACK, although users will be encouraged to "upgrade" to FILTER_CALLBACK_EXTENDED.

Hidden errors in legacy scripts caused by the changed processing of the $filter (now $filter_or_definition) parameter are not evaluated, and are considered severe bugs anyway: $filter should always have been an (int).

Sniffers using the Reflection API will break (if they expect a certain filter API layout), but that is expected (or should be expected) by reflection consumers and is thus not considered a problem. Nobody should, probably, be using the Reflection API to drive a 'decision tree' in production code invoked several hundred (or thousand) times per second.

Should this RFC pass, the filter documentation will be updated to match the new semantics.

---------------------------------------------------------------------------

By posting this draft, I am asking for comments.

Should this draft be considered worthy of inclusion among the RFCs, I am asking for karma to be able to add it to the wiki. After that, a git fork will be provided for reviewers to evaluate the code. After a successful review, I am asking for a final vote.

My intended upstream inclusion target window is "before" PHP_7.2. However, I am less interested in the speed of inclusion than in sensibly improving the quality of the (awesome) filter extension.

It would be great if it went through, given the advantages it has for userland consumers.

Thank you in advance for reading and for your consideration.

eto