Date: Fri, 28 Apr 2017 04:49:10 +0200
From: et.code@ethome.sk ("Martin \"eto\" Misuth")
To: internals@lists.php.net
Reply-To: et.code@ethome.sk
Message-ID: <20170428044910.01fc1d66@eto-mona.office.smartweb.sk>
References: <20170427115041.06339340@eto-mona.office.smartweb.sk>
Subject: Re: [PHP-DEV] [RFC] concept: further improvement of filter extension, "generalising" filter definitions while adding new callback filter type

> On 27 April 2017 at 10:50, Martin "eto" Misuth wrote:
> >
> > By posting this draft, I am asking for comments.
> >
> What is the argument for doing this as part of PHP core,
> rather than doing it as a userland library?
>
> cheers
> Dan
>
> (apologies for possible duplicate reply)

Sorry, when asked to explain myself I am prone to "infodumping", and I am often told I sound quite "combative", as if I had low self-esteem. That is not my conscious intention, so I apologise if this is hard to read because of it.

We actually have a userland library exactly like that. By "library" I mean "shelves" of functions doing various things to input vars. I believe my proposal would make maintaining such a library much easier while making the filter extension "better", still keeping backward compatibility at the same time.

First I will explain our situation:

We have a "CMS system" operating several thousand domains by now. Some domains have thousands of subpages and relatively big images. Its origins date back more than a decade; the system currently runs on PHP 7.1. As such, it is not really that "big", but it is not that "small" either.

It acts somewhat like a "static site" generator, but with a twist. Instead of using Markdown or another more "current" templating language, the whole thing is built on top of "old" XML data files and an XSLT preprocessor.
More "dynamic" data are stored as JSON fragments. On the administration side, PHP acts as a kind of mixer/preprocessor: it prepares XML variables, XML data files, JSON data files and DB sources, and transforms the data through XSLT templates into "the site". Pages are processed in batches. Even though the XSLT pipeline might not be the fastest there is, the point is that eventually a static site output is generated (with a few special dynamic handlers). This is "published" to "outside servers". There, data is handed out to clients very quickly, but naturally some requests still have to be processed. Said another way: what can be served statically is served as such, with the dynamic stuff handling only the special cases. This dynamic stuff includes AJAX handlers, form handlers and search.

The XML and XSLT combination proved to be extremely resilient, withstood the test of time, and survived various webdesign trend shifts very well. Many customers have XML data files (containing content) with mtimes dating several years back.

Users have a completely dynamic web administration at their disposal that allows them to construct sites by pointing and clicking. One of the major features is the ability to build arbitrary forms. As you can imagine, a relatively simple XML data file with 50-200 "virtual controls" can very easily be "blown up" by the XSLT processor into quite complex HTML markup containing AJAX-y controls, "subwidgets", plenty of CSS doodads and whatnot, scattering various parts of "the thing" into various targets (markup body, head, external script files, CSS files, **the PHP array filter template**, and so on). Thus output forms can get very complex, very quickly. Major customers, who also get major hits, love to make these things huge (them being so easy to make).

Most handling of this output naturally happens on the client side, at least until one submits. For that we implemented a universal form handler that does nothing more than process these (sometimes humongous) forms, using 'filter definitions' cached in deeply nested arrays. On submit it also does substitutions in the final output template.

Due to the mentioned user editability, any form can end up with many specialty validators that have to be implemented using callbacks. These validators and sanitizers are, for example, for zip codes of various countries (these are not compatible with each other and each one can be a special snowflake), for their various display modes, or for various resource schemes and stuff like that.

So that is the current situation. I apologise for the overly verbose description, but I hope it conveys our situation as clearly as possible. The whole system is not perfect, but it is not completely horrible either; there seems to be something to it.

Now, the thing is: even if I store the "form definition" arrays in some memory cache (I tried APCu and a ramdisk), so that their instantiation is relatively quick, at the rate of requests we are getting I can see, with debugging output enabled, that we are constantly marshalling and unmarshalling things from the definition arrays (roughly mapping to the XML tree) to accommodate various filter quirks (especially callback). I remind you this is happening on the "output" side of things, where there is no reason to burn cycles generating markup unless the form is completely validated (for the confirmation display).
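To make that marshalling concrete, here is a minimal, hypothetical sketch of the userland pattern described above (the field names, the zip-code callback and the dotted-key flattening are illustrative, not taken from the actual code). filter_var_array() only understands one flat level of keys, and FILTER_CALLBACK takes nothing but the callable itself, so a tree-shaped definition has to be flattened, together with the matching input, on every request:

<?php
// Hypothetical nested definition, roughly mirroring the XML form tree.
$definition = [
    'contact' => [
        'email' => ['filter' => FILTER_VALIDATE_EMAIL],
        'zip'   => [
            'filter'  => FILTER_CALLBACK,
            // FILTER_CALLBACK accepts only a callable in 'options'; there is
            // no way to pass extra flags or options through to the callback.
            'options' => function ($value) {
                // Illustrative validator: "123 45" style zip codes only.
                return preg_match('/^\d{3}\s?\d{2}$/', (string) $value) ? $value : false;
            },
        ],
    ],
];

// filter_var_array() understands only one flat level of keys, so a userland
// layer has to walk the tree and marshal both the definition and the input
// into that flat shape on every request (and map the results back afterwards).
function flatten(array $tree, $prefix = '')
{
    $flat = [];
    foreach ($tree as $key => $spec) {
        $name = $prefix === '' ? $key : $prefix . '.' . $key;
        if (is_array($spec) && !isset($spec['filter'])) {
            $flat += flatten($spec, $name);   // subtree: recurse
        } else {
            $flat[$name] = $spec;             // leaf: filter spec or raw value
        }
    }
    return $flat;
}

$input  = ['contact' => ['email' => 'user@example.com', 'zip' => '841 04']];
$result = filter_var_array(flatten($input), flatten($definition));

var_dump($result); // ['contact.email' => ..., 'contact.zip' => ...]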
It would be immensely valuable to me if we could have the filter extension just intelligent enough that it would process all request data according to a template, from each input source in succession, **on its own**, **in one go** (for each input source), only occasionally consulting a callback, instead of building a layer of hacks on top of it (the filter extension) that marshals and unmarshals data from the "nice" structure to the "badly nested" one using foreach, array_walk or whatever.

I believe this '$definition concept' plus the 'callback_extended' handler allows one to do exactly that. This way the filter can walk the hierarchy on its own, while the same filter definition "object" (it is really just a hierarchy of hashes) can be passed to filter_input()/filter_var() directly.

Anyway, I also believe this modified behaviour would be useful to other filtering library writers as well; otherwise I would naturally not bother you here. Some filter based userland libraries are ridiculous (using classes with FILTER_CALLBACK_METHODS_LIKE_THIS, or massively abusing regexps, with catastrophic backtracking and such), just because the callback filter, I have come to believe, sucks. It does not mix well with other filters that can take options and flags. Also, there is no way to "pull" the filter currently used at a given definition array index and hand it to a singular call to filter_var() (like in the AJAX case) without transforming its structure. Although the default filters are pretty powerful on their own, at a certain point you must resort to hacks like that to cope with specialty corner cases.

If anything, $filter_or_definition is a single Z_TYPE_P(filter_or_definition) == IS_LONG test, keeping the existing filter behaviour working as before. Once processing gets past that, I do not know which is worse: searching for the filter_id in a hash on the C side, or building a marshalling layer on top of it in PHP. In my eyes it is the PHP userland layering that is the wrong place for this.

Experimenting with this, I was surprised how relatively small an amount of C level modification was needed to blow away many lines of my PHP hacks. But I consider myself a pretty dumb person, so there is a high possibility that I am doing something wrong or seeing this in the wrong light. I am not personally confident about my approach either, but I honestly do believe that the overall concept is pretty sound. As I am no wizard coder either, I would be very grateful for reviewers. If the general opinion is that this is a crappy approach overall, I am okay with being told NO. No problem.

Anyway, this sums up my opinion and rationale. I hope you are still here. As you can see, I am by default ready for a negative conclusion regarding this RFC draft, but I am also curious why it might be perceived as problematic. Is it because of potential compatibility breakage, or because it goes against the spirit of the filter extension too much?

Thank you for reading, and for any reply in advance, and sorry for the long post; I hope it answers most questions.

eto
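P.S. To make the intended semantics concrete, here is a rough userland approximation of the dispatch described above. This is a sketch only: the helper name is made up, the draft's actual change lives on the C side inside ext/filter, and the 'callback_extended' handler itself is not modelled here.

<?php
// Userland approximation of the proposed filter_var() dispatch, just to
// illustrate the idea; the real patch would do this inside ext/filter in C.
// $filter_or_definition may be a plain filter ID, a leaf spec array,
// or a nested "hierarchy of hashes".
function filter_var_or_definition($value, $filter_or_definition = FILTER_DEFAULT)
{
    // The Z_TYPE_P(filter_or_definition) == IS_LONG idea from above:
    // a plain integer filter ID keeps today's behaviour untouched.
    if (is_int($filter_or_definition)) {
        return filter_var($value, $filter_or_definition);
    }

    // A leaf spec: usable as-is, also for a singular call (the AJAX case).
    if (isset($filter_or_definition['filter'])) {
        return filter_var($value, $filter_or_definition['filter'], $filter_or_definition);
    }

    // A subtree: walk the hierarchy and filter each child in one pass.
    $out = [];
    foreach ($filter_or_definition as $key => $spec) {
        $out[$key] = filter_var_or_definition(
            isset($value[$key]) ? $value[$key] : null,
            $spec
        );
    }
    return $out;
}

// With something like that built in, the same nested definition could serve
// both whole-tree filtering and pulling out a single leaf, e.g.:
// $clean = filter_var_or_definition($_POST, $definition);
// $zip   = filter_var_or_definition($zipFromAjax, $definition['contact']['zip']);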