Newsgroups: php.internals
Date: Sat, 16 Dec 2006 14:43:54 -0800
From: andrei@gravitonic.com (Andrei Zmievski)
To: Rasmus Lerdorf
Cc: Rui Hirokawa, Pierre, PHP internals
Subject: Re: [PHP-DEV] php6: input encoding, filter and making JIT really JIT

Pursuant to an IRC
discussion with Rasmus, it seems to me that in order to do any sort of
error differentiation we need variable-level JIT decoding/filtering.
It needs to be smart, though, because we want to issue errors only on
the first access to the variable. One way to approach this would be to
decode/filter the $_POST['foo'] value when it's accessed and then
replace $_POST['foo'] with the filtered result, so that the next access
gets the value directly, without invoking the JIT mechanism.

-Andrei

On Dec 15, 2006, at 9:33 PM, Rasmus Lerdorf wrote:

> The main issue, as I already discussed with Andrei (sorry, our
> discussions are stealth since I see him almost every day even though I
> try hard to avoid him), is how we handle encoding errors if we JIT at
> runtime and process the entire array at that time. I agree that this
> is architecturally the right approach, but if someone injects some
> bogus GET data, for example, even though the app doesn't even try to
> access it, it is going to be encoded when the app tries to get at the
> first GET arg, and at that point there would be an error if that extra
> GET data was bogus.
>
> We obviously don't want it to be possible to arbitrarily create errors
> like that, but at the same time it needs to be possible for the
> application to discover encoding errors. So we probably need to make
> the error handling pretty smart. For example, treat errors encoding
> the actual entry they are trying to access as more serious than an
> error encoding another element that just happened to be encoded at
> that point. And then later, if they try to access a previously encoded
> element that had an error, throw the more serious error at that point.
> Or something along those lines.
>
> I suppose we could also JIT right down to the single-element level and
> not actually do the entire array on the first access to that GPC
> array.
>
> -Rasmus
>
> Rui Hirokawa wrote:
>> I think #2 is better than #1.
>> The current implementation of mbstring is based on a solution similar
>> to #1. It is simple and stable, but #2 has more flexibility.
>>
>> Rui
>>
>> On Thu, 14 Dec 2006 21:59:44 +0100
>> Pierre wrote:
>>
>>> Hello,
>>>
>>> Yesterday, Ilia, Andrei and I discussed the possible solutions to the
>>> input encoding problem in php6 (Unicode). I will try to describe them
>>> here.
>>>
>>> I won't go too deep into the details; the goal is to choose one
>>> solution and then propose a patch to test. Our preference goes to
>>> solution #2.
>>>
>>> --
>>> Solution #1:
>>> ------------
>>> The idea here is to detect the encoding, then encode and register the
>>> variables during request initialization (before the script gets
>>> control). Apart from the encoding detection, this is how the current
>>> implementation works (all PHP versions).
>>>
>>> * Init
>>> - Parse the request into an array.
>>> - Locate _charset_ or use unicode.request_encoding.
>>> - Filter/decode/register the variables as is done now.
>>>
>>> * Runtime
>>> Just like now, the auto_globals (with or without JIT) are declared
>>> and ready to be used.
>>>
>>> This solution has one advantage: it requires only a few changes in
>>> the engine. The request processing functions need to be changed to
>>> detect the encoding.
>>>
>>> The main disadvantages are:
>>> - the lack of flexibility: the encoding must be set before the script
>>>   gets control, using the vhost config or .htaccess;
>>> - incorrect encoding detection will force the user to manually parse
>>>   the raw request (when available).
>>>
>>>
>>> Solution #2: add (true) JIT support for GET/POST/COOKIE/...
>>> ------------
>>> Instead of doing all the processing during the init phase, it will
>>> be done on demand, at runtime, when an input variable is requested.
>>>
>>> * Init
>>> - Don't parse the request; simply store it for later processing.
>>>
>>> * Runtime
>>> - When an input variable is fetched:
>>>   - the encoding is defined using unicode.request_encoding;
>>>   - the complete array (POST, GET, ...) is filtered/decoded/registered.
>>>
>>> The way JIT works has to be changed. It has to process the data at
>>> runtime instead of registering it at compile time. This is the only
>>> way to be sure that the user has set the input encoding correctly
>>> (or has had the opportunity to set it).
>>>
>>> The main advantage of this solution is the absence of magic for the
>>> user. The encoding detection can be checked and/or set in time by the
>>> user before the input is processed; it is safe and flexible.
>>>
>>> I would also suggest adding a function, filter_input_encoding($type),
>>> to define the encoding type at runtime instead of using ini_set()
>>> (which is often disabled).
>>>
>>> There are no real technical disadvantages, but it requires more work
>>> and changes in the engine. These changes will also bring some
>>> performance improvements (for example, "if (0) $t = $_ENV['foo'];"
>>> will not trigger JIT, since that fetch never executes at runtime).
>>>
>>> --
>>>
>>> I would like to hear your ideas, opinions and comments, especially
>>> about the possible changes in the engine. Feel free to ask for more
>>> details if my explanations were unclear :)
>>>
>>> Regards,
>>> --Pierre
>>
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
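Andrei's proposal (decode/filter a single element on first access, report any error only then, and replace the entry so later reads bypass JIT) can be modeled in a few lines. This is a language-neutral toy in Python, not PHP engine code: LazyInputArray, decode_calls and the errors list are hypothetical names, the encoding attribute merely stands in for unicode.request_encoding, and storing None for an undecodable element is an assumption, since the thread does not say what a bad value should become.

```python
from typing import Dict, List, Optional


class LazyInputArray:
    """Toy model of a GPC array with per-variable JIT decoding.

    Raw request bytes stay untouched until a key is read for the first
    time. That first read decodes the single element, reports an
    encoding error at most once, and stores the result so that later
    reads return it directly without running the JIT step again.
    """

    def __init__(self, raw: Dict[str, bytes], encoding: str = "utf-8") -> None:
        self._raw = dict(raw)                         # undecoded request data
        self._decoded: Dict[str, Optional[str]] = {}  # decoded (cached) values
        self.encoding = encoding     # hypothetical stand-in for unicode.request_encoding
        self.errors: List[str] = []  # encoding errors reported so far
        self.decode_calls = 0        # how many times the JIT step ran

    def __getitem__(self, key: str) -> Optional[str]:
        if key in self._decoded:
            # Already decoded: no JIT, and no second error for a bad value.
            return self._decoded[key]
        raw = self._raw.pop(key)  # raises KeyError for unknown variables
        self.decode_calls += 1
        value: Optional[str]
        try:
            value = raw.decode(self.encoding)
        except UnicodeDecodeError as exc:
            # First (and only) report for this variable.
            self.errors.append(f"{key}: {exc.reason}")
            value = None  # assumption: an undecodable element reads as null
        # Replace the raw entry with the result, as proposed for $_POST['foo'].
        self._decoded[key] = value
        return value


if __name__ == "__main__":
    post = LazyInputArray({"name": b"Andrei", "junk": b"\xff\xfe"})
    print(post["name"])       # Andrei
    print(post.errors)        # [] because the bogus "junk" entry was never decoded
    print(post["junk"])       # None, and one error is recorded
    print(post["junk"])       # None again, served from the cache
    print(post.errors)        # ['junk: invalid start byte']
    print(post.decode_calls)  # 2
```

Because only the element being read is decoded, bogus extra GET/POST data produces no error unless the script actually reads it, which sidesteps the concern in Rasmus's reply; decoding the whole array on first access would instead need the error-severity ranking he describes.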