Newsgroups: php.internals
Date: Thu, 14 Dec 2006 21:59:44 +0100
From: pierre.php@gmail.com (Pierre)
To: "PHP internals"
Cc: "Zeev Suraski", "Andi Gutmans", "Dmitry Stogov", "Rasmus Lerdorf", "Ilia Alshanetsky"
Subject: php6: input encoding, filter and making JIT really JIT

Hello,

Yesterday, Ilia, Andrei and I discussed possible ways to handle input
encoding in php6 (unicode). I will try to describe them here. I do not go
too deep into the details; the goal is to choose one solution and then
propose a patch to test.

Our preference is solution #2.

--

Solution #1:
------------

The idea here is to detect the encoding, then decode and register the
variables during request initialization (before the script gets control).
Apart from the encoding detection, this is how the current implementation
works (all php versions).

* Init
- Parse the request into an array.
- locate _charset_ or use unicode.request_encoding
- filter/decode/register the variables as is done now

* Runtime
Just like now, the auto_globals (with or without jit) are declared and
ready to be used.

This solution has one advantage: it requires only a few changes in the
engine. The request processing functions need to be changed to detect the
encoding.

The main disadvantages are:
- the lack of flexibility: the encoding must be set before the script gets
  control, using the vhost config or htaccess
- a wrong encoding detection will force the user to parse the raw request
  manually (when it is available).

Solution #2: add (true) JIT support for GET/POST/COOKIE/...
------------

Instead of doing all the processing during the init phase, it will be done
on demand, at runtime, when an input variable is requested.

* Init
- don't parse the request but simply store it for later processing

* Runtime
- when an input variable is fetched:
  - the encoding is taken from unicode.request_encoding
  - filter/decode/register the complete array (post, get, ...)

The way JIT works has to be changed. It has to process the data at runtime
instead of registering it at compile time. This is the only way to be sure
that the user has set the input encoding correctly (or has had the
opportunity to set it).

The main advantage of this solution is the absence of magic for the user.
The user can check and/or set the encoding in time, before the input is
processed; it is safe and flexible.

I would also suggest adding a function, filter_input_encoding($type), to
define the encoding type at runtime instead of using ini_set (which is
often disabled).

There are no real technical disadvantages, but it requires more work and
more changes in the engine. These changes will also bring some performance
improvements (if (0) $t = $_ENV['foo']; will not trigger jit).

--

I would like to hear your ideas, opinions and comments, especially about
the possible changes in the engine. Feel free to ask for more details if my
explanations were unclear :)

Regards,
--Pierre
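
For illustration, the runtime flow of solution #2 can be modeled in
userland roughly as follows. This is only a sketch under assumptions, not
the proposed engine patch: the LazyInput class, its setEncoding() method
(standing in for the suggested filter_input_encoding()) and the parseCount
counter are hypothetical names, and the code assumes PHP 8 with the iconv
extension available.

```php
<?php
// Userland model of solution #2. The real change would live in the engine.
// LazyInput, setEncoding() and parseCount are hypothetical, not PHP APIs.
final class LazyInput implements ArrayAccess
{
    private ?array $decoded = null;   // stays null until the first fetch
    public int $parseCount = 0;       // only here to show when parsing happens

    public function __construct(
        private string $raw,                  // raw request data, stored at init
        private string $encoding = 'UTF-8'    // stands in for unicode.request_encoding
    ) {}

    // Plays the role of the suggested filter_input_encoding():
    // only allowed before the input has been processed.
    public function setEncoding(string $encoding): void
    {
        if ($this->decoded !== null) {
            throw new LogicException('input already decoded');
        }
        $this->encoding = $encoding;
    }

    // Parse and decode the complete array on first use, then cache it.
    private function load(): array
    {
        if ($this->decoded === null) {
            $this->parseCount++;
            parse_str($this->raw, $vars);     // splits and URL-decodes
            array_walk_recursive($vars, function (&$v) {
                $v = iconv($this->encoding, 'UTF-8', $v);
            });
            $this->decoded = $vars;
        }
        return $this->decoded;
    }

    public function offsetExists(mixed $offset): bool
    {
        return array_key_exists($offset, $this->load());
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->load()[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('read-only in this sketch');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('read-only in this sketch');
    }
}

$get = new LazyInput('name=Andr%E9');   // raw bytes are ISO-8859-1
if (0) { $t = $get['name']; }           // never runs, so nothing is parsed
$get->setEncoding('ISO-8859-1');        // script sets the encoding in time
echo $get['name'], "\n";                // first fetch triggers parse + decode
```

The point of the sketch is the ordering: the raw request is only stored at
init, the script can still set the encoding, and the whole array is
decoded once, on the first fetch.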