Newsgroups: php.internals
From: andrei@gravitonic.com (Andrei Zmievski)
Date: Thu, 14 Dec 2006 15:14:52 -0800
To: Pierre
Cc: PHP internals, Zeev Suraski, Andi Gutmans, Dmitry Stogov, Rasmus Lerdorf, Ilia Alshanetsky
Subject: Re: [PHP-DEV] php6: input encoding, filter and making JIT really JIT
Message-ID: <5CAD781F-B625-49FD-A2BF-C986F50D4558@gravitonic.com>

On Dec 14, 2006, at 12:59 PM, Pierre wrote:

> The main disadvantages are:
> - the lack of flexibility, encoding must be set before the script gets
>   the hand, using vhost config or htaccess
> - the possible bad encoding detection will force the user to manually
>   parse the raw request (when available).

Also:
- no way to issue an error if conversion fails, except by setting a flag
  that has to be retrieved with a function
- much harder to get to _charset_ if it's at the end of the request

> * Init
> - don't parse the request but simply store it for later processing

We can still parse the data; we just can't decode it. Parsing would
populate arrays (internal or otherwise) with the binary data, which can
later be decoded and filtered in JIT fashion.

> * Runtime
> - when a input variable is fetched:
>   - encoding is defined using unicode.request_encoding

Or via _charset_, or provided by the user, etc.

> The main advantage of this solution is the absence of magic for
> the user. The encoding detection can be checked and/or set in time
> by the user before the input processing, it is safe and flexible.

And we can issue errors in a consistent fashion.

> There is no real technical disadvantages but requires more work and
> changes in the engine. But these changes will also bring some more
> performance improvements (if (0) $t = $_ENV['foo']; will not trigger
> jit).

I guess we need to know how hard it would be to implement runtime JIT
for GET/POST/COOKIE registration.

-Andrei
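
The "parse now, decode later" idea from the reply above can be sketched roughly as follows. This is an illustration in Python, not PHP engine code: the class name LazyRequestVars, its attributes, and the assumption that field names are plain ASCII are all hypothetical. The request is split into raw byte values up front, the encoding is taken from a _charset_ field if one is present (wherever it appears in the request) or can be set by the caller, and decoding happens only when a variable is actually fetched, so a failed conversion surfaces as an ordinary error at that point.

```python
from urllib.parse import unquote_to_bytes


class LazyRequestVars:
    """Hypothetical container: parsed eagerly, decoded on first access."""

    def __init__(self, raw_query: bytes, default_encoding: str = "utf-8"):
        self._raw = {}       # field name (bytes) -> undecoded value (bytes)
        self._decoded = {}   # cache of values that were already decoded
        self.encoding = default_encoding  # may be overridden by the caller

        # Parsing step: split and percent-unescape, but keep raw bytes.
        for pair in raw_query.split(b"&"):
            if not pair:
                continue
            name, _, value = pair.partition(b"=")
            key = unquote_to_bytes(name.replace(b"+", b" "))
            self._raw[key] = unquote_to_bytes(value.replace(b"+", b" "))

        # _charset_ may sit anywhere in the request, so it is looked up
        # only after the whole request has been parsed.
        charset = self._raw.get(b"_charset_")
        if charset:
            self.encoding = charset.decode("ascii")

    def __getitem__(self, name: str) -> str:
        # Decoding step: runs only when the variable is fetched.
        if name not in self._decoded:
            raw = self._raw[name.encode("ascii")]  # assumes ASCII names
            # Raises UnicodeDecodeError if the bytes do not fit the encoding.
            self._decoded[name] = raw.decode(self.encoding)
        return self._decoded[name]


if __name__ == "__main__":
    request = LazyRequestVars(b"name=caf%E9&_charset_=iso-8859-1")
    print(request.encoding)
    print(request["name"])
```

Because nothing is decoded in the constructor, a script that never reads a given variable never pays for (or fails on) its conversion, and the caller still has a chance to set the encoding before the first fetch.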