Newsgroups: php.internals
From: rui_hirokawa@ybb.ne.jp (Rui Hirokawa)
Date: Sat, 16 Dec 2006 13:19:17 +0900
To: Pierre
Cc: "PHP internals"
Subject: Re: [PHP-DEV] php6: input encoding, filter and making JIT really JIT
Message-ID: <20061216131857.5001.RUI_HIROKAWA@ybb.ne.jp>

I think #2 is better than #1.
The current implementation of mbstring is based on a solution similar
to #1. It is simple and stable, but #2 has more flexibility.

Rui

On Thu, 14 Dec 2006 21:59:44 +0100
Pierre wrote:

> Hello,
>
> Yesterday, Ilia, Andrei and I discussed the possible solutions to
> the input encoding problem in php6 (unicode). I will try to describe
> them here.
>
> I will not go too deep into the details; the goal is to choose one
> solution and then propose a patch to test. Our preference goes to
> solution #2.
>
> --
> Solution #1:
> ------------
> The idea here is to detect the encoding, then encode and register the
> variables during request initialization (before the script gets
> control). Apart from the encoding detection, this is how the current
> implementation works (all php versions).
>
> * Init
> - Parse the request into an array.
> - locate _charset_ or use unicode.request_encoding
> - filter/decode/register the variable like it is done now
>
> * Runtime
> Just like now, the auto_globals (with or without jit) are declared and
> ready to be used.
>
> This solution has one advantage: it requires only a few changes in
> the engine. The request processing functions need to be changed
> to detect the encoding.
>
> The main disadvantages are:
> - the lack of flexibility: the encoding must be set before the script
>   gets control, using vhost config or htaccess
> - a bad encoding detection will force the user to manually parse the
>   raw request (when available).
>
>
> Solution #2: add (true) JIT support for GET/POST/COOKIE/...
> ------------
> Instead of doing all the processing during the init phase, it will be
> done on demand when an input variable is requested, at runtime.
>
> * Init
> - don't parse the request but simply store it for later processing
>
> * Runtime
> - when an input variable is fetched:
>   - encoding is defined using unicode.request_encoding
>   - filter/decode/register the complete array (post,get,...)
>
> The way JIT works has to be changed. It has to process the data
> at runtime instead of registering it at compile time. This is the only
> way to be sure that the user has set the input encoding correctly
> (or has had the opportunity to set it).
>
> The main advantage of this solution is the absence of magic for
> the user. The encoding detection can be checked and/or set in time
> by the user before the input processing; it is safe and flexible.
>
> I would also suggest adding a function, filter_input_encoding($type),
> to define the encoding type at runtime instead of using ini_set (which
> is often disabled).
>
> There are no real technical disadvantages, but it requires more work
> and more changes in the engine. These changes will also bring some
> performance improvements (if (0) $t = $_ENV['foo']; will not trigger
> jit).
>
> --
>
> I would like to hear your ideas, opinions and comments, especially
> about the possible changes in the engine. Feel free to ask for more
> details if my explanations were unclear :)
>
> Regards,
> --Pierre

--
Rui Hirokawa
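
The on-demand scheme of solution #2 can be modelled in a few lines. What follows is only a toy sketch in Python, not engine code and not a proposed API: the class LazyInput, its set_encoding() method (standing in for the suggested filter_input_encoding()) and the sample query string are all invented for illustration, and the default encoding plays the role of unicode.request_encoding.

```python
from urllib.parse import parse_qs


class LazyInput:
    """Toy model of solution #2: keep the raw request, decode on first fetch."""

    def __init__(self, raw_query: str, default_encoding: str = "utf-8"):
        # Init phase: the raw query string is stored, nothing is parsed.
        self._raw = raw_query
        # Hypothetical stand-in for unicode.request_encoding.
        self._encoding = default_encoding
        # Filled in by the first variable fetch.
        self._vars = None

    def set_encoding(self, encoding: str) -> None:
        """Hypothetical stand-in for the suggested filter_input_encoding()."""
        if self._vars is not None:
            # Too late: the whole array was already decoded.
            raise RuntimeError("input was already decoded")
        self._encoding = encoding

    @property
    def decoded(self) -> bool:
        return self._vars is not None

    def __getitem__(self, name: str) -> str:
        if self._vars is None:
            # Runtime phase: decode the complete array with whatever
            # encoding is in effect at the moment of the first fetch.
            parsed = parse_qs(
                self._raw, keep_blank_values=True, encoding=self._encoding
            )
            # Keep the last value when a name is repeated.
            self._vars = {key: values[-1] for key, values in parsed.items()}
        return self._vars[name]


# The script can still fix the encoding, because nothing was decoded at init.
get = LazyInput("name=%E9t%E9")      # Latin-1 bytes, percent-encoded
get.set_encoding("iso-8859-1")
name = get["name"]                   # first fetch triggers the decoding
```

In this model a script that never reads from the object never pays for the parsing, which is the same effect as the if (0) example in the quoted message. With solution #1 the decoding would happen in the constructor instead, so set_encoding() would always come too late.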