Newsgroups: php.internals
Xref: news.php.net php.internals:55467
References: <4E6FB55E.4060906@oracle.com>
Date: Fri, 16 Sep 2011 10:07:02 +0200
To: Hannes Magnusson
Cc: Nikita Popov, Christopher Jones, internals@lists.php.net
Subject: Re: [PHP-DEV] Revert Tokenizer
behavior for 5.4
From: tyra3l@gmail.com (Ferenc Kovacs)

> Wait wait wait. Thats the point here?
> __COMPILER_HALT_OFFSET__ already tells you where the data starts.
>
> -Hannes

I didn't send this message at first, but after reading the mail from Chris
I think it may clear up the confusion:

This is about tokenizing a file which has __halt_compiler(); in it.
Before the fix for the original bug report, you could get the warning
"Unexpected character in input" if you called token_get_all() on a script
which had binary data after the __halt_compiler();

iliaa's fix was trivial: break out of the tokenizer when the __halt_compiler
token is found. But it isn't good enough, because as the original bug
reporter pointed out:

1, token_get_all() now doesn't return the (); after __halt_compiler, which
means that if you rebuild the code from the tokens, you get invalid PHP code.
2, you have no way to get the binary data after the __halt_compiler via the
tokenizer, so you can't rebuild the original file using only the tokenizer
(for example, one could use the tokenizer to strip the whitespace and
comments from a given file in-place).

Both problems could be hacked around from userland, but imo they are still
worth fixing.

-- 
Ferenc Kovács
@Tyr43l - http://tyrael.hu
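To make the userland workaround mentioned above concrete, here is a rough sketch, not code from the thread. The helper name strip_comments_keep_tail() and the sample script are made up for illustration. It strips comments with token_get_all() and then copies through whatever part of the source the tokenizer did not cover. It relies on the assumption that the token texts, concatenated, form a prefix of the source, so it should behave the same whether the tokenizer stops at __halt_compiler or returns the remainder itself.

```php
<?php
// Hypothetical helper (not from the thread): strip comments using the
// tokenizer while keeping whatever follows __halt_compiler intact.
function strip_comments_keep_tail($source)
{
    $out = '';
    $consumed = 0; // bytes of $source covered by the tokens seen so far

    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        $text = is_array($token) ? $token[1] : $token;
        $consumed += strlen($text);

        // Drop comments, keep every other token verbatim.
        if (is_array($token)
            && ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT)) {
            continue;
        }
        $out .= $text;
    }

    // Assumption: the token texts concatenate to a prefix of $source, so
    // anything past $consumed was never tokenized (e.g. "();" plus binary
    // data) and is copied through untouched.
    return $out . substr($source, $consumed);
}

// Made-up sample input: a comment, some code, then binary data after the halt.
$source = "<?php\n// a comment\necho 'hi';\n__halt_compiler();\x00\x01binary";
$stripped = strip_comments_keep_tail($source);
```

This works, but every tool that rewrites files through the tokenizer has to carry the same byte bookkeeping, which is the argument for having token_get_all() return the (); and the trailing data itself.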